docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md
Lines changed: 7 additions & 7 deletions
@@ -12,27 +12,27 @@ import cdc_syncs from '@site/static/images/integrations/data-ingestion/clickpipe
This document describes how to control the sync of a database ClickPipe (Postgres, MySQL etc.) when the ClickPipe is in **CDC (Running) mode**.

-## Overview
+## Overview {#overview}

Database ClickPipes have an architecture that consists of two parallel processes: pulling from the source database and pushing to the target database. The pulling process is controlled by a sync configuration that defines how often data is pulled and how much data is pulled at a time. "At a time" means one batch, since the ClickPipe pulls and pushes data in batches.

There are two main ways to control the sync of a database ClickPipe. The ClickPipe starts pushing as soon as either of the settings below is reached.

-### Sync interval
+### Sync interval {#sync-interval}
The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe pulls records from the source database. The time taken to push the pulled records to ClickHouse is not included in this interval.

The default is **1 minute**.
The sync interval can be set to any positive integer value, but we recommend keeping it above 10 seconds.

-### Pull batch size
+### Pull batch size {#pull-batch-size}
The pull batch size is the number of records that the ClickPipe pulls from the source database in one batch. Records are the inserts, updates, and deletes performed on the tables that are part of the pipe.

The default is **100,000** records.
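
As a rough illustration of how the two settings interact, the Python sketch below simulates a pull loop that closes a batch when either the sync interval has elapsed or the pull batch size has been reached. The names `SyncConfig` and `pull_batch`, and the idea of feeding in timestamped records, are hypothetical and exist only for this example; this is not the ClickPipes implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SyncConfig:
    # Defaults mirror the values stated in this document.
    sync_interval_seconds: int = 60
    pull_batch_size: int = 100_000


def pull_batch(
    records: Iterable[Tuple[float, str]],
    config: SyncConfig,
    start_time: float = 0.0,
) -> Tuple[List[str], str]:
    """Collect records until one of the two limits is reached.

    `records` yields (arrival_time_seconds, payload) pairs in arrival order.
    Returns the batch and the (hypothetical) reason it was closed.
    """
    batch: List[str] = []
    for arrival_time, payload in records:
        # Limit 1: the sync interval has elapsed since the batch started.
        if arrival_time - start_time >= config.sync_interval_seconds:
            return batch, "sync_interval"
        batch.append(payload)
        # Limit 2: the batch holds as many records as the pull batch size.
        if len(batch) >= config.pull_batch_size:
            return batch, "pull_batch_size"
    return batch, "source_exhausted"
```

With a small hypothetical configuration such as `SyncConfig(sync_interval_seconds=10, pull_batch_size=3)`, a burst of records closes the batch on size, while a slow trickle closes it on time.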

-### An exception: Long-running transactions on source
+### An exception: Long-running transactions on source {#exception-long-running-transactions-on-source}
When a transaction is run on the source database, the ClickPipe waits until it receives the `COMMIT` of the transaction before it moves forward. This **overrides** both the sync interval and the pull batch size.
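
To make the exception concrete, here is a minimal Python sketch of a pull loop that only honors the batch size limit once no transaction is open. The event kinds (`BEGIN`, `CHANGE`, `COMMIT`) and the function `pull_until_commit` are assumptions made for this example and do not describe the actual ClickPipes internals; the sync interval would be deferred in the same way.

```python
from typing import Iterable, List, Tuple


def pull_until_commit(
    events: Iterable[Tuple[str, str]],
    pull_batch_size: int,
) -> List[str]:
    """Collect change payloads, never closing a batch mid-transaction.

    `events` yields (kind, payload) pairs where kind is one of
    "BEGIN", "CHANGE", or "COMMIT" (hypothetical event names).
    """
    batch: List[str] = []
    in_transaction = False
    for kind, payload in events:
        if kind == "BEGIN":
            in_transaction = True
        elif kind == "COMMIT":
            in_transaction = False
        else:
            batch.append(payload)
        # The size limit only applies once the open transaction has committed.
        if len(batch) >= pull_batch_size and not in_transaction:
            return batch
    return batch
```

In this model, a transaction with three changes and a batch size of two still yields a single batch of three records, because the batch cannot close before the `COMMIT` arrives.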

-### Tweaking the sync settings to help with replication slot growth
+### Tweaking the sync settings to help with replication slot growth {#tweaking-the-sync-settings-replication-slot-growth}
Let's talk about how to use these settings to handle a large replication slot of a CDC pipe.
The time to push to ClickHouse does not scale linearly with the time to pull from the source database. This can be leveraged to reduce the size of a large replication slot.
If you increase both the sync interval and the pull batch size, the ClickPipe pulls a large amount of data from the source database in one go and then pushes it to ClickHouse.
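
One way to reason about why larger batches help is a toy cost model in which every push carries a fixed overhead plus a small per-record cost. The model, the function `total_push_seconds`, and its parameters are illustrative assumptions, not measurements or documented ClickPipes behavior.

```python
import math


def total_push_seconds(
    backlog_records: int,
    pull_batch_size: int,
    fixed_overhead_s: float,
    per_record_s: float,
) -> float:
    """Estimate total push time to drain a backlog under a toy cost model.

    Assumes each batch pays `fixed_overhead_s` once, plus `per_record_s`
    for every record it contains. Both constants are hypothetical.
    """
    batches = math.ceil(backlog_records / pull_batch_size)
    return batches * fixed_overhead_s + backlog_records * per_record_s
```

Under this model, draining the same backlog with a larger pull batch size means fewer batches and therefore less total fixed overhead, which is the effect the paragraph above relies on.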

-### Monitoring sync control behaviour
+### Monitoring sync control behavior {#monitoring-sync-control-behavior}
You can see how long each batch takes in the **CDC Syncs** table in the **Metrics** tab of the ClickPipe. Note that this duration includes the push time. If no rows are incoming, the ClickPipe waits, and that wait time is also included in the duration.