**docs/integrations/data-ingestion/clickpipes/mysql/controlling_sync.md** (+6 −6)
This document describes how to control the sync of a database ClickPipe (Postgres, MySQL etc.) when the ClickPipe is in **CDC (Running) mode**.
## Overview {#overview-mysql-sync}
Database ClickPipes have an architecture that consists of two parallel processes - pulling from the source database and pushing to the target database. The pulling process is controlled by a sync configuration that defines how often the data should be pulled and how much data should be pulled at a time. By "at a time", we mean one batch - since the ClickPipe pulls and pushes data in batches.
There are two main ways to control the sync of a database ClickPipe. The ClickPipe will start pushing when one of the below settings kicks in.
### Sync interval {#interval-mysql-sync}
The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time taken to push the pulled records to ClickHouse is not included in this interval.
The default is **1 minute**.
The sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.
### Pull batch size {#batch-size-mysql-sync}
The pull batch size is the number of records that the ClickPipe will pull from the source database in one batch. Records include inserts, updates, and deletes performed on the tables that are part of the pipe.
The default is **100,000** records.
A safe maximum is 10 million.
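The interaction between the two settings can be sketched as follows. This is an illustrative model only, not ClickPipes internals; the function and parameter names are hypothetical. A batch is pushed as soon as either the sync interval elapses or the pull batch size is reached, whichever comes first:

```python
from dataclasses import dataclass

@dataclass
class SyncConfig:
    sync_interval_s: int = 60       # default: 1 minute
    pull_batch_size: int = 100_000  # default: 100,000 records

def should_push(records_pulled: int, elapsed_s: float, cfg: SyncConfig) -> bool:
    """A batch is pushed when EITHER limit is hit, whichever comes first."""
    return records_pulled >= cfg.pull_batch_size or elapsed_s >= cfg.sync_interval_s

cfg = SyncConfig()
# Quiet source: the interval fires long before the batch fills up.
print(should_push(records_pulled=1_200, elapsed_s=60, cfg=cfg))   # True
# Busy source: the batch fills up well before the interval elapses.
print(should_push(records_pulled=100_000, elapsed_s=4, cfg=cfg))  # True
# Neither limit reached yet: keep pulling.
print(should_push(records_pulled=50_000, elapsed_s=30, cfg=cfg))  # False
```

On a quiet source the sync interval dominates (small, regular batches); on a busy source the batch size dominates (frequent full batches).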
### An exception: Long-running transactions on source {#transactions-pg-sync}
When a transaction is run on the source database, the ClickPipe waits until it receives the COMMIT of the transaction before it moves forward. This **overrides** both the sync interval and the pull batch size.
### Monitoring sync control behaviour {#monitoring-mysql-sync}
You can see how long each batch takes in the **CDC Syncs** table in the **Metrics** tab of the ClickPipe. Note that this duration includes push time; if there are no rows incoming, the ClickPipe waits, and that wait time is also included in the duration.
**docs/integrations/data-ingestion/clickpipes/mysql/parallel_initial_load.md** (+8 −12)
This document explains how parallelized snapshot/initial load works in the MySQL ClickPipe and describes the snapshot parameters that can be used to control it.
## Overview {#overview-mysql-snapshot}
Initial load is the first phase of a CDC ClickPipe, where the ClickPipe syncs the historical data of the tables in the source database over to ClickHouse before starting CDC. Developers often do this in a single-threaded manner.
However, the MySQL ClickPipe can parallelize this process, which can significantly speed up the initial load.
### Partition key column {#key-mysql-snapshot}
Once this feature is enabled, you should see the below setting in the ClickPipe table picker (both during creation and editing of a ClickPipe):
The MySQL ClickPipe uses a column on your source table to logically partition the source table.
The partition key column must be indexed in the source table to see a good performance boost. You can verify this by running `SHOW INDEX FROM <table_name>` in MySQL.
#### Snapshot number of rows per partition {#numrows-mysql-snapshot}
This setting controls how many rows constitute a partition. The ClickPipe will read the source table in chunks of this size, and chunks will be processed in parallel based on the initial load parallelism set. The default value is 100,000 rows per partition.
#### Initial load parallelism {#parallelism-mysql-snapshot}

This setting controls how many partitions will be processed in parallel. The default value is 4, which means that the ClickPipe will read 4 partitions of the source table in parallel. This can be increased to speed up the initial load, but it is recommended to keep it to a reasonable value depending on your source instance specs to avoid overwhelming the source database. The ClickPipe will automatically adjust the number of partitions based on the size of the source table and the number of rows per partition.
#### Snapshot number of tables in parallel {#tables-parallel-mysql-snapshot}
While not strictly related to parallel snapshot, this setting controls how many tables will be processed in parallel during the initial load. The default value is 1. Note that this is on top of the partition parallelism, so if you have 4 partitions and 2 tables, the ClickPipe will read 8 partitions in parallel.
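The arithmetic behind these settings can be sketched as below. This is an illustrative calculation with hypothetical function and parameter names; the real ClickPipe adjusts partitioning internally:

```python
import math

def snapshot_parallel_reads(table_rows: int,
                            rows_per_partition: int = 100_000,
                            initial_load_parallelism: int = 4,
                            tables_in_parallel: int = 1) -> tuple[int, int]:
    """Return (partitions for one table, partition reads in flight overall)."""
    # Each partition holds at most rows_per_partition rows.
    partitions = math.ceil(table_rows / rows_per_partition)
    # Per-table partition parallelism multiplies with table-level parallelism.
    in_flight = min(partitions, initial_load_parallelism) * tables_in_parallel
    return partitions, in_flight

# A 1M-row table splits into 10 partitions; with parallelism 4 and
# 2 tables loading at once, 8 partition reads run concurrently.
print(snapshot_parallel_reads(1_000_000, tables_in_parallel=2))  # (10, 8)
```

This mirrors the 4 partitions × 2 tables = 8 parallel reads example above.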
### Monitoring parallel snapshot in MySQL {#monitoring-parallel-mysql-snapshot}
You can run **SHOW PROCESSLIST** in MySQL to see the parallel snapshot in action. The ClickPipe creates multiple connections to the source database, each reading a different partition of the source table. If you see **SELECT** queries with different ranges, the ClickPipe is reading the source tables. You can also see the `COUNT(*)` and the partitioning query here.
- The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
- When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.
**docs/integrations/data-ingestion/clickpipes/postgres/controlling_sync.md** (+8 −7)
This document describes how to control the sync of a database ClickPipe (Postgres, MySQL etc.) when the ClickPipe is in **CDC (Running) mode**.
## Overview {#overview-pg-sync}
Database ClickPipes have an architecture that consists of two parallel processes - pulling from the source database and pushing to the target database. The pulling process is controlled by a sync configuration that defines how often the data should be pulled and how much data should be pulled at a time. By "at a time", we mean one batch - since the ClickPipe pulls and pushes data in batches.
There are two main ways to control the sync of a database ClickPipe. The ClickPipe will start pushing when one of the below settings kicks in.
### Sync interval {#interval-pg-sync}
The sync interval of the pipe is the amount of time (in seconds) for which the ClickPipe will pull records from the source database. The time taken to push the pulled records to ClickHouse is not included in this interval.
The default is **1 minute**.
The sync interval can be set to any positive integer value, but it is recommended to keep it above 10 seconds.
### Pull batch size {#batch-size-pg-sync}
The pull batch size is the number of records that the ClickPipe will pull from the source database in one batch. Records include inserts, updates, and deletes performed on the tables that are part of the pipe.
The default is **100,000** records.
A safe maximum is 10 million.
### An exception: Long-running transactions on source {#transactions-pg-sync}
When a transaction is run on the source database, the ClickPipe waits until it receives the COMMIT of the transaction before it moves forward. This **overrides** both the sync interval and the pull batch size.
### Tweaking the sync settings to help with replication slot growth {#tweaking-pg-sync}
Let's talk about how to use these settings to handle a large replication slot of a CDC pipe.
The pushing time to ClickHouse does not scale linearly with the pulling time from the source database. This can be leveraged to reduce the size of a large replication slot.
By increasing both the sync interval and the pull batch size, the ClickPipe will pull a large amount of data from the source database in one go and then push it to ClickHouse, which helps the replication slot drain faster.
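The non-linear scaling can be illustrated with a toy cost model. The numbers and names here are hypothetical, assumed purely for illustration and not a model of ClickPipes internals: if each sync pays a fixed per-sync overhead plus a per-row cost, larger and less frequent batches amortize the overhead and achieve higher effective throughput:

```python
def sync_throughput(batch_size: int,
                    per_sync_overhead_s: float,
                    per_row_cost_s: float) -> float:
    """Records per second for one pull+push cycle under a toy cost model:
    a fixed per-sync overhead plus a per-row processing cost."""
    return batch_size / (per_sync_overhead_s + batch_size * per_row_cost_s)

# Hypothetical costs: 5 s fixed overhead per sync, 10 µs per row.
small = sync_throughput(100_000, per_sync_overhead_s=5.0, per_row_cost_s=1e-5)
large = sync_throughput(1_000_000, per_sync_overhead_s=5.0, per_row_cost_s=1e-5)
# Larger, less frequent batches amortize the fixed overhead.
print(large > small)  # True
```

Under such a model, a pipe that falls behind can catch up (and shrink its replication slot) by syncing bigger batches less often rather than small batches frequently.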
### Monitoring sync control behaviour {#monitoring-pg-sync}
You can see how long each batch takes in the **CDC Syncs** table in the **Metrics** tab of the ClickPipe. Note that this duration includes push time; if there are no rows incoming, the ClickPipe waits, and that wait time is also included in the duration.
**docs/integrations/data-ingestion/clickpipes/postgres/parallel_initial_load.md** (+8 −8)
This document explains how parallelized snapshot/initial load works in the Postgres ClickPipe and describes the snapshot parameters that can be used to control it.
## Overview {#overview-pg-snapshot}
Initial load is the first phase of a CDC ClickPipe, where the ClickPipe syncs the historical data of the tables in the source database over to ClickHouse before starting CDC. Developers often do this in a single-threaded manner, such as using pg_dump/pg_restore, or using a single thread to read from the source database and write to ClickHouse.
However, the Postgres ClickPipe can parallelize this process, which can significantly speed up the initial load.
### CTID column in Postgres {#ctid-pg-snapshot}
In Postgres, every row in a table has a unique identifier called the CTID. This is a system column that is not visible to users by default, but it can be used to uniquely identify rows in a table. The CTID is a combination of the block number and the offset within the block, which allows for efficient access to rows.
The Postgres ClickPipe uses the CTID column to logically partition source tables. It obtains the partitions by first performing a COUNT(*) on the source table, followed by a window function partitioning query to get the CTID ranges for each partition. This allows the ClickPipe to read the source table in parallel, with each partition being processed by a separate thread.
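The partitioning idea can be sketched as below. This is a simplified illustration that splits a table's heap blocks into contiguous CTID ranges; the function and parameter names are hypothetical, and the actual ClickPipe derives its ranges via the `COUNT(*)` plus window-function query described above:

```python
def ctid_block_ranges(total_blocks: int, blocks_per_partition: int):
    """Split a table's heap blocks into contiguous ranges.

    Each (lo, hi) pair stands for a CTID range — CTIDs are (block, offset)
    pairs — that a separate worker could scan in parallel, e.g. with a
    predicate like: WHERE ctid >= '(lo,0)' AND ctid < '(hi,0)'.
    """
    ranges = []
    lo = 0
    while lo < total_blocks:
        hi = min(lo + blocks_per_partition, total_blocks)
        ranges.append((lo, hi))
        lo = hi
    return ranges

print(ctid_block_ranges(total_blocks=10, blocks_per_partition=4))
# [(0, 4), (4, 8), (8, 10)]
```

Because each range maps to contiguous heap blocks, workers read disjoint parts of the table without coordinating with each other.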
#### Snapshot number of rows per partition {#numrows-pg-snapshot}
This setting controls how many rows constitute a partition. The ClickPipe will read the source table in chunks of this size, and chunks will be processed in parallel based on the initial load parallelism set. The default value is 100,000 rows per partition.
#### Initial load parallelism {#parallelism-pg-snapshot}

This setting controls how many partitions will be processed in parallel. The default value is 4, which means that the ClickPipe will read 4 partitions of the source table in parallel. This can be increased to speed up the initial load, but it is recommended to keep it to a reasonable value depending on your source instance specs to avoid overwhelming the source database. The ClickPipe will automatically adjust the number of partitions based on the size of the source table and the number of rows per partition.
#### Snapshot number of tables in parallel {#tables-parallel-pg-snapshot}
While not strictly related to parallel snapshot, this setting controls how many tables will be processed in parallel during the initial load. The default value is 1. Note that this is on top of the partition parallelism, so if you have 4 partitions and 2 tables, the ClickPipe will read 8 partitions in parallel.
### Monitoring parallel snapshot in Postgres {#monitoring-parallel-pg-snapshot}
You can inspect **pg_stat_activity** to see the parallel snapshot in action. The ClickPipe creates multiple connections to the source database, each reading a different partition of the source table. If you see **FETCH** queries with different CTID ranges, the ClickPipe is reading the source tables. You can also see the `COUNT(*)` and the partitioning query here.
- The snapshot parameters cannot be edited after pipe creation. If you want to change them, you will have to create a new ClickPipe.
- When adding tables to an existing ClickPipe, you cannot change the snapshot parameters. The ClickPipe will use the existing parameters for the new tables.