
Commit 14425a1

Merge pull request #4101 from morsapaes/docs-cdc
clickpipes: improve MySQL CDC documentation
2 parents ad45241 + c725890 commit 14425a1

8 files changed: +136 -97 lines changed


docs/integrations/data-ingestion/clickpipes/mysql/faq.md

Lines changed: 4 additions & 0 deletions
@@ -34,3 +34,7 @@ You have several options to resolve these issues:
 3. **Configure server certificate** - Update your server's SSL certificate to include all connection hostnames and use a trusted Certificate Authority.
 
 4. **Skip certificate verification** - For self-hosted MySQL or MariaDB, whose default configurations provision a self-signed certificate we can't validate ([MySQL](https://dev.mysql.com/doc/refman/8.4/en/creating-ssl-rsa-files-using-mysql.html#creating-ssl-rsa-files-using-mysql-automatic), [MariaDB](https://mariadb.com/kb/en/securing-connections-for-client-and-server/#enabling-tls-for-mariadb-server)). Relying on this certificate encrypts the data in transit but runs the risk of server impersonation. We recommend properly signed certificates for production environments, but this option is useful for testing on a one-off instance or connecting to legacy infrastructure.
+
+### Do you support schema changes? {#do-you-support-schema-changes}
+
+Please refer to the [ClickPipes for MySQL: Schema Changes Propagation Support](./schema-changes) page for more information.

docs/integrations/data-ingestion/clickpipes/mysql/index.md

Lines changed: 16 additions & 19 deletions
@@ -1,8 +1,8 @@
 ---
-sidebar_label: 'ClickPipes for MySQL'
+sidebar_label: 'Ingesting Data from MySQL to ClickHouse'
 description: 'Describes how to seamlessly connect your MySQL to ClickHouse Cloud.'
 slug: /integrations/clickpipes/mysql
-title: 'Ingesting Data from MySQL to ClickHouse (using CDC)'
+title: 'Ingesting data from MySQL to ClickHouse (using CDC)'
 ---
 
 import BetaBadge from '@theme/badges/BetaBadge';
@@ -15,20 +15,15 @@ import select_destination_db from '@site/static/images/integrations/data-ingesti
 import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
 import Image from '@theme/IdealImage';
 
-# Ingesting data from MySQL to ClickHouse using CDC
+# Ingesting data from MySQL to ClickHouse (using CDC)
 
 <BetaBadge/>
 
-:::info
-Currently, ingesting data from MySQL to ClickHouse Cloud via ClickPipes is in Private Preview.
-:::
-
-
-You can use ClickPipes to ingest data from your source MySQL database into ClickHouse Cloud. The source MySQL database can be hosted on-premises or in the cloud.
+You can use ClickPipes to ingest data from your source MySQL database into ClickHouse Cloud. The source MySQL database can be hosted on-premises or in the cloud using services like Amazon RDS, Google Cloud SQL, and others.
 
 ## Prerequisites {#prerequisites}
 
-To get started, you first need to make sure that your MySQL database is set up correctly. Depending on your source MySQL instance, you may follow any of the following guides:
+To get started, you first need to ensure that your MySQL database is correctly configured for binlog replication. The configuration steps depend on how you're deploying MySQL, so please follow the relevant guide below:
 
 1. [Amazon RDS MySQL](./mysql/source/rds)
 
@@ -44,7 +39,7 @@ To get started, you first need to make sure that your MySQL database is set up c
 
 Once your source MySQL database is set up, you can continue creating your ClickPipe.
 
-## Create your ClickPipe {#creating-your-clickpipe}
+## Create your ClickPipe {#create-your-clickpipe}
 
 Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
 
@@ -61,20 +56,18 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have
 
 <Image img={mysql_tile} alt="Select MySQL" size="lg" border/>
 
-### Add your source MySQL database connection {#adding-your-source-mysql-database-connection}
+### Add your source MySQL database connection {#add-your-source-mysql-database-connection}
 
 4. Fill in the connection details for your source MySQL database which you configured in the prerequisites step.
 
 :::info
-
 Before you start adding your connection details make sure that you have whitelisted ClickPipes IP addresses in your firewall rules. On the following page you can find a [list of ClickPipes IP addresses](../index.md#list-of-static-ips).
 For more information refer to the source MySQL setup guides linked at [the top of this page](#prerequisites).
-
 :::
 
 <Image img={mysql_connection_details} alt="Fill in connection details" size="lg" border/>
 
-#### (Optional) Set up SSH tunneling {#optional-setting-up-ssh-tunneling}
+#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
 
 You can specify SSH tunneling details if your source MySQL database is not publicly accessible.
 
@@ -88,12 +81,10 @@ You can specify SSH tunneling details if your source MySQL database is not publi
 4. Click on "Verify Connection" to verify the connection.
 
 :::note
-
 Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
-
 :::
 
-Once the connection details are filled in, click on "Next".
+Once the connection details are filled in, click `Next`.
 
 #### Configure advanced settings {#advanced-settings}
 
@@ -106,7 +97,7 @@ You can configure the advanced settings if needed. A brief description of each s
 - **Snapshot number of tables in parallel**: This is the number of tables that will be fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and you want to control the number of tables fetched in parallel.
 
 
-### Configure the tables {#configuring-the-tables}
+### Configure the tables {#configure-the-tables}
 
 5. Here you can select the destination database for your ClickPipe. You can either select an existing database or create a new one.
 
@@ -121,3 +112,9 @@ You can configure the advanced settings if needed. A brief description of each s
 <Image img={ch_permissions} alt="Review permissions" size="lg" border/>
 
 Finally, please refer to the ["ClickPipes for MySQL FAQ"](/integrations/clickpipes/mysql/faq) page for more information about common issues and how to resolve them.
+
+## What's next? {#whats-next}
+
+[//]: # "TODO Write a MySQL-specific migration guide and best practices similar to the existing one for PostgreSQL. The current migration guide points to the MySQL table engine, which is not ideal."
+
+Once you've set up your ClickPipe to replicate data from MySQL to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance. For common questions around MySQL CDC and troubleshooting, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+title: 'Schema Changes Propagation Support'
+slug: /integrations/clickpipes/mysql/schema-changes
+description: 'Page describing schema change types detectable by ClickPipes in the source tables'
+---
+
+ClickPipes for MySQL can detect schema changes in the source tables and, in some cases, automatically propagate the changes to the destination tables. The way each DDL operation is handled is documented below:
+
+[//]: # "TODO Extend this page with behavior on rename, data type changes, and truncate + guidance on how to handle incompatible schema changes."
+
+| Schema Change Type                                                                   | Behaviour                              |
+| ------------------------------------------------------------------------------------ | -------------------------------------- |
+| Adding a new column (`ALTER TABLE ADD COLUMN ...`)                                   | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change |
+| Adding a new column with a default value (`ALTER TABLE ADD COLUMN ... DEFAULT ...`)  | Propagated automatically. The new column(s) will be populated for all rows replicated after the schema change, but existing rows will not show the default value without a full table refresh |
+| Dropping an existing column (`ALTER TABLE DROP COLUMN ...`)                          | Detected, but **not** propagated. The dropped column(s) will be populated with `NULL` for all rows replicated after the schema change |
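
To make the behaviours in the table concrete, the three DDL shapes look roughly like this in MySQL. The `orders` table and its column names are made-up examples for illustration, not taken from the commit:

```sql
-- Propagated automatically: the new column appears in the destination table
-- and is populated for rows replicated after the change.
ALTER TABLE orders ADD COLUMN note VARCHAR(255);

-- Propagated automatically, but existing rows will not show the default
-- value in the destination without a full table refresh.
ALTER TABLE orders ADD COLUMN status VARCHAR(16) DEFAULT 'new';

-- Detected but not propagated: rows replicated after the change carry NULL
-- for the dropped column in the destination.
ALTER TABLE orders DROP COLUMN legacy_flag;
```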

docs/integrations/data-ingestion/clickpipes/mysql/source/aurora.md

Lines changed: 48 additions & 36 deletions
@@ -19,83 +19,91 @@ import Image from '@theme/IdealImage';
 
 # Aurora MySQL source setup guide
 
-This is a step-by-step guide on how to configure your Aurora MySQL instance for replicating its data via the MySQL ClickPipe.
-<br/>
-:::info
-We also recommend going through the MySQL FAQs [here](/integrations/data-ingestion/clickpipes/mysql/faq.md). The FAQs page is being actively updated.
-:::
+This step-by-step guide shows you how to configure Amazon Aurora MySQL to replicate data into ClickHouse Cloud using the [MySQL ClickPipe](../index.md). For common questions around MySQL CDC, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).
 
 ## Enable binary log retention {#enable-binlog-retention-aurora}
-The binary log is a set of log files that contain information about data modifications made to an MySQL server instance, and binary log files are required for replication. Both of the below steps must be followed:
 
-### 1. Enable binary logging via automated backup {#enable-binlog-logging-aurora}
-The automated backups feature determines whether binary logging is turned on or off for MySQL. It can be set in the AWS console:
+The binary log is a set of log files that contain information about data modifications made to a MySQL server instance, and binary log files are required for replication. To configure binary log retention in Aurora MySQL, you must [enable binary logging](#enable-binlog-logging) and [increase the binlog retention interval](#binlog-retention-interval).
+
+### 1. Enable binary logging via automated backup {#enable-binlog-logging}
+
+The automated backups feature determines whether binary logging is turned on or off for MySQL. Automated backups can be configured for your instance in the RDS Console by navigating to **Modify** > **Additional configuration** > **Backup** and selecting the **Enable automated backups** checkbox (if not selected already).
 
 <Image img={rds_backups} alt="Enabling automated backups in Aurora" size="lg" border/>
 
-Setting backup retention to a reasonably long value depending on the replication use-case is advisable.
+We recommend setting the **Backup retention period** to a reasonably long value, depending on the replication use case.
+
+### 2. Increase the binlog retention interval {#binlog-retention-interval}
+
+:::warning
+If ClickPipes tries to resume replication and the required binlog files have been purged due to the configured binlog retention value, the ClickPipe will enter an errored state and a resync is required.
+:::
+
+By default, Aurora MySQL purges the binary log as soon as possible (i.e., _lazy purging_). We recommend increasing the binlog retention interval to at least **72 hours** to ensure availability of binary log files for replication under failure scenarios. To set an interval for binary log retention ([`binlog retention hours`](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/mysql-stored-proc-configuring.html#mysql_rds_set_configuration-usage-notes.binlog-retention-hours)), use the [`mysql.rds_set_configuration`](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/mysql-stored-proc-configuring.html#mysql_rds_set_configuration) procedure:
 
-### 2. Binlog retention hours {#binlog-retention-hours-aurora}
-The procedure below must be called to ensure availability of binary logs for replication:
+[//]: # "NOTE Most CDC providers recommend the maximum retention period for Aurora RDS (7 days/168 hours). Since this has an impact on disk usage, we conservatively recommend a minimum of 3 days/72 hours."
 
 ```text
-mysql=> call mysql.rds_set_configuration('binlog retention hours', 24);
+mysql=> call mysql.rds_set_configuration('binlog retention hours', 72);
 ```
-If this configuration isn't set, Amazon RDS purges the binary logs as soon as possible, leading to gaps in the binary logs.
 
-## Configure binlog settings in the parameter group {#binlog-parameter-group-aurora}
+If this configuration isn't set or is set to a low interval, it can lead to gaps in the binary logs, compromising ClickPipes' ability to resume replication.
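
As a quick sanity check (an illustrative aside rather than part of this commit), you can confirm the retention value currently in effect with the companion `mysql.rds_show_configuration` procedure available on Aurora MySQL and RDS MySQL:

```text
mysql=> call mysql.rds_show_configuration;
```

The output should list `binlog retention hours` together with its current value, or `NULL` if it has never been set.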
+
+## Configure binlog settings {#binlog-settings}
 
-The parameter group can be found when you click on your MySQL instance in the RDS Console, and then heading over to the `Configurations` tab.
+The parameter group can be found when you click on your MySQL instance in the RDS Console, and then navigate to the **Configuration** tab.
+
+:::tip
+If you have a MySQL cluster, the parameters below can be found in the [DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithParamGroups.CreatingCluster.html) parameter group instead of the DB instance group.
+:::
 
 <Image img={aurora_config} alt="Where to find parameter group in Aurora" size="lg" border/>
 
-Upon clicking on the parameter group link, you will be taken to the page for it. You will see an Edit button in the top-right.
+<br/>
+Click the parameter group link, which will take you to its dedicated page. You should see an **Edit** button in the top right.
 
 <Image img={edit_button} alt="Edit parameter group" size="lg" border/>
 
-The following settings need to be set as follows:
+<br/>
+The following parameters need to be set as follows:
 
 1. `binlog_format` to `ROW`.
 
 <Image img={binlog_format} alt="Binlog format to ROW" size="lg" border/>
 
-2. `binlog_row_metadata` to `FULL`
+2. `binlog_row_metadata` to `FULL`.
 
 <Image img={binlog_row_metadata} alt="Binlog row metadata" size="lg" border/>
 
-3. `binlog_row_image` to `FULL`
+3. `binlog_row_image` to `FULL`.
 
 <Image img={binlog_row_image} alt="Binlog row image" size="lg" border/>
 
-Then click on `Save Changes` in the top-right. You may need to reboot your instance for the changes to take effect - a way of knowing this is if you see `Pending reboot` next to the parameter group link in the Configurations tab of the RDS instance.
 <br/>
+Then, click on **Save Changes** in the top right corner. You may need to reboot your instance for the changes to take effect — a way of knowing this is if you see `Pending reboot` next to the parameter group link in the **Configuration** tab of the Aurora instance.
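
After the reboot, a quick way to confirm that the parameters are in effect is to query the server directly. This check is illustrative and assumes MySQL 8.0 or later (Aurora MySQL v3), since `binlog_row_metadata` does not exist on older versions:

```text
mysql=> SHOW GLOBAL VARIABLES WHERE Variable_name IN ('binlog_format', 'binlog_row_metadata', 'binlog_row_image');
```

All three should report the values set above: `ROW`, `FULL`, and `FULL`.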
+
+## Enable GTID mode (recommended) {#gtid-mode}
+
 :::tip
-If you have a MySQL cluster, the above parameters would be found in a [DB Cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithParamGroups.CreatingCluster.html) parameter group and not the DB instance group.
+The MySQL ClickPipe also supports replication without GTID mode. However, enabling GTID mode is recommended for better performance and easier troubleshooting.
 :::
 
-## Enabling GTID mode {#gtid-mode-aurora}
-Global Transaction Identifiers (GTIDs) are unique IDs assigned to each committed transaction in MySQL. They simplify binlog replication and make troubleshooting more straightforward.
+[Global Transaction Identifiers (GTIDs)](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) are unique IDs assigned to each committed transaction in MySQL. They simplify binlog replication and make troubleshooting more straightforward. We **recommend** enabling GTID mode, so that the MySQL ClickPipe can use GTID-based replication.
 
-If your MySQL instance is MySQL 5.7, 8.0 or 8.4, we recommend enabling GTID mode so that the MySQL ClickPipe can use GTID replication.
+GTID-based replication is supported for Amazon Aurora MySQL v2 (MySQL 5.7) and v3 (MySQL 8.0), as well as Aurora Serverless v2. To enable GTID mode for your Aurora MySQL instance, follow these steps:
 
-To enable GTID mode for your MySQL instance, follow the steps as follows:
 1. In the RDS Console, click on your MySQL instance.
-2. Click on the `Configurations` tab.
+2. Click on the **Configuration** tab.
 3. Click on the parameter group link.
-4. Click on the `Edit` button in the top-right corner.
+4. Click on the **Edit** button in the top right corner.
 5. Set `enforce_gtid_consistency` to `ON`.
 6. Set `gtid-mode` to `ON`.
-7. Click on `Save Changes` in the top-right corner.
+7. Click on **Save Changes** in the top right corner.
 8. Reboot your instance for the changes to take effect.
 
 <Image img={enable_gtid} alt="GTID enabled" size="lg" border/>
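
Once the instance is back up, a similar illustrative check (not part of this commit) confirms that GTID mode is active; both variables should report `ON`:

```text
mysql=> SHOW GLOBAL VARIABLES WHERE Variable_name IN ('gtid_mode', 'enforce_gtid_consistency');
```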
 
-<br/>
-:::info
-The MySQL ClickPipe also supports replication without GTID mode. However, enabling GTID mode is recommended for better performance and easier troubleshooting.
-:::
-
-## Configure a database user {#configure-database-user-aurora}
+## Configure a database user {#configure-database-user}
 
 Connect to your Aurora MySQL instance as an admin user and execute the following commands:
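
The statements themselves are unchanged context in this file, so they do not appear in the diff. As a rough, hypothetical sketch, a CDC user typically needs replication privileges for the binlog plus read access for the initial snapshot; the user, password, and database names below are placeholders, so follow the guide's actual commands:

```sql
-- Hypothetical example only; use the statements from the guide itself.
CREATE USER 'clickpipes_user'@'%' IDENTIFIED BY 'some_secure_password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'clickpipes_user'@'%';
GRANT SELECT ON clickpipes_db.* TO 'clickpipes_user'@'%';
```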
 
@@ -122,12 +130,16 @@ Connect to your Aurora MySQL instance as an admin user and execute the following
 
 ### IP-based access control {#ip-based-access-control}
 
-If you want to restrict traffic to your Aurora instance, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the `Inbound rules` of your Aurora security group as shown below:
+To restrict traffic to your Aurora MySQL instance, add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the **Inbound rules** of your Aurora security group.
 
 <Image img={security_group_in_rds_mysql} alt="Where to find security group in Aurora MySQL?" size="lg" border/>
 
 <Image img={edit_inbound_rules} alt="Edit inbound rules for the above security group" size="lg" border/>
 
 ### Private access via AWS PrivateLink {#private-access-via-aws-privatelink}
 
-To connect to your Aurora instance through a private network, you can use AWS PrivateLink. Follow our [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection.
+To connect to your Aurora MySQL instance through a private network, you can use AWS PrivateLink. Follow the [AWS PrivateLink setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup-for-clickpipes) to set up the connection.
+
+## What's next? {#whats-next}
+
+Now that your Amazon Aurora MySQL instance is configured for binlog replication and securely connecting to ClickHouse Cloud, you can [create your first MySQL ClickPipe](/integrations/clickpipes/mysql/#create-your-clickpipe). For common questions around MySQL CDC, see the [MySQL FAQs page](/integrations/data-ingestion/clickpipes/mysql/faq.md).

0 commit comments
