`docs/about-us/adopters.md` (1 addition, 1 deletion)

description: 'A list of companies using ClickHouse and their success stories'
---
The following list of companies using ClickHouse and their success stories is assembled from public sources, thus might differ from current reality. We'd appreciate it if you share the story of adopting ClickHouse in your company and [add it to the list](https://github.com/ClickHouse/clickhouse-docs/blob/main/docs/about-us/adopters.md), but please make sure you won't have any NDA issues by doing so. Providing updates with publications from other companies is also useful.

ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, preferring the hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.
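
The preferred algorithm can also be set explicitly per query through the `join_algorithm` setting. A minimal sketch, with hypothetical table and column names:

```sql
-- 'auto' starts with a hash join and switches to a merge join when memory
-- limits are reached; 'hash' or 'partial_merge' force a single algorithm.
SELECT visits.UserID, count() AS hits
FROM visits
INNER JOIN users ON visits.UserID = users.UserID
GROUP BY visits.UserID
SETTINGS join_algorithm = 'auto';
```
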
## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support}

`docs/about-us/support.md` (1 addition, 1 deletion)

You can also subscribe to our [status page](https://status.clickhouse.com) to get notified quickly about any incidents affecting our platform.
:::note
Please note that only Subscription Customers have a Service Level Agreement on Support Incidents. If you are not currently a ClickHouse Cloud user – while we will try to answer your question, we'd encourage you to go instead to one of our Community resources:

- [ClickHouse Community Slack Channel](https://clickhouse.com/slack)
- [Other Community Options](https://github.com/ClickHouse/ClickHouse/blob/master/README.md#useful-links)

description: 'By going through this tutorial, you''ll learn how to set up a simple ClickHouse cluster.'
---
This tutorial assumes you've already set up a [local ClickHouse server](../getting-started/install.md)
By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster. It'll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.
## Cluster Deployment {#cluster-deployment}

This ClickHouse cluster will be a homogeneous cluster. Here are the steps:

3. Create local tables on each instance
4. Create a [Distributed table](../engines/table-engines/special/distributed.md)

A [distributed table](../engines/table-engines/special/distributed.md) is a kind of "view" to the local tables in a ClickHouse cluster. A SELECT query from a distributed table executes using the resources of all the cluster's shards. You may specify configs for multiple clusters and create multiple distributed tables to provide views for different clusters.

An example config for a cluster with three shards, with one replica each, is defined in the `<remote_servers>` section of the server configuration file.

For further demonstration, let's create a new local table with the same `CREATE TABLE` query that we used for `hits_v1` in the single node deployment tutorial, but with a different table name:
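
A minimal sketch of such a local table, plus a distributed table on top of it, with a deliberately abbreviated schema and hypothetical names (`tutorial.hits_local` for the per-node table, `tutorial.hits_all` for the distributed table, and `example_cluster` for a cluster defined in `remote_servers`):

```sql
-- Local table, created on every node of the cluster (schema abbreviated)
CREATE TABLE tutorial.hits_local
(
    EventDate Date,
    UserID UInt64,
    URL String
)
ENGINE = MergeTree
ORDER BY (EventDate, UserID);

-- Distributed table: a "view" over tutorial.hits_local on all shards of example_cluster,
-- sharding inserted rows by rand()
CREATE TABLE tutorial.hits_all AS tutorial.hits_local
ENGINE = Distributed(example_cluster, tutorial, hits_local, rand());
```
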
A common practice is to create similar distributed tables on all machines of the cluster. This allows running distributed queries on any machine of the cluster. There's also an alternative option to create a temporary distributed table for a given SELECT query using [remote](../sql-reference/table-functions/remote.md) table function.
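
For a one-off query, the [remote](../sql-reference/table-functions/remote.md) table function can be sketched with placeholder host addresses:

```sql
-- Ad-hoc distributed read, no permanent Distributed table required
SELECT count()
FROM remote('chnode1:9000,chnode2:9000', tutorial.hits_local);
```
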
Let's run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the distributed table to spread the table to multiple servers.
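
Reusing the hypothetical names above (and `hits_v1` from the single-node tutorial), such an insert can be sketched as:

```sql
-- Rows are routed to shards according to the sharding key (rand() above)
INSERT INTO tutorial.hits_all
SELECT * FROM tutorial.hits_v1;
```
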

An example config for a cluster of one shard containing three replicas is likewise defined in the `<remote_servers>` section.

To enable native replication, [ZooKeeper](http://zookeeper.apache.org/) is required. ClickHouse takes care of data consistency on all replicas and runs a restore procedure after a failure automatically. It's recommended to deploy the ZooKeeper cluster on separate servers (where no other processes, including ClickHouse, are running).
:::note Note
ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, as in this case, ClickHouse won't be able to guarantee data consistency on all replicas. Thus, it becomes the responsibility of your application.
:::

ZooKeeper locations are specified in the configuration file.
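
Replication itself is enabled per table by using one of the `Replicated*` table engines. A minimal sketch, with placeholder names and the common `{shard}`/`{replica}` macros:

```sql
-- Run the same statement on every replica; the ZooKeeper path and the replica
-- name identify this table within the replication setup.
CREATE TABLE tutorial.hits_replicated
(
    EventDate Date,
    UserID UInt64,
    URL String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits_replicated', '{replica}')
ORDER BY (EventDate, UserID);
```
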

`docs/best-practices/_snippets/_async_inserts.md` (4 additions, 4 deletions)

import Image from '@theme/IdealImage';
import async_inserts from '@site/static/images/bestpractices/async_inserts.png';
Asynchronous inserts in ClickHouse provide a powerful alternative when client-side batching isn't feasible. This is especially valuable in observability workloads, where hundreds or thousands of agents send data continuously - logs, metrics, traces - often in small, real-time payloads. Buffering data client-side in these environments increases complexity, requiring a centralized queue to ensure sufficiently large batches can be sent.
:::note
Sending many small batches in synchronous mode is not recommended, as it leads to many parts being created. This results in poor query performance and ["too many parts"](/knowledgebase/exception-too-many-parts) errors.

The behavior of asynchronous inserts is further refined using the `wait_for_async_insert` setting.

When set to 1 (the default), ClickHouse only acknowledges the insert after the data is successfully flushed to disk. This ensures strong durability guarantees and makes error handling straightforward: if something goes wrong during the flush, the error is returned to the client. This mode is recommended for most production scenarios, especially when insert failures must be tracked reliably.

[Benchmarks](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) show it scales well with concurrency - whether you're running 200 or 500 clients - thanks to adaptive inserts and stable part creation behavior.

Setting `wait_for_async_insert = 0` enables "fire-and-forget" mode. Here, the server acknowledges the insert as soon as the data is buffered, without waiting for it to reach storage.
This offers ultra-low-latency inserts and maximal throughput, ideal for high-velocity, low-criticality data. However, this comes with trade-offs: there's no guarantee the data will be persisted, errors may only surface during flush, and it's difficult to trace failed inserts. Use this mode only if your workload can tolerate data loss.
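
Both modes can be selected per insert via settings. A sketch, using `async_insert` and `wait_for_async_insert` against a hypothetical `logs` table:

```sql
-- Durable mode: acknowledged only after the buffered data is flushed to storage
INSERT INTO logs (timestamp, message)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'payment processed');

-- Fire-and-forget mode: acknowledged as soon as the data lands in the buffer
INSERT INTO logs (timestamp, message)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now(), 'heartbeat');
```

For most production cases, the durable combination (`async_insert = 1, wait_for_async_insert = 1`) is the recommended default, as noted above.
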
[Benchmarks also demonstrate](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) substantial part reduction and lower CPU usage when buffer flushes are infrequent (e.g. every 30 seconds), but the risk of silent failure remains.
By default, ClickHouse performs automatic deduplication for synchronous inserts, which makes retries safe in failure scenarios. However, this is disabled for asynchronous inserts unless explicitly enabled (this should not be enabled if you have dependent materialized views - [see issue](https://github.com/ClickHouse/ClickHouse/issues/66003)).
In practice, if deduplication is turned on and the same insert is retried - due to, for instance, a timeout or network drop - ClickHouse can safely ignore the duplicate. This helps maintain idempotency and avoids double-writing data. Still, it's worth noting that insert validation and schema parsing happen only during buffer flush - so errors (like type mismatches) will only surface at that point.
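
If retried asynchronous inserts must be idempotent (and no dependent materialized views are involved), deduplication can be switched on per insert. A sketch, assuming the `async_insert_deduplicate` setting and the hypothetical `logs` table from above:

```sql
-- An identical retried batch is recognized as a duplicate and skipped
INSERT INTO logs (timestamp, message)
SETTINGS async_insert = 1, wait_for_async_insert = 1, async_insert_deduplicate = 1
VALUES (now(), 'order created');
```
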

`docs/best-practices/_snippets/_avoid_mutations.md` (2 additions, 2 deletions)

In ClickHouse, **mutations** refer to operations that modify or delete existing data in a table - typically using `ALTER TABLE ... DELETE` or `ALTER TABLE ... UPDATE`. While these statements may appear similar to standard SQL operations, they are fundamentally different under the hood.
Rather than modifying rows in place, mutations in ClickHouse are asynchronous background processes that rewrite entire [data parts](/parts) affected by the change. This approach is necessary due to ClickHouse's column-oriented, immutable storage model, but it can lead to significant I/O and resource usage.
When a mutation is issued, ClickHouse schedules the creation of new **mutated parts**, leaving the original parts untouched until the new ones are ready. Once ready, the mutated parts atomically replace the originals. However, because the operation rewrites entire parts, even a minor change (such as updating a single row) may result in large-scale rewrites and excessive write amplification.
For large datasets, this can produce a substantial spike in disk I/O and degrade overall cluster performance. Unlike merges, mutations can't be rolled back once submitted and will continue to execute even after server restarts unless explicitly cancelled - see [`KILL MUTATION`](/sql-reference/statements/kill#kill-mutation).

Mutations are **totally ordered**: they apply to data inserted before the mutation was issued, while newer data remains unaffected. They do not block inserts but can still overlap with other ongoing queries. A SELECT running during a mutation may read a mix of mutated and unmutated parts, which can lead to inconsistent views of the data during execution. ClickHouse executes mutations in parallel per part, which can further intensify memory and CPU usage, especially when complex subqueries (like `x IN (SELECT ...)`) are involved.
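
A sketch of issuing, monitoring, and cancelling a mutation - the `orders` table and its columns are hypothetical, while `system.mutations` and [`KILL MUTATION`](/sql-reference/statements/kill#kill-mutation) are the standard tools:

```sql
-- Issue a mutation: every part containing matching rows gets rewritten
ALTER TABLE orders UPDATE status = 'archived' WHERE created_at < '2024-01-01';

-- Watch its progress
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase() AND table = 'orders';

-- Cancel it if it is overwhelming the cluster
KILL MUTATION WHERE database = currentDatabase() AND table = 'orders';
```
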

While it's tempting to manually trigger a merge using:

```sql
OPTIMIZE TABLE <table> FINAL;
```

## Why Avoid? {#why-avoid}
### It's expensive {#its-expensive}

Running `OPTIMIZE FINAL` forces ClickHouse to merge **all** active parts into a **single part**, even if large merges have already occurred. This involves decompressing, merging, re-compressing, and rewriting every active part, which is expensive.

Normally, ClickHouse avoids merging parts larger than ~150 GB (this threshold is configurable).

## Let background merges do the work {#let-background-merges-do-the-work}
ClickHouse already performs smart background merges to optimize storage and query efficiency. These are incremental, resource-aware, and respect configured thresholds. Unless you have a very specific need (e.g., finalizing data before freezing a table or exporting), **you're better off letting ClickHouse manage merges on its own**.
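
To confirm that background merges are keeping up, rather than forcing them, the relevant system tables can be inspected. A sketch, with a hypothetical table name:

```sql
-- Active parts per partition: a steadily growing count suggests merges are lagging
SELECT partition, count() AS parts, sum(rows) AS rows
FROM system.parts
WHERE database = currentDatabase() AND table = 'my_table' AND active
GROUP BY partition
ORDER BY partition;

-- Merges currently running in the background
SELECT database, table, round(elapsed) AS seconds, round(progress, 2) AS progress, num_parts
FROM system.merges;
```
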

`docs/best-practices/_snippets/_bulk_inserts.md` (1 addition, 1 deletion)

We recommend inserting data in batches of at least 1,000 rows.

**For a synchronous insert strategy to be effective, this client-side batching is required.**
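
A sketch of what a single batched insert looks like, using a hypothetical `events` table (real batches should carry thousands of rows, not three):

```sql
CREATE TABLE IF NOT EXISTS events
(
    ts DateTime,
    user_id UInt64,
    action String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- One INSERT carrying many rows creates far fewer parts than many single-row inserts
INSERT INTO events (ts, user_id, action) VALUES
    (now(), 1, 'login'),
    (now(), 2, 'click'),
    (now(), 3, 'purchase');
```
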
If you're unable to batch data client-side, ClickHouse supports asynchronous inserts that shift batching to the server ([see](/best-practices/selecting-an-insert-strategy#asynchronous-inserts)).
:::tip
Regardless of the size of your inserts, we recommend keeping the insert rate at around one insert query per second. The reason for this recommendation is that the created parts are merged into larger parts in the background (to optimize your data for read queries), and sending too many insert queries per second can cause background merging to fall behind the number of new parts. However, you can use a higher rate of insert queries per second when you use asynchronous inserts (see asynchronous inserts).