Commit 94d6113

fix feedback and remove ms quote
1 parent ce14a70 commit 94d6113

93 files changed (+375, -376 lines)

docs/about-us/adopters.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ sidebar_position: 60
 description: 'A list of companies using ClickHouse and their success stories'
 ---

-The following list of companies using ClickHouse and their success stories is assembled from public sources, thus might differ from current reality. Wed appreciate it if you share the story of adopting ClickHouse in your company and [add it to the list](https://github.com/ClickHouse/clickhouse-docs/blob/main/docs/about-us/adopters.md), but please make sure you wont have any NDA issues by doing so. Providing updates with publications from other companies is also useful.
+The following list of companies using ClickHouse and their success stories is assembled from public sources, thus might differ from current reality. We'd appreciate it if you share the story of adopting ClickHouse in your company and [add it to the list](https://github.com/ClickHouse/clickhouse-docs/blob/main/docs/about-us/adopters.md), but please make sure you won't have any NDA issues by doing so. Providing updates with publications from other companies is also useful.

 <div class="adopters-table">

docs/about-us/distinctive-features.md

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ ClickHouse provides various ways to trade accuracy for performance:

 ## Adaptive Join Algorithm {#adaptive-join-algorithm}

-ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if theres more than one large table.
+ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.

 ## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support}

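An illustrative aside, not part of this commit: the adaptive behavior described in the hunk above can also be steered explicitly through the `join_algorithm` setting. A minimal sketch, assuming hypothetical `hits` and `users` tables:

```sql
-- 'auto' starts with an in-memory hash join and falls back to a merge-based
-- join when the right-hand table no longer fits in memory.
SET join_algorithm = 'auto';

SELECT u.name, count() AS visits
FROM hits AS h
INNER JOIN users AS u ON h.user_id = u.id
GROUP BY u.name;
```
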
docs/about-us/history.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ There is a widespread opinion that to calculate statistics effectively, you must
 However data aggregation comes with a lot of limitations:

 - You must have a pre-defined list of required reports.
-- The user cant make custom reports.
+- The user can't make custom reports.
 - When aggregating over a large number of distinct keys, the data volume is barely reduced, so aggregation is useless.
 - For a large number of reports, there are too many aggregation variations (combinatorial explosion).
 - When aggregating keys with high cardinality (such as URLs), the volume of data is not reduced by much (less than twofold).

docs/about-us/support.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ Customers can only log Severity 3 tickets for single replica services across tie
 You can also subscribe to our [status page](https://status.clickhouse.com) to get notified quickly about any incidents affecting our platform.

 :::note
-Please note that only Subscription Customers have a Service Level Agreement on Support Incidents. If you are not currently a ClickHouse Cloud user – while we will try to answer your question, wed encourage you to go instead to one of our Community resources:
+Please note that only Subscription Customers have a Service Level Agreement on Support Incidents. If you are not currently a ClickHouse Cloud user – while we will try to answer your question, we'd encourage you to go instead to one of our Community resources:

 - [ClickHouse Community Slack Channel](https://clickhouse.com/slack)
 - [Other Community Options](https://github.com/ClickHouse/ClickHouse/blob/master/README.md#useful-links)

docs/architecture/cluster-deployment.md

Lines changed: 7 additions & 7 deletions
@@ -3,12 +3,12 @@ slug: /architecture/cluster-deployment
 sidebar_label: 'Cluster Deployment'
 sidebar_position: 100
 title: 'Cluster Deployment'
-description: 'By going through this tutorial, youll learn how to set up a simple ClickHouse cluster.'
+description: 'By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster.'
 ---

 This tutorial assumes you've already set up a [local ClickHouse server](../getting-started/install.md)

-By going through this tutorial, youll learn how to set up a simple ClickHouse cluster. Itll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.
+By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster. It'll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.

 ## Cluster Deployment {#cluster-deployment}

@@ -19,7 +19,7 @@ This ClickHouse cluster will be a homogeneous cluster. Here are the steps:
 3. Create local tables on each instance
 4. Create a [Distributed table](../engines/table-engines/special/distributed.md)

-A [distributed table](../engines/table-engines/special/distributed.md) is a kind of "view" to the local tables in a ClickHouse cluster. A SELECT query from a distributed table executes using resources of all clusters shards. You may specify configs for multiple clusters and create multiple distributed tables to provide views for different clusters.
+A [distributed table](../engines/table-engines/special/distributed.md) is a kind of "view" to the local tables in a ClickHouse cluster. A SELECT query from a distributed table executes using resources of all cluster's shards. You may specify configs for multiple clusters and create multiple distributed tables to provide views for different clusters.

 Here is an example config for a cluster with three shards, with one replica each:

@@ -48,7 +48,7 @@ Here is an example config for a cluster with three shards, with one replica each
 </remote_servers>
 ```

-For further demonstration, lets create a new local table with the same `CREATE TABLE` query that we used for `hits_v1` in the single node deployment tutorial, but with a different table name:
+For further demonstration, let's create a new local table with the same `CREATE TABLE` query that we used for `hits_v1` in the single node deployment tutorial, but with a different table name:

 ```sql
 CREATE TABLE tutorial.hits_local (...) ENGINE = MergeTree() ...
@@ -63,7 +63,7 @@ ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());

 A common practice is to create similar distributed tables on all machines of the cluster. This allows running distributed queries on any machine of the cluster. There's also an alternative option to create a temporary distributed table for a given SELECT query using [remote](../sql-reference/table-functions/remote.md) table function.

-Lets run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the distributed table to spread the table to multiple servers.
+Let's run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the distributed table to spread the table to multiple servers.

 ```sql
 INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
@@ -99,10 +99,10 @@ Here is an example config for a cluster of one shard containing three replicas:
 </remote_servers>
 ```

-To enable native replication [ZooKeeper](http://zookeeper.apache.org/), is required. ClickHouse takes care of data consistency on all replicas and runs a restore procedure after a failure automatically. Its recommended to deploy the ZooKeeper cluster on separate servers (where no other processes including ClickHouse are running).
+To enable native replication [ZooKeeper](http://zookeeper.apache.org/), is required. ClickHouse takes care of data consistency on all replicas and runs a restore procedure after a failure automatically. It's recommended to deploy the ZooKeeper cluster on separate servers (where no other processes including ClickHouse are running).

 :::note Note
-ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, as in this case, ClickHouse wont be able to guarantee data consistency on all replicas. Thus, it becomes the responsibility of your application.
+ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, as in this case, ClickHouse won't be able to guarantee data consistency on all replicas. Thus, it becomes the responsibility of your application.
 :::

 ZooKeeper locations are specified in the configuration file:

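An illustrative follow-up, not part of this commit: once the `remote_servers` section above is loaded, the cluster layout can be checked from SQL. A minimal sketch reusing the `perftest_3shards_1replicas` cluster name from the hunk:

```sql
-- One row per shard/replica pair defined in remote_servers.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'perftest_3shards_1replicas';
```
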
docs/best-practices/_snippets/_async_inserts.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 import Image from '@theme/IdealImage';
 import async_inserts from '@site/static/images/bestpractices/async_inserts.png';

-Asynchronous inserts in ClickHouse provide a powerful alternative when client-side batching isnt feasible. This is especially valuable in observability workloads, where hundreds or thousands of agents send data continuously - logs, metrics, traces - often in small, real-time payloads. Buffering data client-side in these environments increases complexity, requiring a centralized queue to ensure sufficiently large batches can be sent.
+Asynchronous inserts in ClickHouse provide a powerful alternative when client-side batching isn't feasible. This is especially valuable in observability workloads, where hundreds or thousands of agents send data continuously - logs, metrics, traces - often in small, real-time payloads. Buffering data client-side in these environments increases complexity, requiring a centralized queue to ensure sufficiently large batches can be sent.

 :::note
 Sending many small batches in synchronous mode is not recommended, leading to many parts being created. This will lead to poor query performance and ["too many part"](/knowledgebase/exception-too-many-parts) errors.
@@ -27,11 +27,11 @@ The behavior of asynchronous inserts is further refined using the [`wait_for_asy

 When set to 1 (the default), ClickHouse only acknowledges the insert after the data is successfully flushed to disk. This ensures strong durability guarantees and makes error handling straightforward: if something goes wrong during the flush, the error is returned to the client. This mode is recommended for most production scenarios, especially when insert failures must be tracked reliably.

-[Benchmarks](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) show it scales well with concurrency - whether youre running 200 or 500 clients- thanks to adaptive inserts and stable part creation behavior.
+[Benchmarks](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) show it scales well with concurrency - whether you're running 200 or 500 clients- thanks to adaptive inserts and stable part creation behavior.

 Setting `wait_for_async_insert = 0` enables "fire-and-forget" mode. Here, the server acknowledges the insert as soon as the data is buffered, without waiting for it to reach storage.

-This offers ultra-low-latency inserts and maximal throughput, ideal for high-velocity, low-criticality data. However, this comes with trade-offs: theres no guarantee the data will be persisted, errors may only surface during flush, and its difficult to trace failed inserts. Use this mode only if your workload can tolerate data loss.
+This offers ultra-low-latency inserts and maximal throughput, ideal for high-velocity, low-criticality data. However, this comes with trade-offs: there's no guarantee the data will be persisted, errors may only surface during flush, and it's difficult to trace failed inserts. Use this mode only if your workload can tolerate data loss.

 [Benchmarks also demonstrate](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) substantial part reduction and lower CPU usage when buffer flushes are infrequent (e.g. every 30 seconds), but the risk of silent failure remains.

@@ -41,7 +41,7 @@ Our strong recommendation is to use `async_insert=1,wait_for_async_insert=1` if

 By default, ClickHouse performs automatic deduplication for synchronous inserts, which makes retries safe in failure scenarios. However, this is disabled for asynchronous inserts unless explicitly enabled (this should not be enabled if you have dependent materialized views - [see issue](https://github.com/ClickHouse/ClickHouse/issues/66003)).

-In practice, if deduplication is turned on and the same insert is retried - due to, for instance, a timeout or network drop - ClickHouse can safely ignore the duplicate. This helps maintain idempotency and avoids double-writing data. Still, its worth noting that insert validation and schema parsing happen only during buffer flush - so errors (like type mismatches) will only surface at that point.
+In practice, if deduplication is turned on and the same insert is retried - due to, for instance, a timeout or network drop - ClickHouse can safely ignore the duplicate. This helps maintain idempotency and avoids double-writing data. Still, it's worth noting that insert validation and schema parsing happen only during buffer flush - so errors (like type mismatches) will only surface at that point.

 ### Enabling asynchronous inserts {#enabling-asynchronous-inserts}

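An illustrative sketch, not part of this commit, of the recommended `async_insert = 1, wait_for_async_insert = 1` combination discussed in the hunks above; the `logs` table and its columns are hypothetical:

```sql
-- The server buffers the rows and acknowledges the INSERT only after the
-- buffered batch has been flushed to a part on disk.
INSERT INTO logs (ts, message)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 'service started');
```
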
docs/best-practices/_snippets/_avoid_mutations.md

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 In ClickHouse, **mutations** refer to operations that modify or delete existing data in a table - typically using `ALTER TABLE ... DELETE` or `ALTER TABLE ... UPDATE`. While these statements may appear similar to standard SQL operations, they are fundamentally different under the hood.

-Rather than modifying rows in place, mutations in ClickHouse are asynchronous background processes that rewrite entire [data parts](/parts) affected by the change. This approach is necessary due to ClickHouses column-oriented, immutable storage model, but it can lead to significant I/O and resource usage.
+Rather than modifying rows in place, mutations in ClickHouse are asynchronous background processes that rewrite entire [data parts](/parts) affected by the change. This approach is necessary due to ClickHouse's column-oriented, immutable storage model, but it can lead to significant I/O and resource usage.

 When a mutation is issued, ClickHouse schedules the creation of new **mutated parts**, leaving the original parts untouched until the new ones are ready. Once ready, the mutated parts atomically replace the originals. However, because the operation rewrites entire parts, even a minor change (such as updating a single row) may result in large-scale rewrites and excessive write amplification.

-For large datasets, this can produce a substantial spike in disk I/O and degrade overall cluster performance. Unlike merges, mutations cant be rolled back once submitted and will continue to execute even after server restarts unless explicitly cancelled - see [`KILL MUTATION`](/sql-reference/statements/kill#kill-mutation).
+For large datasets, this can produce a substantial spike in disk I/O and degrade overall cluster performance. Unlike merges, mutations can't be rolled back once submitted and will continue to execute even after server restarts unless explicitly cancelled - see [`KILL MUTATION`](/sql-reference/statements/kill#kill-mutation).

 Mutations are **totally ordered**: they apply to data inserted before the mutation was issued, while newer data remains unaffected. They do not block inserts but can still overlap with other ongoing queries. A SELECT running during a mutation may read a mix of mutated and unmutated parts, which can lead to inconsistent views of the data during execution. ClickHouse executes mutations in parallel per part, which can further intensify memory and CPU usage, especially when complex subqueries (like x IN (SELECT ...)) are involved.

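An illustrative sketch, not part of this commit, of the mutation lifecycle described above, using a hypothetical `events` table:

```sql
-- Issue a mutation: every part containing matching rows is rewritten in the background.
ALTER TABLE events DELETE WHERE user_id = 42;

-- Monitor progress: unfinished mutations keep running, even across server restarts.
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE table = 'events' AND NOT is_done;

-- Cancel explicitly, since a submitted mutation cannot be rolled back.
KILL MUTATION WHERE database = 'default' AND table = 'events';
```
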
docs/best-practices/_snippets/_avoid_optimize_final.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ Over time, background processes merge smaller parts into larger ones to reduce f

 <Image img={simple_merges} size="md" alt="Simple merges" />

-While its tempting to manually trigger this merge using:
+While it's tempting to manually trigger this merge using:

 ```sql
 OPTIMIZE TABLE <table> FINAL;
@@ -20,7 +20,7 @@ OPTIMIZE TABLE <table> FINAL;

 ## Why Avoid? {#why-avoid}

-### Its expensive {#its-expensive}
+### It's expensive {#its-expensive}

 Running `OPTIMIZE FINAL` forces ClickHouse to merge **all** active parts into a **single part**, even if large merges have already occurred. This involves:

@@ -41,4 +41,4 @@ Normally, ClickHouse avoids merging parts larger than ~150 GB (configurable via

 ## Let background merges do the work {#let-background-merges-do-the-work}

-ClickHouse already performs smart background merges to optimize storage and query efficiency. These are incremental, resource-aware, and respect configured thresholds. Unless you have a very specific need (e.g., finalizing data before freezing a table or exporting), **youre better off letting ClickHouse manage merges on its own**.
+ClickHouse already performs smart background merges to optimize storage and query efficiency. These are incremental, resource-aware, and respect configured thresholds. Unless you have a very specific need (e.g., finalizing data before freezing a table or exporting), **you're better off letting ClickHouse manage merges on its own**.

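An illustrative sketch, not part of this commit: before reaching for `OPTIMIZE FINAL`, the active part count can be inspected, which usually confirms that background merges are already keeping it low. The `tutorial.hits_v1` table is reused from the cluster tutorial hunks above:

```sql
-- A handful of active parts is normal and rarely needs a manual merge.
SELECT count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = 'tutorial' AND table = 'hits_v1' AND active;
```
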
docs/best-practices/_snippets/_bulk_inserts.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ We recommend inserting data in batches of at least 1,000 rows, and ideally betwe

 **For a synchronous insert strategy to be effective this client-side batching is required.**

-If youre unable to batch data client-side, ClickHouse supports asynchronous inserts that shift batching to the server ([see](/best-practices/selecting-an-insert-strategy#asynchronous-inserts)).
+If you're unable to batch data client-side, ClickHouse supports asynchronous inserts that shift batching to the server ([see](/best-practices/selecting-an-insert-strategy#asynchronous-inserts)).

 :::tip
 Regardless of the size of your inserts, we recommend keeping the number of insert queries around one insert query per second. The reason for that recommendation is that the created parts are merged to larger parts in the background (in order to optimize your data for read queries), and sending too many insert queries per second can lead to situations where the background merging can't keep up with the number of new parts. However, you can use a higher rate of insert queries per second when you use asynchronous inserts (see asynchronous inserts).

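An illustrative sketch, not part of this commit, of the client-side batching described above, with a hypothetical `events` table: a single multi-row INSERT creates one part, whereas the same rows sent as individual INSERTs would create one part each. In practice a batch would carry 1,000 rows or more rather than three.

```sql
INSERT INTO events (ts, user_id, event) VALUES
    ('2024-05-01 10:00:00', 1, 'page_view'),
    ('2024-05-01 10:00:01', 2, 'click'),
    ('2024-05-01 10:00:02', 3, 'page_view');
```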