
Commit 8da53c0

Copilot and WenyXu authored
docs: add PARALLELISM option for COPY DATABASE and CLI tools (#2219)
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: WenyXu <[email protected]>
1 parent 8601a0f commit 8da53c0

File tree: 4 files changed, +40, -22 lines

  • docs/reference
  • i18n/zh/docusaurus-plugin-content-docs/current/reference


docs/reference/command-lines/utilities/data.md

Lines changed: 11 additions & 10 deletions
@@ -20,7 +20,8 @@ greptime cli data export [OPTIONS]
  | `--addr` | Yes | - | Server address to connect |
  | `--output-dir` | Yes | - | Directory to store exported data |
  | `--database` | No | all databases | Name of the database to export |
- | `--export-jobs`, `-j` | No | 1 | Number of parallel export jobs(multiple databases can be exported in parallel) |
+ | `--db-parallelism`, `-j` | No | 1 | Number of databases to export in parallel. For example, if there are 20 databases and `db-parallelism` is set to 4, then 4 databases will be exported concurrently. (alias: `--export-jobs`) |
+ | `--table-parallelism` | No | 4 | Number of tables to export in parallel within a single database. For example, if a database contains 30 tables and `table-parallelism` is set to 8, then 8 tables will be exported concurrently. |
  | `--max-retry` | No | 3 | Maximum retry attempts per job |
  | `--target`, `-t` | No | all | Export target (schema/data/all) |
  | `--start-time` | No | - | Start of time range for data export |
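To show how the renamed export flags combine, here is a brief usage sketch (not part of the diff). The server address, output directory, and parallelism values are illustrative placeholders; only options documented in the table above are used:

```bash
# Export up to 4 databases concurrently, and up to 8 tables per database
greptime cli data export \
  --addr 127.0.0.1:4000 \
  --output-dir /tmp/greptimedb-export \
  --db-parallelism 4 \
  --table-parallelism 8
```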
@@ -56,15 +57,15 @@ greptime cli data import [OPTIONS]
  ```
  
  ### Options
- | Option | Required | Default | Description |
- | ------------------- | -------- | ------------- | ------------------------------------------------------------------------------- |
- | `--addr` | Yes | - | Server address to connect |
- | `--input-dir` | Yes | - | Directory containing backup data |
- | `--database` | No | all databases | Name of the database to import |
- | `--import-jobs, -j` | No | 1 | Number of parallel import jobs (multiple databases can be imported in parallel) |
- | `--max-retry` | No | 3 | Maximum retry attempts per job |
- | `--target, -t` | No | all | Import target (schema/data/all) |
- | `--auth-basic` | No | - | Use the `<username>:<password>` format |
+ | Option | Required | Default | Description |
+ | ------------------------ | -------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `--addr` | Yes | - | Server address to connect |
+ | `--input-dir` | Yes | - | Directory containing backup data |
+ | `--database` | No | all databases | Name of the database to import |
+ | `--db-parallelism`, `-j` | No | 1 | Number of databases to import in parallel. For example, if there are 20 databases and `db-parallelism` is set to 4, then 4 databases will be imported concurrently. (alias: `--import-jobs`) |
+ | `--max-retry` | No | 3 | Maximum retry attempts per job |
+ | `--target, -t` | No | all | Import target (schema/data/all) |
+ | `--auth-basic` | No | - | Use the `<username>:<password>` format |
  
  ### Import Targets
  - `schema`: Imports table schemas only
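Likewise, a minimal import sketch assuming a backup produced by the export command above; the address and input directory are placeholders, and only options documented in the table are used:

```bash
# Import up to 4 databases concurrently from the exported directory
greptime cli data import \
  --addr 127.0.0.1:4000 \
  --input-dir /tmp/greptimedb-export \
  --db-parallelism 4
```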

docs/reference/sql/copy.md

Lines changed: 9 additions & 1 deletion
@@ -188,7 +188,8 @@ COPY DATABASE <db_name>
  WITH (
    FORMAT = { 'CSV' | 'JSON' | 'PARQUET' },
    START_TIME = "<START TIMESTAMP>",
-   END_TIME = "<END TIMESTAMP>"
+   END_TIME = "<END TIMESTAMP>",
+   PARALLELISM = <number>
  )
  [CONNECTION(
    REGION = "<REGION NAME>",
@@ -203,6 +204,7 @@ COPY DATABASE <db_name>
  |---|---|---|
  | `FORMAT` | Export file format, available options: JSON, CSV, Parquet | **Required** |
  | `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
+ | `PARALLELISM` | Number of tables to process in parallel. For example, if a database contains 30 tables and `PARALLELISM` is set to 8, then 8 tables will be processed concurrently. Defaults to the total number of CPU cores, with a minimum value of 1. | Optional |
  
  > - When copying databases, `<PATH>` must end with `/`.
  > - `CONNECTION` parameters can also be used to copy databases to/from object storage services like AWS S3.
@@ -213,11 +215,17 @@ COPY DATABASE <db_name>
  -- Export all tables' data to /tmp/export/
  COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet');
  
+ -- Export all table data using 4 parallel operations
+ COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet', PARALLELISM=4);
+ 
  -- Export all tables' data within time range 2022-04-11 08:00:00~2022-04-11 09:00:00 to /tmp/export/
  COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet', START_TIME='2022-04-11 08:00:00', END_TIME='2022-04-11 09:00:00');
  
  -- Import files under /tmp/export/ directory to database named public.
  COPY DATABASE public FROM '/tmp/export/' WITH (FORMAT='parquet');
+ 
+ -- Import files using 8 parallel operations
+ COPY DATABASE public FROM '/tmp/export/' WITH (FORMAT='parquet', PARALLELISM=8);
  ```
  
  ## Special reminder for Windows platforms

i18n/zh/docusaurus-plugin-content-docs/current/reference/command-lines/utilities/data.md

Lines changed: 11 additions & 10 deletions
@@ -20,7 +20,8 @@ greptime cli data export [OPTIONS]
  | `--addr` | Yes | - | Address of the GreptimeDB database to connect to |
  | `--output-dir` | Yes | - | Directory to store the exported data |
  | `--database` | No | all databases | Name of the database to export |
- | `--export-jobs, -j` | No | 1 | Number of parallel export jobs (multiple databases can be exported in parallel) |
+ | `--db-parallelism, -j` | No | 1 | Number of databases to export in parallel. For example, with 20 databases and `db-parallelism` set to 4, 4 databases are exported at the same time. (alias: `--export-jobs`) |
+ | `--table-parallelism` | No | 4 | Number of tables exported in parallel within a single database. For example, with 30 tables in a database and `table-parallelism` set to 8, 8 tables are exported at the same time. |
  | `--max-retry` | No | 3 | Maximum retry attempts per job |
  | `--target, -t` | No | all | Export target (schema/data/all) |
  | `--start-time` | No | - | Start of the time range for data export |
@@ -56,15 +57,15 @@ greptime cli data import [OPTIONS]
  ```
  
  ### Options
- | Option | Required | Default | Description |
- | --- | --- | --- | --- |
- | `--addr` | Yes | - | Address of the GreptimeDB database to connect to |
- | `--input-dir` | Yes | - | Directory containing the backup data |
- | `--database` | No | all databases | Name of the database to import |
- | `--import-jobs, -j` | No | 1 | Number of parallel import jobs (multiple databases can be imported in parallel) |
- | `--max-retry` | No | 3 | Maximum retry attempts per job |
- | `--target, -t` | No | all | Import target (schema/data/all) |
- | `--auth-basic` | No | - | Use the `<username>:<password>` format |
+ | Option | Required | Default | Description |
+ | --- | --- | --- | --- |
+ | `--addr` | Yes | - | Address of the GreptimeDB database to connect to |
+ | `--input-dir` | Yes | - | Directory containing the backup data |
+ | `--database` | No | all databases | Name of the database to import |
+ | `--db-parallelism, -j` | No | 1 | Number of databases to import in parallel. For example, with 20 databases and `db-parallelism` set to 4, 4 databases are imported at the same time. (alias: `--import-jobs`) |
+ | `--max-retry` | No | 3 | Maximum retry attempts per job |
+ | `--target, -t` | No | all | Import target (schema/data/all) |
+ | `--auth-basic` | No | - | Use the `<username>:<password>` format |
  
  ### Import Targets
  - `schema`: Imports table schemas only

i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md

Lines changed: 9 additions & 1 deletion
@@ -181,7 +181,8 @@ COPY DATABASE <db_name>
  WITH (
    FORMAT = { 'CSV' | 'JSON' | 'PARQUET' }
    START_TIME = "<START TIMESTAMP>",
-   END_TIME = "<END TIMESTAMP>"
+   END_TIME = "<END TIMESTAMP>",
+   PARALLELISM = <number>
  )
  [CONNECTION(
    REGION = "<REGION NAME>",
@@ -196,6 +197,7 @@ COPY DATABASE <db_name>
  |---|---|---|
  | `FORMAT` | Target file format, for example JSON, CSV, Parquet | **Required** |
  | `START_TIME`/`END_TIME`| The time range within which data should be exported; the range is inclusive at the start and exclusive at the end | Optional |
+ | `PARALLELISM` | Number of tables to process in parallel. For example, with 30 tables in a database and `PARALLELISM` set to 8, 8 tables are processed at the same time. Defaults to the total number of CPU cores, with a minimum of 1. | Optional |
  
  > - When importing/exporting tables, the `<PATH>` argument must end with `/`;
  > - COPY DATABASE can likewise use the `CONNECTION` parameters to point the import/export path at object storage such as S3
@@ -207,11 +209,17 @@ COPY DATABASE <db_name>
  -- Export all data in the public database to the /tmp/export/ directory
  COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet');
  
+ -- Export all table data using 4 parallel operations
+ COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet', PARALLELISM=4);
+ 
  -- Export data in the public database within the time range 2022-04-11 08:00:00 to 2022-04-11 09:00:00 to the /tmp/export/ directory
  COPY DATABASE public TO '/tmp/export/' WITH (FORMAT='parquet', START_TIME='2022-04-11 08:00:00', END_TIME='2022-04-11 09:00:00');
  
  -- Restore the public database's data from the /tmp/export/ directory
  COPY DATABASE public FROM '/tmp/export/' WITH (FORMAT='parquet');
+ 
+ -- Import data using 8 parallel operations
+ COPY DATABASE public FROM '/tmp/export/' WITH (FORMAT='parquet', PARALLELISM=8);
  ```
  
  ## Paths on Windows platforms
