5 changes: 4 additions & 1 deletion docs/reference/sql/alter.md
@@ -148,13 +148,14 @@ After dropping the default value, the column will use `NULL` as the default. The

### Alter table options

`ALTER TABLE` statements can also be used to change the options of tables.

Currently, the following options are supported:
- `ttl`: the retention time of data in the table.
- `compaction.twcs.time_window`: the time window parameter of TWCS compaction strategy. The value should be a [time duration string](/reference/time-durations.md).
- `compaction.twcs.max_output_file_size`: the maximum allowed output file size of TWCS compaction strategy.
- `compaction.twcs.trigger_file_num`: the number of files in a specific time window to trigger a compaction.
- `sst_format`: the SST format of the table. The only accepted value is `flat`; a table only supports changing the format from `primary_key` to `flat`.

```sql
ALTER TABLE monitor SET 'ttl'='1d';
@@ -164,6 +165,8 @@ ALTER TABLE monitor SET 'compaction.twcs.time_window'='2h';
ALTER TABLE monitor SET 'compaction.twcs.max_output_file_size'='500MB';

ALTER TABLE monitor SET 'compaction.twcs.trigger_file_num'='8';

ALTER TABLE monitor SET 'sst_format'='flat';
```

### Unset table options
32 changes: 25 additions & 7 deletions docs/reference/sql/create.md
@@ -151,6 +151,7 @@ Users can add table options by using `WITH`. The valid options contain the following:
| `memtable.type` | Type of the memtable. | String value, supports `time_series`, `partition_tree`. |
| `append_mode` | Whether the table is append-only | String value. Default is 'false', which removes duplicate rows by primary keys and timestamps according to the `merge_mode`. Set it to 'true' to enable append mode and create an append-only table that keeps duplicate rows. |
| `merge_mode` | The strategy to merge duplicate rows | String value. Only available when `append_mode` is 'false'. Default is `last_row`, which keeps the last row for the same primary key and timestamp. Set it to `last_non_null` to keep the last non-null field for the same primary key and timestamp. |
| `sst_format` | The format of SST files | String value, supports `primary_key`, `flat`. Default is `primary_key`. `flat` is recommended for tables that have a large number of unique primary keys. |
| `comment` | Table level comment | String value. |
| `skip_wal` | Whether to disable Write-Ahead-Log for this table | String type. When set to `'true'`, the data written to the table will not be persisted to the write-ahead log, which can avoid storage wear and improve write throughput. However, when the process restarts, any unflushed data will be lost. Please use this feature only when the data source itself can ensure reliability. |
| `index.type` | Index type | **Only for metric engine** String value, supports `none`, `skipping`. |
@@ -171,15 +172,15 @@ The `ttl` value can be one of the following:
- `forever`, `NULL`, an empty string `''`, and `0s` (or any zero-length duration, like `0d`) mean the data will never be deleted.
- `instant`, note that a database's TTL can't be set to `instant`. `instant` means the data will be deleted instantly when inserted, useful if you want to send input to a flow task without saving it, see more details in [flow management documents](/user-guide/flow-computation/manage-flow.md#manage-flows).
- Unset, `ttl` can be unset by using `ALTER TABLE <table-name> UNSET 'ttl'`, which means the table will inherit the database's ttl policy (if any).
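
As a sketch of how these values fit together (the table name `monitor_with_ttl` and the specific durations are illustrative):

```sql
-- Keep data in this table for 7 days.
CREATE TABLE IF NOT EXISTS monitor_with_ttl (
  host STRING,
  ts TIMESTAMP TIME INDEX,
  cpu DOUBLE,
  PRIMARY KEY(host)
) with ('ttl'='7d');

-- Change the retention later, or unset it so the table
-- inherits the database's TTL policy (if any).
ALTER TABLE monitor_with_ttl SET 'ttl'='30d';
ALTER TABLE monitor_with_ttl UNSET 'ttl';
```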

If a table has its own TTL policy, it will take precedence over the database TTL policy.
Otherwise, the database TTL policy will be applied to the table.

So if a table's TTL is set to `forever`, the data will never be deleted no matter what the database's TTL is. But if you unset the table's TTL using:
```sql
ALTER TABLE <table-name> UNSET 'ttl';
```
Then the database's TTL will be applied to the table.

Note that the default TTL setting for tables and databases is unset, which also means the data will never be deleted.

@@ -286,10 +287,10 @@ CREATE TABLE greptime_physical_table (
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
)
engine = metric
with (
"physical_metric_table" = "",
);
```

@@ -304,14 +305,32 @@ CREATE TABLE greptime_physical_table (
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
)
engine = metric
with (
"physical_metric_table" = "",
"index.type" = "skipping",
);
```

#### Create a table with SST format

Create a table with the `flat` SST format:

```sql
create table if not exists metrics(
host string,
ts timestamp,
cpu double,
memory double,
TIME INDEX (ts),
PRIMARY KEY(host)
)
with('sst_format'='flat');
```

The `flat` format is a new format optimized for high cardinality primary keys. By default, the SST format of a table is `primary_key` for backward compatibility. The default format will become `flat` once it is stable.
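
Since a table only supports changing the format from `primary_key` to `flat`, an existing table can be migrated in place with `ALTER TABLE`:

```sql
-- Switch the table above from the default `primary_key` format to `flat`.
-- The reverse change is not supported.
ALTER TABLE metrics SET 'sst_format'='flat';
```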

### Column options
@@ -480,4 +499,3 @@ For the statement to create or update a view, please read the [view user guide](
## CREATE TRIGGER

Please refer to the [CREATE TRIGGER](/reference/sql/trigger-syntax.md#create-trigger) documentation.

@@ -77,9 +77,9 @@ The `http_logs` table is an example for storing HTTP server logs.
- The table sorts logs by time so it is efficient to search logs by time.


### Primary key design and SST format

You can use a primary key when there are suitable columns and one of the following conditions is met:

- Most queries can benefit from the ordering.
- You need to deduplicate (including delete) rows by the primary key and time index.
@@ -108,18 +108,44 @@ CREATE TABLE http_logs_v2 (
) with ('append_mode'='true');
```

A long primary key will negatively affect the insert performance and increase the memory footprint. It's recommended to define a primary key with no more than 5 columns.

#### Using the flat format for high cardinality primary keys

To improve sort and deduplication speed under time-series workloads, GreptimeDB buffers and processes rows by time-series under the default SST format,
so it doesn't need to compare the primary key of each row repeatedly.
This becomes a problem when a tag column has high cardinality:

1. Performance degradation since the database can't batch rows efficiently.
2. It may increase memory and CPU usage as the database has to maintain the metadata for each time-series.
3. Deduplication may be too expensive.

Currently, the recommended number of distinct primary key values is no more than 100 thousand under the default format.
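
To estimate how many time-series a table holds, you can count the distinct combinations of its primary key columns. A sketch against the `http_logs_v2` table above (assuming `application` is one of its primary key columns):

```sql
-- Approximate number of time-series: distinct values of the primary key column.
SELECT COUNT(DISTINCT application) AS series_count FROM http_logs_v2;
```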

Sometimes, users may want to put a high cardinality column in the primary key:

* They have to deduplicate rows by that column, although it isn't efficient.
* Ordering rows by that column can improve query performance significantly.

To use high cardinality columns as the primary key, you can set the SST format to `flat`.
This format has much lower memory usage and better performance under this workload.
Note that deduplication on high cardinality primary keys is always expensive, so it's still recommended to use an append-only table if you can tolerate duplication.

```sql
CREATE TABLE http_logs_flat (
access_time TIMESTAMP TIME INDEX,
application STRING,
remote_addr STRING,
http_status STRING,
http_method STRING,
http_refer STRING,
user_agent STRING,
request_id STRING,
request STRING,
PRIMARY KEY(application, request_id),
) with ('append_mode'='true', 'sst_format'='flat');
```

Recommendations for tags:

@@ -128,9 +154,8 @@ Recommendations for tags:
For example, `namespace`, `cluster`, or an AWS `region`.
- No need to set all low cardinality columns as tags since this may impact the performance of ingestion and querying.
- Typically use short strings and integers for tags, avoiding `FLOAT`, `DOUBLE`, `TIMESTAMP`.
- Set `sst_format` to `flat` if tag values change frequently.
  For example, when tags contain columns like `trace_id`, `span_id`, and `user_id`.


## Index
@@ -59,14 +59,14 @@ staging_size = "10GB"

Some tips:

- At least 1/10 of disk space for the write cache. It's recommended to use a large write cache when using object storage.
- At least 1/4 of total memory for the `page_cache_size` if the memory usage is under 20%
- Double the cache size if the cache hit ratio is less than 50%
- If using the full-text index, leave at least 1/10 of disk space for the `staging_size`

### Using the flat format for high cardinality primary keys

Putting high cardinality columns, such as `trace_id` or `uuid`, into the primary key can negatively impact both write and query performance under the default format. Instead, consider using an [append-only table](/reference/sql/create.md#create-an-append-only-table) and setting the SST format to [`flat`](/reference/sql/create.md#create-a-table-with-sst-format).

### Using append-only table if possible

5 changes: 4 additions & 1 deletion versioned_docs/version-1.0/reference/sql/alter.md
Original file line number Diff line number Diff line change
@@ -148,13 +148,14 @@ After dropping the default value, the column will use `NULL` as the default. The

### Alter table options

`ALTER TABLE` statements can also be used to change the options of tables.

Currently, the following options are supported:
- `ttl`: the retention time of data in the table.
- `compaction.twcs.time_window`: the time window parameter of TWCS compaction strategy. The value should be a [time duration string](/reference/time-durations.md).
- `compaction.twcs.max_output_file_size`: the maximum allowed output file size of TWCS compaction strategy.
- `compaction.twcs.trigger_file_num`: the number of files in a specific time window to trigger a compaction.
- `sst_format`: the SST format of the table. The only accepted value is `flat`; a table only supports changing the format from `primary_key` to `flat`.

```sql
ALTER TABLE monitor SET 'ttl'='1d';
@@ -164,6 +165,8 @@ ALTER TABLE monitor SET 'compaction.twcs.time_window'='2h';
ALTER TABLE monitor SET 'compaction.twcs.max_output_file_size'='500MB';

ALTER TABLE monitor SET 'compaction.twcs.trigger_file_num'='8';

ALTER TABLE monitor SET 'sst_format'='flat';
```

### Unset table options
31 changes: 24 additions & 7 deletions versioned_docs/version-1.0/reference/sql/create.md
Original file line number Diff line number Diff line change
@@ -171,15 +171,15 @@ The `ttl` value can be one of the following:
- `forever`, `NULL`, an empty string `''`, and `0s` (or any zero-length duration, like `0d`) mean the data will never be deleted.
- `instant`, note that a database's TTL can't be set to `instant`. `instant` means the data will be deleted instantly when inserted, useful if you want to send input to a flow task without saving it, see more details in [flow management documents](/user-guide/flow-computation/manage-flow.md#manage-flows).
- Unset, `ttl` can be unset by using `ALTER TABLE <table-name> UNSET 'ttl'`, which means the table will inherit the database's ttl policy (if any).
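
As a sketch of how these values fit together (the table name `monitor_with_ttl` and the specific durations are illustrative):

```sql
-- Keep data in this table for 7 days.
CREATE TABLE IF NOT EXISTS monitor_with_ttl (
  host STRING,
  ts TIMESTAMP TIME INDEX,
  cpu DOUBLE,
  PRIMARY KEY(host)
) with ('ttl'='7d');

-- Change the retention later, or unset it so the table
-- inherits the database's TTL policy (if any).
ALTER TABLE monitor_with_ttl SET 'ttl'='30d';
ALTER TABLE monitor_with_ttl UNSET 'ttl';
```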

If a table has its own TTL policy, it will take precedence over the database TTL policy.
Otherwise, the database TTL policy will be applied to the table.

So if a table's TTL is set to `forever`, the data will never be deleted no matter what the database's TTL is. But if you unset the table's TTL using:
```sql
ALTER TABLE <table-name> UNSET 'ttl';
```
Then the database's TTL will be applied to the table.

Note that the default TTL setting for tables and databases is unset, which also means the data will never be deleted.

@@ -286,10 +286,10 @@ CREATE TABLE greptime_physical_table (
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
)
engine = metric
with (
"physical_metric_table" = "",
);
```

@@ -304,14 +304,32 @@ CREATE TABLE greptime_physical_table (
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
)
engine = metric
with (
"physical_metric_table" = "",
"index.type" = "skipping",
);
```

#### Create a table with SST format

Create a table with the `flat` SST format:

```sql
create table if not exists metrics(
host string,
ts timestamp,
cpu double,
memory double,
TIME INDEX (ts),
PRIMARY KEY(host)
)
with('sst_format'='flat');
```

The `flat` format is a new format optimized for high cardinality primary keys. By default, the SST format of a table is `primary_key` for backward compatibility. The default format will become `flat` once it is stable.
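
Since a table only supports changing the format from `primary_key` to `flat`, an existing table can be migrated in place with `ALTER TABLE`:

```sql
-- Switch the table above from the default `primary_key` format to `flat`.
-- The reverse change is not supported.
ALTER TABLE metrics SET 'sst_format'='flat';
```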

### Column options
@@ -480,4 +498,3 @@ For the statement to create or update a view, please read the [view user guide](
## CREATE TRIGGER

Please refer to the [CREATE TRIGGER](/reference/sql/trigger-syntax.md#create-trigger) documentation.

@@ -77,9 +77,9 @@ The `http_logs` table is an example for storing HTTP server logs.
- The table sorts logs by time so it is efficient to search logs by time.


### Primary key design and SST format

You can use a primary key when there are suitable columns and one of the following conditions is met:

- Most queries can benefit from the ordering.
- You need to deduplicate (including delete) rows by the primary key and time index.
@@ -108,17 +108,44 @@ CREATE TABLE http_logs_v2 (
) with ('append_mode'='true');
```

A long primary key will negatively affect the insert performance and increase the memory footprint. It's recommended to define a primary key with no more than 5 columns.

#### Using the flat format for high cardinality primary keys

To improve sort and deduplication speed under time-series workloads, GreptimeDB buffers and processes rows by time-series under the default SST format,
so it doesn't need to compare the primary key of each row repeatedly.
This becomes a problem when a tag column has high cardinality:

1. Performance degradation since the database can't batch rows efficiently.
2. It may increase memory and CPU usage as the database has to maintain the metadata for each time-series.
3. Deduplication may be too expensive.

Currently, the recommended number of distinct primary key values is no more than 100 thousand under the default format.
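
To estimate how many time-series a table holds, you can count the distinct combinations of its primary key columns. A sketch against the `http_logs_v2` table above (assuming `application` is one of its primary key columns):

```sql
-- Approximate number of time-series: distinct values of the primary key column.
SELECT COUNT(DISTINCT application) AS series_count FROM http_logs_v2;
```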

Sometimes, users may want to put a high cardinality column in the primary key:

* They have to deduplicate rows by that column, although it isn't efficient.
* Ordering rows by that column can improve query performance significantly.

To use high cardinality columns as the primary key, you can set the SST format to `flat`.
This format has much lower memory usage and better performance under this workload.
Note that deduplication on high cardinality primary keys is always expensive, so it's still recommended to use an append-only table if you can tolerate duplication.

```sql
CREATE TABLE http_logs_flat (
access_time TIMESTAMP TIME INDEX,
application STRING,
remote_addr STRING,
http_status STRING,
http_method STRING,
http_refer STRING,
user_agent STRING,
request_id STRING,
request STRING,
PRIMARY KEY(application, request_id),
) with ('append_mode'='true', 'sst_format'='flat');
```


Recommendations for tags:
@@ -128,9 +155,8 @@ Recommendations for tags:
For example, `namespace`, `cluster`, or an AWS `region`.
- No need to set all low cardinality columns as tags since this may impact the performance of ingestion and querying.
- Typically use short strings and integers for tags, avoiding `FLOAT`, `DOUBLE`, `TIMESTAMP`.
- Set `sst_format` to `flat` if tag values change frequently.
  For example, when tags contain columns like `trace_id`, `span_id`, and `user_id`.


## Index