9 changes: 9 additions & 0 deletions docs/lakehouse/catalog-overview.md
@@ -77,6 +77,15 @@ SELECT k1, k3 FROM table; -- Error: Unsupported type 'UNSUPPORTED_TYPE

```sql
SELECT k1, k4 FROM table; -- Query OK.
```

### Nullable Attribute

Doris currently places special restrictions on the Nullable attribute of external table columns. The behavior is as follows:

| Source Column Attribute | Doris Read Behavior | Doris Write Behavior |
| --- | --- | --- |
| Nullable | Read as nullable. | NULL values can be written. |
| Not Null | Read as nullable, i.e., the column is still treated as allowing NULL. | NULL values can be written, i.e., Doris does not strictly check for NULL values. Users must ensure data integrity and consistency themselves. |
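
Since Doris does not enforce a source column's NOT NULL constraint on write, one defensive pattern is to filter NULLs in the query that feeds the write and to audit the table afterwards. This is a minimal sketch, not an official recommendation; the tables `iceberg.iceberg_db.orders` (with `order_id` assumed to be declared NOT NULL in the source) and `internal.db1.staging_orders` are hypothetical.

```sql
-- Hypothetical tables: iceberg.iceberg_db.orders (order_id is NOT NULL in the source)
-- and internal.db1.staging_orders (a Doris internal table used as the data source).
-- Doris will not reject NULL order_id values on write, so drop them in the query.
INSERT INTO iceberg.iceberg_db.orders (order_id, amount)
SELECT order_id, amount
FROM internal.db1.staging_orders
WHERE order_id IS NOT NULL;

-- Optional audit: count rows that violate the source's NOT NULL intent.
SELECT COUNT(*) FROM iceberg.iceberg_db.orders WHERE order_id IS NULL;
```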

## Using Catalog

### Viewing Catalog
94 changes: 54 additions & 40 deletions docs/lakehouse/catalogs/iceberg-catalog.md
@@ -553,6 +553,13 @@ INSERT INTO iceberg_tbl(col1, col2) VALUES (val1, val2);

```sql
INSERT INTO iceberg_tbl(col1, col2, partition_col1, partition_col2) VALUES (1, 2, 'beijing', '2023-12-12');
```

Since version 3.1.0, Doris supports writing data to a specified branch:

```sql
INSERT INTO iceberg_tbl@branch(b1) values (val1, val2, val3, val4);
INSERT INTO iceberg_tbl@branch(b1) (col3, col4) values (val3, val4);
```

### INSERT OVERWRITE

The INSERT OVERWRITE operation completely replaces the existing data in the table with new data.
@@ -562,6 +569,13 @@ INSERT OVERWRITE TABLE iceberg_tbl VALUES (val1, val2, val3, val4);

```sql
INSERT OVERWRITE TABLE iceberg.iceberg_db.iceberg_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
```

Since version 3.1.0, Doris supports writing data to a specified branch:

```sql
INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) values (val1, val2, val3, val4);
INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) (col3, col4) values (val3, val4);
```

### CTAS

You can create an Iceberg table and write data using the `CTAS` (Create Table As Select) statement:
@@ -755,74 +769,74 @@ Supported schema change operations include:

* **Rename Column**

Use the `RENAME COLUMN` clause to rename columns. Renaming columns within nested types is not supported.

```sql
ALTER TABLE iceberg_table RENAME COLUMN old_col_name TO new_col_name;
```

* **Add a Column**

Use `ADD COLUMN` to add a new column. Without a `FIRST` or `AFTER` clause, the new column is appended to the end of the table. Adding new columns to nested types is not supported.

When adding a new column, you can specify the nullable attribute, default value, comment, and column position.

```sql
ALTER TABLE iceberg_table ADD COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```

Example:

```sql
ALTER TABLE iceberg_table ADD COLUMN new_col STRING NOT NULL DEFAULT 'default_value' COMMENT 'This is a new col' AFTER old_col;
```
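
The `FIRST` form of the position clause works like `AFTER`. A minimal sketch, assuming hypothetical column names `event_id` and `event_note` on the same `iceberg_table`; `DESC` is used only to inspect the resulting column order:

```sql
-- Hypothetical columns. Place a nullable column at the very beginning of the table.
ALTER TABLE iceberg_table ADD COLUMN event_id BIGINT NULL COMMENT 'surrogate key' FIRST;

-- With no FIRST/AFTER clause, the column is appended after the last existing column.
ALTER TABLE iceberg_table ADD COLUMN event_note STRING NULL COMMENT 'free-form note';

-- Inspect the resulting column order.
DESC iceberg_table;
```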

* **Add Columns**

To add multiple columns at once, list them in parentheses after `ADD COLUMN`. The new columns are appended to the end of the table; column positioning (`FIRST`/`AFTER`) is not supported when adding multiple columns. Adding new columns to nested types is not supported.

The syntax for each column is the same as for adding a single column, without the position clause.

```sql
ALTER TABLE iceberg_table ADD COLUMN (col_name1 col_type1 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'], col_name2 col_type2 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'], ...);
```
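
A concrete instance of the multi-column form, as a sketch with hypothetical column names `city` and `visits`. Both columns are appended to the end of the table; we assume they keep the order in which they are listed.

```sql
-- Hypothetical columns; no FIRST/AFTER clause is allowed in this form.
ALTER TABLE iceberg_table ADD COLUMN (
    city   STRING NULL COMMENT 'city name',
    visits INT    NULL COMMENT 'visit counter'
);
```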

* **Drop Column**

Use `DROP COLUMN` to drop columns. Dropping columns within nested types is not supported.

```sql
ALTER TABLE iceberg_table DROP COLUMN col_name;
```

* **Modify Column**

Use the `MODIFY COLUMN` statement to modify column attributes, including the type, nullable attribute, default value, comment, and column position.

Note: When modifying column attributes, all attributes that are not being changed should also be specified explicitly with their original values.

```sql
ALTER TABLE iceberg_table MODIFY COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```

Example:

```sql
CREATE TABLE iceberg_table (
    id INT,
    name STRING
);
-- Change the id column type to BIGINT, set it to NOT NULL with a default value of 0, and add a comment
ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 'This is a modified id column';
```
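
Because unchanged attributes should be restated with their original values, a position-only change still repeats the full column definition. A sketch continuing the `iceberg_table` example above, where `id` is now `BIGINT NOT NULL DEFAULT 0` with a comment:

```sql
-- Only the position of id changes, so every other attribute is restated unchanged.
ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 'This is a modified id column' AFTER name;

-- Move id back to the first position, again restating the full definition.
ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 'This is a modified id column' FIRST;
```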

* **Reorder Columns**

Use `ORDER BY` to reorder columns by specifying the new column order.

```sql
ALTER TABLE iceberg_table ORDER BY (col_name1, col_name2, ...);
```
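
For the two-column `iceberg_table` used in the examples above, swapping the column order could look like this. It is a sketch that lists every column of the table, which we assume is what the syntax expects.

```sql
-- Assumes iceberg_table currently has exactly the columns id and name.
ALTER TABLE iceberg_table ORDER BY (name, id);

-- Inspect the new column order.
DESC iceberg_table;
```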

## Iceberg Table Optimization

@@ -76,6 +76,15 @@ SELECT k1, k3 FROM table; -- Error: Unsupported type 'UNSUPPORTED_TYPE

```sql
SELECT k1, k4 FROM table; -- Query OK.
```

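### Nullable Attribute

Doris currently places special restrictions on the Nullable attribute of external table columns. The behavior is as follows:

| Source Column Attribute | Doris Read Behavior | Doris Write Behavior |
| --- | --- | --- |
| Nullable | Read as nullable. | NULL values can be written. |
| Not Null | Read as nullable, i.e., the column is still treated as allowing NULL. | NULL values can be written, i.e., Doris does not strictly check for NULL values. Users must ensure data integrity and consistency themselves. |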

## Using Catalog

### Viewing Catalog
@@ -142,9 +151,9 @@ jdbc:mysql://host:9030/iceberg_catalog.iceberg_db

```sql
SET PROPERTY default_init_catalog=hive_catalog;
```

Note 1: If a catalog is explicitly specified in the MySQL command line or the JDBC connection string, the specified catalog takes precedence and the `default_init_catalog` user property does not take effect.
Note 2: If the catalog set by the `default_init_catalog` user property no longer exists, Doris automatically switches to the default `internal` catalog.
Note 3: This feature is available starting from v3.1.x.
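
The `SET PROPERTY` statement shown above sets the property for the current user. A sketch of setting and checking it for another user, assuming a hypothetical user `jack` and the Doris `SET PROPERTY FOR` / `SHOW PROPERTY` statement forms; setting properties for other users typically requires administrative privileges:

```sql
-- Hypothetical user 'jack'; hive_catalog as in the example above.
SET PROPERTY FOR 'jack' 'default_init_catalog' = 'hive_catalog';

-- Verify the stored value.
SHOW PROPERTY FOR 'jack' LIKE 'default_init_catalog';
```

As noted above, a catalog named explicitly in the connection string, such as `jdbc:mysql://host:9030/iceberg_catalog.iceberg_db`, still takes precedence over this property.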

### Simple Query

@@ -562,15 +562,29 @@ INSERT INTO iceberg_tbl(col1, col2) values (val1, val2);

```sql
INSERT INTO iceberg_tbl(col1, col2, partition_col1, partition_col2) values (1, 2, "beijing", "2023-12-12");
```

Since version 3.1.0, Doris supports writing data to a specified branch:

```sql
INSERT INTO iceberg_tbl@branch(b1) values (val1, val2, val3, val4);
INSERT INTO iceberg_tbl@branch(b1) (col3, col4) values (val3, val4);
```

### INSERT OVERWRITE

INSERT OVERWRITE completely replaces the existing data in the table with new data.

```sql
INSERT OVERWRITE TABLE iceberg_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE iceberg.iceberg_db.iceberg_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
```

Since version 3.1.0, Doris supports writing data to a specified branch:

```sql
INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) values (val1, val2, val3, val4);
INSERT OVERWRITE TABLE iceberg_tbl@branch(b1) (col3, col4) values (val3, val4);
```

### CTAS

You can create an Iceberg table and write data to it using the `CTAS` statement:
@@ -764,73 +778,74 @@ DROP DATABASE [IF EXISTS] iceberg.iceberg_db;

* **Rename Column**

Use the `RENAME COLUMN` clause to rename columns. Renaming columns within nested types is not supported.

```sql
ALTER TABLE iceberg_table RENAME COLUMN old_col_name TO new_col_name;
```

* **Add a Column**

Use `ADD COLUMN` to add a new column. Adding new columns to nested types is not supported.

When adding a new column, you can specify the nullable attribute, default value, comment, and column position.

```sql
ALTER TABLE iceberg_table ADD COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```

Example:

```sql
ALTER TABLE iceberg_table ADD COLUMN new_col STRING NOT NULL DEFAULT 'default_value' COMMENT 'This is a new col' AFTER old_col;
```

* **Add Columns**

You can use `ADD COLUMN` to add multiple columns. The new columns are appended to the end of the table; specifying a column position is not supported, nor is adding new columns to nested types.

The syntax for each column is the same as for adding a single column, without the position clause.

```sql
ALTER TABLE iceberg_table ADD COLUMN (col_name1 col_type1 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'], col_name2 col_type2 [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'], ...);
```

* **Drop Column**

Use `DROP COLUMN` to drop columns. Dropping columns within nested types is not supported.

```sql
ALTER TABLE iceberg_table DROP COLUMN col_name;
```

* **Modify Column**

Use the `MODIFY COLUMN` statement to modify column attributes, including the type, nullable attribute, default value, comment, and column position.

Note: When modifying column attributes, all attributes that are not being changed should also be specified explicitly with their original values.

```sql
ALTER TABLE iceberg_table MODIFY COLUMN col_name col_type [NULL | NOT NULL] [DEFAULT default_value] [COMMENT 'comment'] [FIRST | AFTER col_name];
```

Example:

```sql
CREATE TABLE iceberg_table (
    id INT,
    name STRING
);
-- Change the id column type to BIGINT, set it to NOT NULL with a default value of 0, add a comment, and place it first
ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 'This is a modified id column' FIRST;
```

* **Reorder Columns**

Use `ORDER BY` to reorder columns by specifying the new column order.

```sql
ALTER TABLE iceberg_table ORDER BY (col_name1, col_name2, ...);
```

## Iceberg Table Optimization
