diff --git a/docs/data-operate/import/complex-types/array.md b/docs/data-operate/import/complex-types/array.md index 4992c9dc09fe7..fe51e5028456f 100644 --- a/docs/data-operate/import/complex-types/array.md +++ b/docs/data-operate/import/complex-types/array.md @@ -5,17 +5,7 @@ } --- -`ARRAY` An array of T-type items, it cannot be used as a key column. - -- Before version 2.0, it was only supported in the Duplicate model table. -- Starting from version 2.0, it is supported in the non-key columns of the Unique model table. - -T-type could be any of: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`ARRAY` An array of T-type items. Click [ARRAY](../../../sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md) to learn more. ## CSV format import diff --git a/docs/data-operate/import/complex-types/map.md b/docs/data-operate/import/complex-types/map.md index a5ff7f210641b..6d38a4ecf084e 100644 --- a/docs/data-operate/import/complex-types/map.md +++ b/docs/data-operate/import/complex-types/map.md @@ -5,14 +5,7 @@ } --- -`MAP` A Map of K, V items, it cannot be used as a key column. Now MAP can only be used in Duplicate and Unique Model Tables. - -K,V could be any of: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`MAP` A Map of K, V items。 Click [MAP](../../../sql-manual/basic-element/sql-data-types/semi-structured/MAP.md) to learn more. ## CSV format import diff --git a/docs/data-operate/import/complex-types/struct.md b/docs/data-operate/import/complex-types/struct.md index f146a9bb98d79..304089e66f37b 100644 --- a/docs/data-operate/import/complex-types/struct.md +++ b/docs/data-operate/import/complex-types/struct.md @@ -5,22 +5,7 @@ } --- -`STRUCT` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns. - -- It cannot be used as a Key column. Now STRUCT can only be used in Duplicate Model Tables. - -- The names and number of Fields in a Struct are fixed and always Nullable, and a Field typically consists of the following parts. - - - field_name: Identifier naming the field, non repeatable. - - field_type: A data type. - - COMMENT: An optional string describing the field. (currently not supported) - -The currently supported types are: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`STRUCT` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns.Click [STRUCT](../../../sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md) to learn more. ## CSV format import diff --git a/docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md b/docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md index a1526e1026f9e..2c7d7714c73aa 100644 --- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md +++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md @@ -5,65 +5,267 @@ } --- -## ARRAY +# ARRAY Documentation -ARRAY +## Type Description -### description +The `ARRAY` type is used to represent an ordered collection of elements, where each element has the same data type. For example, an array of integers can be represented as `[1, 2, 3]`, and an array of strings as `["a", "b", "c"]`. -`ARRAY` +- `ARRAY` represents an array composed of elements of type T, where T is nullable. Supported types for T include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY`. + - Note: Among the above T types, `JSONB` and `VARIANT` are only supported in the computation layer of Doris and **do not support using `ARRAY` and `ARRAY` in table creation in Doris**. -An array of T-type items, it cannot be used as a key column. Now ARRAY can only used in Duplicate Model Tables. +## Type Constraints -After version 2.0, it supports the use of non-key columns in Unique model tables. +- The maximum nesting depth supported by `ARRAY` type is 9. +- Conversion between `ARRAY` types depends on whether T can be converted. `Array` type cannot be converted to other types. + - For example: `ARRAY` can be converted to `ARRAY` because `INT` and `BIGINT` can be converted. + - `Variant` type can be converted to `Array` type. + - String type can be converted to `ARRAY` type (through parsing, returning NULL if parsing fails). +- In the `AGGREGATE` table model, `ARRAY` type only supports `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, it cannot be used as a KEY column, nor as a partition or bucket column**. +- Columns of `ARRAY` type **support `ORDER BY` and `GROUP BY` operations**. + - T types that support `ORDER BY` and `GROUP BY` include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`. +- Columns of `ARRAY` type do not support being used as `JOIN KEY` and do not support being used in `DELETE` statements. -T-type could be any of: +## Constant Construction -``` -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +- Use the `ARRAY()` function to construct a value of type `ARRAY`, where T is the common type of the parameters. + + ```SQL + -- [1, 2, 3] T is INT + SELECT ARRAY(1, 2, 3); -### example + -- ["1", "2", "abc"] , T is STRING + SELECT ARRAY(1, 2, 'abc'); + ``` +- Use `[]` to construct a value of type `ARRAY`, where T is the common type of the parameters. + + ```SQL + -- ["abc", "def", "efg"] T is STRING + SELECT ["abc", "def", "efg"]; -Create table example: + -- ["1", "2", "abc"] , T is STRING + SELECT [1, 2, 'abc']; + ``` -``` -mysql> CREATE TABLE `array_test` ( - `id` int(11) NULL COMMENT "", - `c_array` ARRAY NULL COMMENT "" -) ENGINE=OLAP -DUPLICATE KEY(`id`) -COMMENT "OLAP" -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"in_memory" = "false", -"storage_format" = "V2" -); -``` +## Modifying Type -Insert data example: +- Modification is only allowed when the element type inside `ARRAY` is `VARCHAR`. + - Only allows changing the parameter of `VARCHAR` from smaller to larger, not the other way around. -``` -mysql> INSERT INTO `array_test` VALUES (1, [1,2,3,4,5]); -mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null); -``` + ```SQL + CREATE TABLE `array_table` ( + `k` INT NOT NULL, + `array_column` ARRAY + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); -Select data example: + ALTER TABLE array_table MODIFY COLUMN array_column ARRAY; + ``` +- The default value for columns of type `ARRAY` can only be specified as NULL, and once specified, it cannot be modified. -``` -mysql> SELECT * FROM `array_test`; -+------+-----------------+ -| id | c_array | -+------+-----------------+ -| 1 | [1, 2, 3, 4, 5] | -| 2 | [6, 7, 8] | -| 3 | [] | -| 4 | NULL | -+------+-----------------+ -``` +## Element Access -### keywords +- Use `[k]` to access the k-th element of `ARRAY`, where k starts from 1. If out of bounds, returns NULL. - ARRAY + ```SQL + SELECT [1, 2, 3][1]; + +--------------+ + | [1, 2, 3][1] | + +--------------+ + | 1 | + +--------------+ + + SELECT ARRAY(1, 2, 3)[2]; + +-------------------+ + | ARRAY(1, 2, 3)[2] | + +-------------------+ + | 2 | + +-------------------+ + + SELECT [[1,2,3],[2,3,4]][1][3]; + +-------------------------+ + | [[1,2,3],[2,3,4]][1][3] | + +-------------------------+ + | 3 | + +-------------------------+ + ``` + +- Use `ELEMENT_AT(ARRAY, k)` to access the k-th element of `ARRAY`, where k starts from 1. If out of bounds, returns NULL. + + ```SQL + SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2); + +--------------------------------+ + | ELEMENT_AT(ARRAY(1, 2, 3) , 2) | + +--------------------------------+ + | 2 | + +--------------------------------+ + + SELECT ELEMENT_AT([1, 2, 3] , 3); + +---------------------------+ + | ELEMENT_AT([1, 2, 3] , 3) | + +---------------------------+ + | 3 | + +---------------------------+ + + SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3); + +-------------------------------------------------------+ + | ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) | + +-------------------------------------------------------+ + | ["3"] | + +-------------------------------------------------------+ + ``` + +## Query Acceleration + +- Columns of type `ARRAY` in Doris tables support adding inverted indexes to accelerate computations involving `ARRAY` functions on this column. + - T types supported by inverted indexes: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`. + - Accelerated `ARRAY` functions: `ARRAY_CONTAINS`, `ARRAYS_OVERLAP`, but when the function parameters include NULL, it falls back to regular vectorized computation. + +## Examples + +- Multidimensional Arrays + + ```SQL + -- Create table + CREATE TABLE IF NOT EXISTS array_table ( + id INT, + two_dim_array ARRAY>, + three_dim_array ARRAY>> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Insert + INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]); + + INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx')))); + + -- Query + SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id; + +---------------------+--------------------------+ + | two_dim_array[1][2] | three_dim_array[1][1][2] | + +---------------------+--------------------------+ + | 2 | cd | + | 2 | cd | + +---------------------+--------------------------+ + ``` + +- Nested Complex Types + + ```SQL + -- Create table + CREATE TABLE IF NOT EXISTS array_map_table ( + id INT, + array_map ARRAY> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Insert + INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2))); + INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2))); + + -- Query + SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id; + +--------------+--------------+ + | array_map[1] | array_map[2] | + +--------------+--------------+ + | {"key1":1} | {"key2":2} | + | {"key1":1} | {"key2":2} | + +--------------+--------------+ + + -- Create table + CREATE TABLE IF NOT EXISTS array_table ( + id INT, + array_struct ARRAY>, + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); + INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); + + SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id; + +-------------------------+-------------------------+ + | array_struct[1] | array_struct[2] | + +-------------------------+-------------------------+ + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | + +-------------------------+-------------------------+ + ``` + +- Modifying Type + + ```SQL + -- Create table + CREATE TABLE array_table ( + id INT, + array_varchar ARRAY + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- Modify ARRAY type + ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY; + + -- Check column type + DESC array_table; + +---------------+--------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +---------------+--------------------+------+-------+---------+-------+ + | id | int | Yes | true | NULL | | + | array_varchar | array | Yes | false | NULL | NONE | + +---------------+--------------------+------+-------+---------+-------+ + ``` + +- Inverted Index + + ```SQL + -- Create table statement + CREATE TABLE `array_table` ( + `k` int NOT NULL, + `array_column` ARRAY, + INDEX idx_array_column (array_column) USING INVERTED + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Insert + INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]); + + -- The inverted index accelerates the execution of the ARRAY_CONTAINS function + SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5); + +------+--------------+ + | k | array_column | + +------+--------------+ + | 2 | [4, 5, 6] | + +------+--------------+ + + -- The inverted index accelerates the execution of the ARRAYS_OVERLAP function + SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]); + +------+--------------+ + | k | array_column | + +------+--------------+ + | 2 | [4, 5, 6] | + | 3 | [7, 8, 9] | + +------+--------------+ + ``` diff --git a/docs/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md b/docs/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md index 46347c88cef72..4a49253b8ce2f 100644 --- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md +++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md @@ -5,260 +5,215 @@ } --- -## MAP +## Type Description -### Name +- The `MAP` type is used to represent a composite type of key-value pairs, where each key uniquely corresponds to a value. + - `key_type` represents the type of the keys, supporting types such as `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`. Keys are nullable and cannot be specified as NOT NULL. + - `value_type` represents the type of the values, supporting `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, ARRAY, MAP, STRUCT`. Values are nullable and cannot be specified as NOT NULL. -MAP +## Type Constraints -### Syntax -`MAP` +- The `MAP` type allows a maximum nesting depth of 9. +- In `MAP`, keys can be NULL, and identical keys are allowed (NULL and NULL are considered the same key). +- Conversion between `MAP` types depends on whether `key_type` and `value_type` can be converted. `MAP` cannot be converted to other types. + - For example: `MAP` can be converted to `MAP` because `INT` and `BIGINT` can be converted. + - String types can be converted to `MAP` (through parsing, returning NULL if parsing fails). +- In the `AGGREGATE` table model, `MAP` only supports `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, it cannot be used as a KEY column, nor as a partition or bucket column**. +- Columns of type `MAP` do not support comparison or arithmetic operations, **do not support `ORDER BY` and `GROUP BY` operations, cannot be used as `JOIN KEY`, and cannot be used in `DELETE` statements**. +- Columns of type `MAP` do not support creating any indexes. -Where: +## Type Construction -* `K` is the type of the key for the map. You should must use one of the following types for keys: - * String Data Type(Char/Varchar/String) - * Numeric Data Type(except double and float) - * Date Type(Date/Datetime/Time) - * IP Address Type(IPV4/IPV6) +- The `MAP()` function can return a value of type `MAP`. + + ```SQL + SELECT MAP('Alice', 21, 'Bob', 23); + + +-----------------------------+ + | map('Alice', 21, 'Bob', 23) | + +-----------------------------+ + | {"Alice":21, "Bob":23} | + +-----------------------------+ + ``` + +- `{}` can be used to construct a value of type `MAP`. + + ```SQL + SELECT {'Alice': 20}; + + +---------------+ + | {'Alice': 20} | + +---------------+ + | {"Alice":20} | + +---------------+ + ``` + +## Modifying Type + +- Modification is allowed only when `key_type` or `value_type` of `MAP` is `VARCHAR`. + - Only allows changing the parameter of `VARCHAR` from smaller to larger, not the other way around. + + ```SQL + CREATE TABLE `map_table` ( + `k` INT NOT NULL, + `map_varchar_int` MAP, + `map_int_varchar` MAP, + `map_varchar_varchar` MAP + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + ALTER TABLE map_table MODIFY COLUMN map_varchar_int MAP; + + ALTER TABLE map_table MODIFY COLUMN map_int_varchar MAP; + + ALTER TABLE map_table MODIFY COLUMN map_varchar_varchar MAP; + ``` + +- The default value for columns of type `MAP` can only be specified as NULL, and once specified, it cannot be modified. + +## Element Access + +- Use `[key]` to access the value corresponding to the key in the `MAP`. + + ```SQL + SELECT {'Alice': 20}['Alice']; + + +------------------------+ + | {'Alice': 20}['Alice'] | + +------------------------+ + | 20 | + +------------------------+ + ``` + +- Use `ELEMENT_AT(MAP, Key)` to access the value corresponding to the key in the `MAP`. + + ```SQL + SELECT ELEMENT_AT({'Alice': 20}, 'Alice'); + + +------------------------------------+ + | ELEMENT_AT({'Alice': 20}, 'Alice') | + +------------------------------------+ + | 20 | + +------------------------------------+ + ``` + +## Examples + +- Nested MAPs + + ```SQL + -- Create table + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_nested MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- Insert + INSERT INTO map_table VALUES (1, MAP('key1', MAP('key2', 1, 'key3', 2))); + INSERT INTO map_table VALUES (2, MAP('key1', MAP('key2', 3, 'key3', 4))); + + -- Query + SELECT map_nested['key1']['key2'] FROM map_table ORDER BY id; + +----------------------------+ + | map_nested['key1']['key2'] | + +----------------------------+ + | 1 | + | 3 | + +----------------------------+ + ``` + +- Nested Complex Types + + ```SQL + -- Create table + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_array MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- Insert + INSERT INTO map_table VALUES (1, MAP('key1', [1, 2, 3])), (2, MAP('key1', [4, 5, 6])); + + -- Query + SELECT map_array['key1'][1] FROM map_table ORDER BY id; + +----------------------+ + | map_array['key1'][1] | + +----------------------+ + | 1 | + | 4 | + +----------------------+ + + -- Create table + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_struct MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- Insert + INSERT INTO map_table VALUES (1, MAP('key1', STRUCT(1, 'John'), 'key2', STRUCT(3, 'Jane'))); + + -- Query + SELECT STRUCT_ELEMENT(map_struct['key1'], 1), STRUCT_ELEMENT(map_struct['key1'], 'name') FROM map_table ORDER BY id; + +---------------------------------------+--------------------------------------------+ + | STRUCT_ELEMENT(map_struct['key1'], 1) | STRUCT_ELEMENT(map_struct['key1'], 'name') | + +---------------------------------------+--------------------------------------------+ + | 1 | John | + +---------------------------------------+--------------------------------------------+ + ``` + +- Modifying Type + + ```SQL + -- Create table + CREATE TABLE `map_table` ( + `k` INT NOT NULL, + `map_varchar_int` MAP, + `map_int_varchar` MAP, + `map_varchar_varchar` MAP + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Modify KEY + ALTER TABLE map_table MODIFY COLUMN map_varchar_int MAP; + + -- Modify VALUE + ALTER TABLE map_table MODIFY COLUMN map_int_varchar MAP; + + -- Modify KEY and VALUE + ALTER TABLE map_table MODIFY COLUMN map_varchar_varchar MAP; + + -- Check column types + DESC map_table; + +---------------------+------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +---------------------+------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | map_varchar_int | map | Yes | false | NULL | NONE | + | map_int_varchar | map | Yes | false | NULL | NONE | + | map_varchar_varchar | map | Yes | false | NULL | NONE | + +---------------------+------------------------------+------+-------+---------+-------+ + ``` - Map keys always be nullable. - - **Because nullable types are supported as map keys, key comparison in maps uses "null-safe equal" (null and null are considered equal), which differs from the standard SQL definition.** - -* `V` is the type of the value in the map. And it is always nullable. - -The Map type does not support duplicate keys; Doris will automatically remove duplicates (only one entry is retained for each identical key). - -### Description - -A Map of K, V items, it cannot be used as a key column. Now MAP can only used in Duplicate and Unique Model Tables. - -Need to manually enable the support, it is disabled by default. -``` -admin set frontend config("enable_map_type" = "true"); -``` -### Example - -Create table example: - -```sql -CREATE TABLE IF NOT EXISTS test.simple_map ( - `id` INT(11) NULL COMMENT "", - `m` Map NULL COMMENT "" -) ENGINE=OLAP -DUPLICATE KEY(`id`) -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"storage_format" = "V2" -); -``` - -Insert data example: - -```sql -mysql> INSERT INTO simple_map VALUES(1, {'a': 100, 'b': 200}); -``` - -stream_load examples: -See [STREAM LOAD](../../../../data-operate/import/import-way/stream-load-manual) for syntax details. - -```shell -# load the map data from json file -curl --location-trusted -uroot: -T events.json -H "format: json" -H "read_json_by_line: true" http://fe_host:8030/api/test/simple_map/_stream_load -# 返回结果 -{ - "TxnId": 106134, - "Label": "5666e573-9a97-4dfc-ae61-2d6b61fdffd2", - "Comment": "", - "TwoPhaseCommit": "false", - "Status": "Success", - "Message": "OK", - "NumberTotalRows": 10293125, - "NumberLoadedRows": 10293125, - "NumberFilteredRows": 0, - "NumberUnselectedRows": 0, - "LoadBytes": 2297411459, - "LoadTimeMs": 66870, - "BeginTxnTimeMs": 1, - "StreamLoadPutTimeMs": 80, - "ReadDataTimeMs": 6415, - "WriteDataTimeMs": 10550, - "CommitAndPublishTimeMs": 38 -} -``` - -Select all data example: - -```sql -mysql> SELECT * FROM simple_map; -+------+-----------------------------+ -| id | m | -+------+-----------------------------+ -| 1 | {'a':100, 'b':200} | -| 2 | {'b':100, 'c':200, 'd':300} | -| 3 | {'a':10, 'd':200} | -+------+-----------------------------+ -``` - -Select map column example: - -```sql -mysql> SELECT m FROM simple_map; -+-----------------------------+ -| m | -+-----------------------------+ -| {'a':100, 'b':200} | -| {'b':100, 'c':200, 'd':300} | -| {'a':10, 'd':200} | -+-----------------------------+ -``` - -Select map value according given key example: - -```sql -mysql> SELECT m['a'] FROM simple_map; -+-----------------------------+ -| %element_extract%(`m`, 'a') | -+-----------------------------+ -| 100 | -| NULL | -| 10 | -+-----------------------------+ -``` - -map functions examples: - -```sql -# map construct - -mysql> SELECT map('k11', 1000, 'k22', 2000)['k11']; -+---------------------------------------------------------+ -| %element_extract%(map('k11', 1000, 'k22', 2000), 'k11') | -+---------------------------------------------------------+ -| 1000 | -+---------------------------------------------------------+ - -mysql> SELECT map('k11', 1000, 'k22', 2000)['nokey']; -+-----------------------------------------------------------+ -| %element_extract%(map('k11', 1000, 'k22', 2000), 'nokey') | -+-----------------------------------------------------------+ -| NULL | -+-----------------------------------------------------------+ -1 row in set (0.06 sec) - -# map size - -mysql> SELECT map_size(map('k11', 1000, 'k22', 2000)); -+-----------------------------------------+ -| map_size(map('k11', 1000, 'k22', 2000)) | -+-----------------------------------------+ -| 2 | -+-----------------------------------------+ - -mysql> SELECT id, m, map_size(m) FROM simple_map ORDER BY id; -+------+-----------------------------+---------------+ -| id | m | map_size(`m`) | -+------+-----------------------------+---------------+ -| 1 | {"a":100, "b":200} | 2 | -| 2 | {"b":100, "c":200, "d":300} | 3 | -| 2 | {"a":10, "d":200} | 2 | -+------+-----------------------------+---------------+ -3 rows in set (0.04 sec) - -# map_contains_key - -mysql> SELECT map_contains_key(map('k11', 1000, 'k22', 2000), 'k11'); -+--------------------------------------------------------+ -| map_contains_key(map('k11', 1000, 'k22', 2000), 'k11') | -+--------------------------------------------------------+ -| 1 | -+--------------------------------------------------------+ -1 row in set (0.08 sec) - -mysql> SELECT id, m, map_contains_key(m, 'k1') FROM simple_map ORDER BY id; -+------+-----------------------------+-----------------------------+ -| id | m | map_contains_key(`m`, 'k1') | -+------+-----------------------------+-----------------------------+ -| 1 | {"a":100, "b":200} | 0 | -| 2 | {"b":100, "c":200, "d":300} | 0 | -| 2 | {"a":10, "d":200} | 0 | -+------+-----------------------------+-----------------------------+ -3 rows in set (0.10 sec) - -mysql> SELECT id, m, map_contains_key(m, 'a') FROM simple_map ORDER BY id; -+------+-----------------------------+----------------------------+ -| id | m | map_contains_key(`m`, 'a') | -+------+-----------------------------+----------------------------+ -| 1 | {"a":100, "b":200} | 1 | -| 2 | {"b":100, "c":200, "d":300} | 0 | -| 2 | {"a":10, "d":200} | 1 | -+------+-----------------------------+----------------------------+ -3 rows in set (0.17 sec) - -# map_contains_value - -mysql> SELECT map_contains_value(map('k11', 1000, 'k22', 2000), NULL); -+---------------------------------------------------------+ -| map_contains_value(map('k11', 1000, 'k22', 2000), NULL) | -+---------------------------------------------------------+ -| 0 | -+---------------------------------------------------------+ -1 row in set (0.04 sec) - -mysql> SELECT id, m, map_contains_value(m, '100') FROM simple_map ORDER BY id; -+------+-----------------------------+------------------------------+ -| id | m | map_contains_value(`m`, 100) | -+------+-----------------------------+------------------------------+ -| 1 | {"a":100, "b":200} | 1 | -| 2 | {"b":100, "c":200, "d":300} | 1 | -| 2 | {"a":10, "d":200} | 0 | -+------+-----------------------------+------------------------------+ -3 rows in set (0.11 sec) - -# map_keys - -mysql> SELECT map_keys(map('k11', 1000, 'k22', 2000)); -+-----------------------------------------+ -| map_keys(map('k11', 1000, 'k22', 2000)) | -+-----------------------------------------+ -| ["k11", "k22"] | -+-----------------------------------------+ -1 row in set (0.04 sec) - -mysql> SELECT id, map_keys(m) FROM simple_map ORDER BY id; -+------+-----------------+ -| id | map_keys(`m`) | -+------+-----------------+ -| 1 | ["a", "b"] | -| 2 | ["b", "c", "d"] | -| 2 | ["a", "d"] | -+------+-----------------+ -3 rows in set (0.19 sec) - -# map_values - -mysql> SELECT map_values(map('k11', 1000, 'k22', 2000)); -+-------------------------------------------+ -| map_values(map('k11', 1000, 'k22', 2000)) | -+-------------------------------------------+ -| [1000, 2000] | -+-------------------------------------------+ -1 row in set (0.03 sec) - -mysql> SELECT id, map_values(m) FROM simple_map ORDER BY id; -+------+-----------------+ -| id | map_values(`m`) | -+------+-----------------+ -| 1 | [100, 200] | -| 2 | [100, 200, 300] | -| 2 | [10, 200] | -+------+-----------------+ -3 rows in set (0.18 sec) - -``` - -### Keywords - - MAP diff --git a/docs/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md b/docs/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md index 8d2bae79d6fc1..82dd0e817cb38 100644 --- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md +++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md @@ -5,101 +5,220 @@ } --- -## STRUCT +## Type Description -### name +The STRUCT type is used to combine multiple fields into a single structure, where each field can have its own name and type, suitable for representing nested or complex business data structures. -STRUCT +- `STRUCT` -### description + - `field_name` represents the name, **cannot be empty, cannot be duplicated, and is case-insensitive**. -`STRUCT` + - `field_type` represents the type, which is nullable and cannot be specified as NOT NULL. Supported types include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, ARRAY, MAP, STRUCT`. -Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns. + - `[COMMENT 'comment-string']` represents an optional comment. -Need to manually enable the support, it is disabled by default. -``` -admin set frontend config("enable_struct_type" = "true"); -``` -It cannot be used as a Key column. Now STRUCT can only used in Duplicate Model Tables. +## Type Constraints -The names and number of Fields in a Struct is fixed and always Nullable, and a Field typically consists of the following parts. +- The maximum nesting depth supported by the `STRUCT` type is 9. -- field_name: Identifier naming the field, non repeatable. -- field_type: A data type. -- COMMENT: An optional string describing the field. (currently not supported) +- Conversion between `STRUCT` types depends on whether the internal types can be converted (names do not affect conversion). `STRUCT` types cannot be converted to other types. -The currently supported types are: + - String types can be converted to `STRUCT` types (through parsing, returning NULL if parsing fails). -``` -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +- In the `AGGREGATE` table model, `STRUCT` types only support `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, they cannot be used as KEY columns, nor as partition or bucket columns.** -We have a todo list for future version: +- Columns of `STRUCT` type do not support comparison or arithmetic operations, **do not support `ORDER BY` and `GROUP BY` operations, cannot be used as `JOIN KEY`, and cannot be used in `DELETE` statements.** -``` -TODO: Supports nested Struct or other complex types -``` +- Columns of `STRUCT` type do not support creating any indexes. -### example +## Type Construction -Create table example: +- Use `STRUCT()` to construct a value of type `STRUCT`, where the internal names start from col1. -``` -mysql> CREATE TABLE `struct_test` ( - `id` int(11) NULL, - `s_info` STRUCT NULL -) ENGINE=OLAP -DUPLICATE KEY(`id`) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"storage_format" = "V2", -"light_schema_change" = "true", -"disable_auto_compaction" = "false" -); -``` + ```SQL + SELECT STRUCT(1, 'a', "abc"); -Insert data example: + +--------------------------------------+ + | STRUCT(1, 'a', "abc") | + +--------------------------------------+ + | {"col1":1, "col2":"a", "col3":"abc"} | + +--------------------------------------+ + ``` -``` -INSERT INTO `struct_test` VALUES (1, {1, 'sn1', 'sa1'}); -INSERT INTO `struct_test` VALUES (2, struct(2, 'sn2', 'sa2')); -INSERT INTO `struct_test` VALUES (3, named_struct('s_id', 3, 's_name', 'sn3', 's_address', 'sa3')); -``` +- Use `NAMED_STRUCT()` to construct a specific `STRUCT` type value. -Stream load: + ```SQL + SELECT NAMED_STRUCT("name", "Jack", "id", 1728923); -test.csv: + +---------------------------------------------+ + | NAMED_STRUCT("name", "Jack", "id", 1728923) | + +---------------------------------------------+ + | {"name":"Jack", "id":1728923} | + +---------------------------------------------+ + ``` -``` -1|{"s_id":1, "s_name":"sn1", "s_address":"sa1"} -2|{s_id:2, s_name:sn2, s_address:sa2} -3|{"s_address":"sa3", "s_name":"sn3", "s_id":3} -``` +## Modifying Type -example: +- Modification is allowed only when the subcolumn type of `STRUCT` is `VARCHAR`. -``` -curl --location-trusted -u root -T test.csv -H "label:test_label" http://host:port/api/test/struct_test/_stream_load -``` + - Only allows changing the parameter of `VARCHAR` from smaller to larger, not the other way around. -Select data example: + ```SQL + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + ``` + +- Subcolumns inside `STRUCT` type do not support deletion, but new subcolumns can be added at the end. + +```SQL + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Add a subcolumn at the end + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; ``` -mysql> select * from struct_test; -+------+-------------------+ -| id | s_info | -+------+-------------------+ -| 1 | {1, 'sn1', 'sa1'} | -| 2 | {2, 'sn2', 'sa2'} | -| 3 | {3, 'sn3', 'sa3'} | -+------+-------------------+ -3 rows in set (0.02 sec) -``` -### keywords +## Element Access + +- Use `STRUCT_ELEMENT(struct, k/field_name)` to access a specific subcolumn inside the `STRUCT`. + + - k represents the position, starting from 1. + + - `field_name` is the name of the subcolumn in the `STRUCT`. + + ```SQL + SELECT STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), 1); - STRUCT + +----------------------------------------------------------------+ + | STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), 1) | + +----------------------------------------------------------------+ + | Jack | + +----------------------------------------------------------------+ + + SELECT STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), "id"); + + +-------------------------------------------------------------------+ + | STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), "id") | + +-------------------------------------------------------------------+ + | 1728923 | + +-------------------------------------------------------------------+ + ``` + +## Examples + +- Nested Complex Types + + ```SQL + -- Create table + CREATE TABLE IF NOT EXISTS struct_table ( + id INT, + struct_complex STRUCT< + basic_info: STRUCT, + contact: STRUCT, + preferences: STRUCT, settings: MAP>, + metadata: STRUCT< + created_at: DATETIME, + updated_at: DATETIME, + stats: STRUCT + > + > + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- Insert + INSERT INTO struct_table VALUES (1, STRUCT( + STRUCT('John', 25), + STRUCT('john@example.com', '1234567890'), + STRUCT(['tag1', 'tag2'], MAP('setting1', 1, 'setting2', 2)), + STRUCT('2021-01-01 00:00:00', '2021-01-02 00:00:00', STRUCT(100, 50)) + )); + + -- Query + SELECT STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'basic_info'), 'name') FROM struct_table ORDER BY id; + + +----------------------------------------------------------------------+ + | STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'basic_info'), 'name') | + +----------------------------------------------------------------------+ + | John | + +----------------------------------------------------------------------+ + + SELECT STRUCT_ELEMENT(STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'metadata'), 'stats'), 'views') FROM struct_table ORDER BY id; + + +----------------------------------------------------------------------------------------------+ + | STRUCT_ELEMENT(STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'metadata'), 'stats'), 'views') | + +----------------------------------------------------------------------------------------------+ + | 100 | + +----------------------------------------------------------------------------------------------+ + ``` + +- Modifying Type + +```SQL +-- Create table +CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Modify the type of the 'name' subcolumn + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + + -- Check column types + DESC struct_table; + + +----------------+----------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +----------------+----------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | struct_varchar | struct | Yes | false | NULL | NONE | + +----------------+----------------------------------+------+-------+---------+-------+ + + -- Create table + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- Add a subcolumn at the end + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + + -- Check column types + DESC struct_table; + + +----------------+-----------------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +----------------+-----------------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | struct_varchar | struct | Yes | false | NULL | NONE | + +----------------+-----------------------------------------+------+-------+---------+-------+ +``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md index 3688e977fa2c0..c73f655740fd9 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/array.md @@ -5,17 +5,7 @@ } --- -`ARRAY` 表示由 T 类型元素组成的数组,不能作为 key 列使用。 - -- 2.0 之前仅支持在 Duplicate 模型的表中使用。 -- 从 2.0 版本开始支持在 Unique 模型的表中的非 key 列使用。 - -T 支持的类型有: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`ARRAY` 表示由 T 类型元素组成的数组。点击 [ARRAY 数据类型](../../../sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md) 了解具体信息。 ## CSV 格式导入 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md index a057b27e1788c..3c7d156b0d2ae 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/map.md @@ -5,16 +5,7 @@ } --- -`MAP` 表示由 K, V 类型元素组成的 map,不能作为 key 列使用。 - -- 目前支持在 Duplicate,Unique 模型的表中使用。 - -K, V 支持的类型有: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`MAP` 表示由`K,V`类型元素组成的`MAP`。 点击[MAP 数据类型](../../../sql-manual/basic-element/sql-data-types/semi-structured/MAP.md) 了解具体信息。 ## CSV 格式导入 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md index be42d820312d8..acc510e094018 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/struct.md @@ -5,20 +5,7 @@ } --- -`STRUCT` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。 - -- 不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 -- 一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 - - field_name: Field 的标识符,不可重复 - - field_type: Field 的类型 - - COMMENT: Field 的注释,可选 (暂不支持) - -当前可支持的类型有: - -```sql -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, -DATE, DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +`STRUCT` 表示由多个 Field 组成的结构体,也可被理解为多个列的集合。点击[STRUCT 数据类型](../../../sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md) 了解具体信息。 ## CSV 格式导入 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md index 2dd45c876f2f5..0f6a5ddfb702c 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md @@ -5,58 +5,266 @@ } --- -## 描述 +## 类型描述 -`ARRAY` +`ARRAY` 类型用于表示有序元素集合,集合中的每个元素具有相同的数据类型。例如,一个整数数组可表示为`[1, 2, 3]`,一个字符串数组可表示为`["a", "b", "c"]`。 -由 T 类型元素组成的数组,不能作为 key 列使用。目前支持在 Duplicate 和 Unique 模型的表中使用。 +- `ARRAY` 表示由T类型组成的数组,T类型是Nullable的,T支持的类型有:`BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,LARGEINT,FLOAT,DOUBLE,DECIMAL,DATE,DATETIME,CHAR,VARCHAR,STRING,IPTV4,IPV6,STRUCT,MAP,VARIANT,JSONB,ARRAY`。 + - 注意:上述T类型中的`JSONB`和`VARIANT`只是在Doris层中的计算层支持,**不支持Doris建表中使用`ARRAY`和`ARRAY`**。 -2.0 版本之后支持在 Unique 模型的表中非 key 列使用。 +## 类型约束 -T 支持的类型有: +- `ARRAY`类型支持的最大嵌套深度为9。 +- `ARRAY`类型之间的转换取决于T之间是否能转换,`Array`类型不能转成其他类型。 + - 例如:`ARRAY`可以转换为`ARRAY`,因为`INT`和`BIGINT`之间可以转换。 + - `Variant`类型可以转换成`Array`类型。 + - 字符串类型可以转换成`ARRAY`类型(通过解析的形式,解析失败返回 NULL)。 +- `ARRAY`类型在`AGGREGATE`表模型中只支持`REPLACE`和`REPLACE_IF_NOT_NULL`,**在任何表模型中都无法作为KEY列,无法作为分区分桶列**。 +- `ARRAY`类型的列**支持`ORDER BY`和`GROUP BY`操作**。 + - 支持`ORDER BY`和`GROUP BY`的T类型包括:`BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,LARGEINT,FLOAT,DOUBLE,DECIMAL,DATE,DATETIME,CHAR,VARCHAR,STRING,IPTV4,IPV6`。 +- `ARRAY`类型的列不支持作为 `JOIN KEY`,不支持在`DELETE`语句中使用。 + +## 常量构造 -``` -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` +- 使用`ARRAY()`函数可以构造一个`ARRAY`类型的值,T类型为参数的公共类型。 + + ```SQL + -- [1, 2, 3] T 是 INT + SELECT ARRAY(1, 2, 3); -## 举例 + -- ["1", "2", "abc"] , T 是 STRING + SELECT ARRAY(1, 2, 'abc'); + ``` +- 使用`[]`可以构造一个`ARRAY`类型的值,T类型为参数的公共类型。 + + ```SQL + -- ["abc", "def", "efg"] T 是 STRING + SELECT ["abc", "def", "efg"]; -建表示例如下: + -- ["1", "2", "abc"] , T 是 STRING + SELECT [1, 2, 'abc']; + ``` + +## 修改类型 -``` -mysql> CREATE TABLE `array_test` ( - `id` int(11) NULL COMMENT "", - `c_array` ARRAY NULL COMMENT "" -) ENGINE=OLAP -DUPLICATE KEY(`id`) -COMMENT "OLAP" -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"in_memory" = "false", -"storage_format" = "V2" -); -``` +- 当`ARRAY`内部的元素类型为`VARCHAR`时,才允许进行修改。 + - 只允许将`VARCHAR`的参数从小改到大。反之不行。 -插入数据示例: + ```SQL + CREATE TABLE `array_table` ( + `k` INT NOT NULL, + `array_column` ARRAY + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); -``` -mysql> INSERT INTO `array_test` VALUES (1, [1,2,3,4,5]); -mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null); -``` + ALTER TABLE array_table MODIFY COLUMN array_column ARRAY; + ``` +- `ARRAY`类型的列默认值只能指定为NULL,如果指定后不能修改。 + +## 元素访问 -查询数据示例: +- 使用`[k]`的方式访问`ARRAY`的第k个元素,k从1开始,越界之后返回NULL。 -``` -mysql> SELECT * FROM `array_test`; -+------+-----------------+ -| id | c_array | -+------+-----------------+ -| 1 | [1, 2, 3, 4, 5] | -| 2 | [6, 7, 8] | -| 3 | [] | -| 4 | NULL | -+------+-----------------+ -``` + ```SQL + SELECT [1, 2, 3][1]; + +--------------+ + | [1, 2, 3][1] | + +--------------+ + | 1 | + +--------------+ + SELECT ARRAY(1, 2, 3)[2]; + +-------------------+ + | ARRAY(1, 2, 3)[2] | + +-------------------+ + | 2 | + +-------------------+ + + SELECT [[1,2,3],[2,3,4]][1][3]; + +-------------------------+ + | [[1,2,3],[2,3,4]][1][3] | + +-------------------------+ + | 3 | + +-------------------------+ + ``` + +- 使用`ELEMENT_AT(ARRAY, k)`的方式访问`ARRAY`的第k个元素,k从1开始,越界之后返回NULL。 + + ```SQL + SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2); + +--------------------------------+ + | ELEMENT_AT(ARRAY(1, 2, 3) , 3) | + +--------------------------------+ + | 2 | + +--------------------------------+ + + SELECT ELEMENT_AT([1, 2, 3] , 3); + +---------------------------+ + | ELEMENT_AT([1, 2, 3] , 3) | + +---------------------------+ + | 3 | + +---------------------------+ + + SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3); + +-------------------------------------------------------+ + | ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) | + +-------------------------------------------------------+ + | ["3"] | + +-------------------------------------------------------+ + ``` + +## 查询加速 + +- Doris表中`ARRAY`类型的列支持添加倒排索引,用来加速这一列执行`ARRAY`函数的计算。 + - T类型为倒排索引支持的类型:`BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPTV4, IPV6`。 + - 支持加速的ARRAY函数为:`ARRAY_CONTAINS, ARRAYS_OVERLAP`,但是当函数中的参数包含NULL时,会退化为普通的向量化计算。 + +## 示例 + +- 多维数组 + + ```SQL + -- 创建表 + CREATE TABLE IF NOT EXISTS array_table ( + id INT, + two_dim_array ARRAY>, + three_dim_array ARRAY>> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 插入 + INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]); + + INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx')))); + + -- 查询 + SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM ${tableName} ORDER BY id; + +---------------------+--------------------------+ + | two_dim_array[1][2] | three_dim_array[1][1][2] | + +---------------------+--------------------------+ + | 2 | cd | + | 2 | cd | + +---------------------+--------------------------+ + ``` + +- 复杂类型嵌套 + + ```SQL + -- 创建表 + CREATE TABLE IF NOT EXISTS array_map_table ( + id INT, + array_map ARRAY> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 插入 + INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2))); + INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2))) + + -- 查询 + SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id; + +--------------+--------------+ + | array_map[1] | array_map[2] | + +--------------+--------------+ + | {"key1":1} | {"key2":2} | + | {"key1":1} | {"key2":2} | + +--------------+--------------+ + + -- 创建表 + CREATE TABLE IF NOT EXISTS array_table ( + id INT, + array_struct ARRAY>, + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); + INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); + + SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id; + +-------------------------+-------------------------+ + | array_struct[1] | array_struct[2] | + +-------------------------+-------------------------+ + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | + +-------------------------+-------------------------+ + ``` + + +- 修改类型 + + ```SQL + -- 创建表 + CREATE TABLE array_table ( + id INT, + array_varchar ARRAY + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- 修改 ARRAY 类型 + ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY; + + -- 查看列类型 + DESC array_table; + +---------------+--------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +---------------+--------------------+------+-------+---------+-------+ + | id | int | Yes | true | NULL | | + | array_varchar | array | Yes | false | NULL | NONE | + +---------------+--------------------+------+-------+---------+-------+ + ``` + +- 倒排索引 + + ```SQL + -- 建表语句 + CREATE TABLE `array_table` ( + `k` int NOT NULL, + `array_column` ARRAY, + INDEX idx_array_column (array_column) USING INVERTED + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 插入 + INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]); + + -- 倒排索引会加速 ARRAY_CONTAINS 函数的执行 + SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5); + +------+--------------+ + | k | array_column | + +------+--------------+ + | 2 | [4, 5, 6] | + +------+--------------+ + + -- 倒排索引会加速 ARRAYS_OVERLAP 函数的执行 + SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]); + +------+--------------+ + | k | array_column | + +------+--------------+ + | 2 | [4, 5, 6] | + | 3 | [7, 8, 9] | + +------+--------------+ + ``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md index ed110949abd30..b2aaea826f859 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/MAP.md @@ -5,262 +5,206 @@ } --- -## MAP +## 类型描述 -### Name - -MAP - -### 语法 -`MAP` - -其中: - -* K 是 Map 的键类型。键必须使用以下类型之一: - * 字符串类型(Char/Varchar/String) - * 数值类型(不包括浮点数类型:double 和 float)1 - * 日期类型 - * IP 地址类型(IPV4/IPV6) - - Map 的 Key 类型 始终为 nullable。 +- `MAP`类型用于表示键值对集合的复合类型,每个键(key)唯一地对应一个值(value)。 + - `key_type` 表征键的类型,支持的类型为`BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING,IPTV4, IPV6`,key值是Nullable的,不支持指定NOT NULL。 +- `value_type` 表征值的类型,支持 `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING,IPV4, IPV6, ARRAY, MAP, STRUCT`,值是 Nullable 的,不支持指定 NOT NULL。 - **因为支持 nullable 类型作为 map 的 key,map 中的 key 比较使用的是 “null-safe equal”(即 null 和 null 被认为是相等的),这与标准 SQL 的定义不同。** - -* V 是 Map 中值的类型,始终为 nullable。 - -Map 类型不支持重复的键;Doris 会自动删除重复的项。 - -### 描述 - -由 K, V 类型元素组成的 map,不能作为 key 列使用。目前支持在 Duplicate,Unique 模型的表中使用。 - - -Map 类型支持默认情况下是没有开启的,需要设置参数: -``` -admin set frontend config("enable_map_type" = "true"); -``` - -## 举例 - -建表示例如下: - -```sql -CREATE TABLE IF NOT EXISTS test.simple_map ( - `id` INT(11) NULL COMMENT "", - `m` Map NULL COMMENT "" -) ENGINE=OLAP -DUPLICATE KEY(`id`) -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"storage_format" = "V2" -); -``` - -插入数据示例: - -```sql -mysql> INSERT INTO simple_map VALUES(1, {'a': 100, 'b': 200}); -``` - - - -```shell -# load the map data from json file -curl --location-trusted -uroot: -T events.json -H "format: json" -H "read_json_by_line: true" http://fe_host:8030/api/test/simple_map/_stream_load -# 返回结果 -{ - "TxnId": 106134, - "Label": "5666e573-9a97-4dfc-ae61-2d6b61fdffd2", - "Comment": "", - "TwoPhaseCommit": "false", - "Status": "Success", - "Message": "OK", - "NumberTotalRows": 10293125, - "NumberLoadedRows": 10293125, - "NumberFilteredRows": 0, - "NumberUnselectedRows": 0, - "LoadBytes": 2297411459, - "LoadTimeMs": 66870, - "BeginTxnTimeMs": 1, - "StreamLoadPutTimeMs": 80, - "ReadDataTimeMs": 6415, - "WriteDataTimeMs": 10550, - "CommitAndPublishTimeMs": 38 -} -``` - - -查询数据示例: - -```sql -mysql> SELECT * FROM simple_map; -+------+-----------------------------+ -| id | m | -+------+-----------------------------+ -| 1 | {'a':100, 'b':200} | -| 2 | {'b':100, 'c':200, 'd':300} | -| 3 | {'a':10, 'd':200} | -+------+-----------------------------+ -``` - -查询 map 列示例: - -```sql -mysql> SELECT m FROM simple_map; -+-----------------------------+ -| m | -+-----------------------------+ -| {'a':100, 'b':200} | -| {'b':100, 'c':200, 'd':300} | -| {'a':10, 'd':200} | -+-----------------------------+ -``` - -map 取值示例: - -```sql -mysql> SELECT m['a'] FROM simple_map; -+-----------------------------+ -| %element_extract%(`m`, 'a') | -+-----------------------------+ -| 100 | -| NULL | -| 10 | -+-----------------------------+ -``` - -map 支持的 functions 示例: - -```sql -# map construct - -mysql> SELECT map('k11', 1000, 'k22', 2000)['k11']; -+---------------------------------------------------------+ -| %element_extract%(map('k11', 1000, 'k22', 2000), 'k11') | -+---------------------------------------------------------+ -| 1000 | -+---------------------------------------------------------+ - -mysql> SELECT map('k11', 1000, 'k22', 2000)['nokey']; -+-----------------------------------------------------------+ -| %element_extract%(map('k11', 1000, 'k22', 2000), 'nokey') | -+-----------------------------------------------------------+ -| NULL | -+-----------------------------------------------------------+ -1 row in set (0.06 sec) - -# map size - -mysql> SELECT map_size(map('k11', 1000, 'k22', 2000)); -+-----------------------------------------+ -| map_size(map('k11', 1000, 'k22', 2000)) | -+-----------------------------------------+ -| 2 | -+-----------------------------------------+ - -mysql> SELECT id, m, map_size(m) FROM simple_map ORDER BY id; -+------+-----------------------------+---------------+ -| id | m | map_size(`m`) | -+------+-----------------------------+---------------+ -| 1 | {"a":100, "b":200} | 2 | -| 2 | {"b":100, "c":200, "d":300} | 3 | -| 2 | {"a":10, "d":200} | 2 | -+------+-----------------------------+---------------+ -3 rows in set (0.04 sec) - -# map_contains_key - -mysql> SELECT map_contains_key(map('k11', 1000, 'k22', 2000), 'k11'); -+--------------------------------------------------------+ -| map_contains_key(map('k11', 1000, 'k22', 2000), 'k11') | -+--------------------------------------------------------+ -| 1 | -+--------------------------------------------------------+ -1 row in set (0.08 sec) - -mysql> SELECT id, m, map_contains_key(m, 'k1') FROM simple_map ORDER BY id; -+------+-----------------------------+-----------------------------+ -| id | m | map_contains_key(`m`, 'k1') | -+------+-----------------------------+-----------------------------+ -| 1 | {"a":100, "b":200} | 0 | -| 2 | {"b":100, "c":200, "d":300} | 0 | -| 2 | {"a":10, "d":200} | 0 | -+------+-----------------------------+-----------------------------+ -3 rows in set (0.10 sec) - -mysql> SELECT id, m, map_contains_key(m, 'a') FROM simple_map ORDER BY id; -+------+-----------------------------+----------------------------+ -| id | m | map_contains_key(`m`, 'a') | -+------+-----------------------------+----------------------------+ -| 1 | {"a":100, "b":200} | 1 | -| 2 | {"b":100, "c":200, "d":300} | 0 | -| 2 | {"a":10, "d":200} | 1 | -+------+-----------------------------+----------------------------+ -3 rows in set (0.17 sec) - -# map_contains_value - -mysql> SELECT map_contains_value(map('k11', 1000, 'k22', 2000), NULL); -+---------------------------------------------------------+ -| map_contains_value(map('k11', 1000, 'k22', 2000), NULL) | -+---------------------------------------------------------+ -| 0 | -+---------------------------------------------------------+ -1 row in set (0.04 sec) - -mysql> SELECT id, m, map_contains_value(m, '100') FROM simple_map ORDER BY id; -+------+-----------------------------+------------------------------+ -| id | m | map_contains_value(`m`, 100) | -+------+-----------------------------+------------------------------+ -| 1 | {"a":100, "b":200} | 1 | -| 2 | {"b":100, "c":200, "d":300} | 1 | -| 2 | {"a":10, "d":200} | 0 | -+------+-----------------------------+------------------------------+ -3 rows in set (0.11 sec) - -# map_keys - -mysql> SELECT map_keys(map('k11', 1000, 'k22', 2000)); -+-----------------------------------------+ -| map_keys(map('k11', 1000, 'k22', 2000)) | -+-----------------------------------------+ -| ["k11", "k22"] | -+-----------------------------------------+ -1 row in set (0.04 sec) - -mysql> SELECT id, map_keys(m) FROM simple_map ORDER BY id; -+------+-----------------+ -| id | map_keys(`m`) | -+------+-----------------+ -| 1 | ["a", "b"] | -| 2 | ["b", "c", "d"] | -| 2 | ["a", "d"] | -+------+-----------------+ -3 rows in set (0.19 sec) - -# map_values - -mysql> SELECT map_values(map('k11', 1000, 'k22', 2000)); -+-------------------------------------------+ -| map_values(map('k11', 1000, 'k22', 2000)) | -+-------------------------------------------+ -| [1000, 2000] | -+-------------------------------------------+ -1 row in set (0.03 sec) - -mysql> SELECT id, map_values(m) FROM simple_map ORDER BY id; -+------+-----------------+ -| id | map_values(`m`) | -+------+-----------------+ -| 1 | [100, 200] | -| 2 | [100, 200, 300] | -| 2 | [10, 200] | -+------+-----------------+ -3 rows in set (0.18 sec) - -``` - -### keywords - - MAP +## 类型约束 +- `MAP`类型允许的最大嵌套深度是9。 +- `MAP` 的**Key可以是 NULL,并且允许相同的Key(NULL和NULL也被认为是相同的Key)**。 +- `MAP` 类型之间的转换取决于`key_type`之间以及`value_type`之间是否能转换,`MAP`类型不能转成其他类型。 + - 例如: `MAP`可以转换为`MAP`,因为`INT`和`BIGINT`可以转换。 + - 字符串类型可以转换成`MAP`类型(通过解析的形式,解析失败返回 NULL)。 +- `MAP` 类型在`AGGREGATE`表模型中只支持`REPLACE`和`REPLACE_IF_NOT_NULL`,**在任何表模型中都无法作为`Key`列,无法作为分区分桶列**。 +- `MAP`类型的列不支持比较或者算数运算,**不支持`ORDER BY`和`GROUP BY`操作,不支持作为`JOIN KEY`,不支持在`DELETE`语句中使用**。 +- `MAP`类型的列不支持建立任何索引。 + +## 类型构造 +- `MAP()` 函数可以返回一个`MAP`类型的值。 + + ````SQL + SELECT MAP('Alice', 21, 'Bob', 23); + + +-----------------------------+ + | map('Alice', 21, 'Bob', 23) | + +-----------------------------+ + | {"Alice":21, "Bob":23} | + +-----------------------------+ + ```` +- `{}`可以构造一个`MAP`类型的值。 + ```SQL + SELECT {'Alice': 20}; + + +---------------+ + | {'Alice': 20} | + +---------------+ + | {"Alice":20} | + +---------------+ + ``` + +## 修改类型 + +- 当`MAP`的`key_type`或`value_type`为`VARCHAR`时,才允许进行修改。 + - 只允许将`VARCHAR`的参数从小改到大。反之不行。 + + ```SQL + CREATE TABLE `map_table` ( + `k` INT NOT NULL, + `map_varchar_int` MAP, + `map_int_varchar` MAP, + `map_varchar_varchar` MAP + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + ALTER TABLE map_table MODIFY COLUMN map_varchar_int MAP; + + ALTER TABLE map_table MODIFY COLUMN map_int_varchar MAP; + + ALTER TABLE map_table MODIFY COLUMN map_varchar_varchar MAP; + ``` +- `MAP`类型的列默认值只能指定为NULL,如果指定后不能修改。 + +## 元素访问 +- 使用`[key]`的方式访问`MAP`的`Key`对应的`Value`。 + ```SQL + SELECT {'Alice': 20}['Alice']; + + +------------------------+ + | {'Alice': 20}['Alice'] | + +------------------------+ + | 20 | + +------------------------+` + ``` + +- 使用 `ELEMENT_AT(MAP, Key) `的方式访问 `MAP`的`Key`对应的`Value`。 + ```SQL + SELECT ELEMENT_AT({'Alice': 20}, 'Alice'); + + +------------------------------------+ + | ELEMENT_AT({'Alice': 20}, 'Alice') | + +------------------------------------+ + | 20 | + +------------------------------------+ + ``` + +## 示例 + +- 多层`MAP`嵌套 + + ```SQL + -- 建表 + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_nested MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + --插入 + INSERT INTO map_table VALUES (1, MAP('key1', MAP('key2', 1, 'key3', 2))); + INSERT INTO map_table VALUES (2, MAP('key1', MAP('key2', 3, 'key3', 4))); + + -- 查询 + SELECT map_nested['key1']['key2'] FROM map_table order by id; + +----------------------------+ + | map_nested['key1']['key2'] | + +----------------------------+ + | 1 | + | 3 | + +----------------------------+ + + ``` +- 复杂类型嵌套 + + ```SQL + -- 建表 + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_array MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- 插入 + INSERT INTO map_table VALUES (1, MAP('key1', [1, 2, 3])), (2, MAP('key1', [4, 5, 6])); + + -- 查询 + SELECT map_array['key1'][1] FROM map_table order by id; + +----------------------+ + | map_array['key1'][1] | + +----------------------+ + | 1 | + | 4 | + +----------------------+ + + -- 建表 + CREATE TABLE IF NOT EXISTS map_table ( + id INT, + map_struct MAP> + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- 插入 + INSERT INTO map_table VALUES (1, MAP('key1', STRUCT(1, 'John'), 'key2', STRUCT(3, 'Jane'))); + + -- 查询 + SELECT STRUCT_ELEMENT(map_struct['key1'], 1), STRUCT_ELEMENT(map_struct['key1'], 'name') FROM map_table order by id; + +---------------------------------------+--------------------------------------------+ + | STRUCT_ELEMENT(map_struct['key1'], 1) | STRUCT_ELEMENT(map_struct['key1'], 'name') | + +---------------------------------------+--------------------------------------------+ + | 1 | John | + +---------------------------------------+--------------------------------------------+ + ``` + +- 修改类型 + + ```SQL + -- 建表 + CREATE TABLE `map_table` ( + `k` INT NOT NULL, + `map_varchar_int` MAP, + `map_int_varchar` MAP, + `map_varchar_varchar` MAP + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 修改 KEY + ALTER TABLE map_table MODIFY COLUMN map_varchar_int MAP; + + -- 修改 VALUE + ALTER TABLE map_table MODIFY COLUMN map_int_varchar MAP; + + -- 修改 KEY VALUE + ALTER TABLE map_table MODIFY COLUMN map_varchar_varchar MAP; + + -- 查看列类型 + DESC map_table; + +---------------------+------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +---------------------+------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | map_varchar_int | map | Yes | false | NULL | NONE | + | map_int_varchar | map | Yes | false | NULL | NONE | + | map_varchar_varchar | map | Yes | false | NULL | NONE | + +---------------------+------------------------------+------+-------+---------+-------+ + ``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md index 8c8ac7ef6f14c..635cbe9be7c89 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md @@ -5,98 +5,199 @@ } --- -## STRUCT - -### name - -STRUCT - -## 描述 - -`STRUCT` - -由多个 Field 组成的结构体,也可被理解为多个列的集合。不能作为 Key 使用,目前 STRUCT 仅支持在 Duplicate 模型的表中使用。 - - -一个 Struct 中的 Field 的名字和数量固定,总是为 Nullable,一个 Field 通常由下面部分组成。 - -- field_name: Field 的标识符,不可重复 -- field_type: Field 的类型 -- COMMENT: Field 的注释,可选 (暂不支持) - -当前可支持的类型有: - -``` -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE, -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING -``` - -在将来的版本我们还将完善: - -``` -TODO:支持嵌套 STRUCT 或其他的复杂类型 -``` - -## 举例 - -建表示例如下: - -``` -mysql> CREATE TABLE `struct_test` ( - `id` int(11) NULL, - `s_info` STRUCT NULL -) ENGINE=OLAP -DUPLICATE KEY(`id`) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(`id`) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1", -"storage_format" = "V2", -"light_schema_change" = "true", -"disable_auto_compaction" = "false" -); +## 类型描述 + +STRUCT 类型用于将多个字段组合成一个结构体,每个字段可以有自己的名字和类型,适合表示嵌套或复杂的业务数据结构。 +- `STRUCT` + - `field_name` 表征名字,**不可为空,不可重复,名字不区分大小写**。 + - `field_type` 表征类型,类型是Nullable的,不可指定NOT NULL,支持的类型有:`BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPTV4, IPV6, ARRAY, MAP, STRUCT`。 + - `[COMMENT 'comment-string']` 表征注释,可选的。 + +## 类型约束 +- `STRUCT`类型支持的最大嵌套深度为 9。 +- `STRUCT` 类型之间的转换取决于内部的类型之间是否能转换(名字不影响转换),`STRUCT`类型不能转成其他类型。 + - 字符串类型可以转换成`STRUCT`类型(通过解析的形式,解析失败返回 NULL)。 +- `STRUCT` 类型在`AGGREGATE`表模型中只支持`REPLACE`和`REPLACE_IF_NOT_NULL`,**在任何表模型中都无法作为KEY列,无法作为分区分桶列。** +- `STRUCT`类型的列不支持比较或者算数运算,**不支持`ORDER BY`和`GROUP BY`操作,不支持作为`JOIN KEY`,不支持在`DELETE`语句中使用**。 +- `STRUCT`类型的列不支持建立任何索引。 + +## 类型构造 + +- 使用`STRUCT()`可以构造一个的`STRUCT`类型的值,`STRUCT`内部的名字从col1开始。 + ```SQL + SELECT STRUCT(1, 'a', "abc"); + + +--------------------------------------+ + | STRUCT(1, 'a', "abc") | + +--------------------------------------+ + | {"col1":1, "col2":"a", "col3":"abc"} | + +--------------------------------------+ + ``` +- 使用`NAMED_STRUCT()` 构造一个既定的`STRUCT`类型的值。 + ```SQL + SELECT NAMED_STRUCT("name", "Jack", "id", 1728923); + + +---------------------------------------------+ + | NAMED_STRUCT("name", "Jack", "id", 1728923) | + +---------------------------------------------+ + | {"name":"Jack", "id":1728923} | + +---------------------------------------------+ + ``` + +## 修改类型 + +- `STRUCT`的子列类型为`VARCHAR`时,才允许进行修改。 + - 只允许将`VARCHAR`的参数从小改到大。反之不行。 + + ```SQL + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + ``` + +- `STRUCT`类型内部的子列不支持删除,可以在末尾增加子列。 + +```SQL + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 在末尾增加一个子列 + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; ``` -插入数据示例: - -Insert: - -``` -INSERT INTO `struct_test` VALUES (1, {1, 'sn1', 'sa1'}); -INSERT INTO `struct_test` VALUES (2, struct(2, 'sn2', 'sa2')); -INSERT INTO `struct_test` VALUES (3, named_struct('s_id', 3, 's_name', 'sn3', 's_address', 'sa3')); -``` - -Stream load: - -test.csv: - -``` -1|{"s_id":1, "s_name":"sn1", "s_address":"sa1"} -2|{s_id:2, s_name:sn2, s_address:sa2} -3|{"s_address":"sa3", "s_name":"sn3", "s_id":3} -``` - -示例: - -``` -curl --location-trusted -u root -T test.csv -H "label:test_label" http://host:port/api/test/struct_test/_stream_load -``` - -查询数据示例: - -``` -mysql> select * from struct_test; -+------+-------------------+ -| id | s_info | -+------+-------------------+ -| 1 | {1, 'sn1', 'sa1'} | -| 2 | {2, 'sn2', 'sa2'} | -| 3 | {3, 'sn3', 'sa3'} | -+------+-------------------+ -3 rows in set (0.02 sec) -``` - -### keywords - - STRUCT +## 元素访问 + +- 使用`STRUCT_ELEMENT(struct, k/field_name)`访问`STRUCT`内部的某一个子列。 + - k表征位置,从1开始。 + - `filed_name` 是`STRUCT`的子列的名字。 + ```SQL + SELECT STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), 1); + + +----------------------------------------------------------------+ + | STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), 1) | + +----------------------------------------------------------------+ + | Jack | + +----------------------------------------------------------------+ + + + SELECT STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), "id"); + + +-------------------------------------------------------------------+ + | STRUCT_ELEMENT(NAMED_STRUCT("name", "Jack", "id", 1728923), "id") | + +-------------------------------------------------------------------+ + | 1728923 | + +-------------------------------------------------------------------+ + ``` +## 示例 + +- 嵌套复杂类型 + + ```SQL + -- 建表 + CREATE TABLE IF NOT EXISTS struct_table ( + id INT, + struct_complex STRUCT< + basic_info: STRUCT, + contact: STRUCT, + preferences: STRUCT, settings: MAP>, + metadata: STRUCT< + created_at: DATETIME, + updated_at: DATETIME, + stats: STRUCT + > + > + ) ENGINE=OLAP + DUPLICATE KEY(id) + DISTRIBUTED BY HASH(id) BUCKETS 1 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); + + -- 插入 + INSERT INTO struct_table VALUES (1, STRUCT( + STRUCT('John', 25), + STRUCT('john@example.com', '1234567890'), + STRUCT(['tag1', 'tag2'], MAP('setting1', 1, 'setting2', 2)), + STRUCT('2021-01-01 00:00:00', '2021-01-02 00:00:00', STRUCT(100, 50)) + )); + + -- 查询 + SELECT STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'basic_info'), 'name') FROM struct_table order by id; + +----------------------------------------------------------------------+ + | STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'basic_info'), 'name') | + +----------------------------------------------------------------------+ + | John | + +----------------------------------------------------------------------+ + + SELECT STRUCT_ELEMENT(STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'metadata'), 'stats'), 'views') FROM struct_table order by id; + +----------------------------------------------------------------------------------------------+ + | STRUCT_ELEMENT(STRUCT_ELEMENT(STRUCT_ELEMENT(struct_complex, 'metadata'), 'stats'), 'views') | + +----------------------------------------------------------------------------------------------+ + | 100 | + +----------------------------------------------------------------------------------------------+ + ``` + +- 修改类型 + +```SQL +-- 建表 +CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 修改 name 这一列的类型 + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + + -- 查看列类型 + DESC struct_table; + +----------------+----------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +----------------+----------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | struct_varchar | struct | Yes | false | NULL | NONE | + +----------------+----------------------------------+------+-------+---------+-------+ + + -- 建表 + CREATE TABLE struct_table ( + `k` INT NOT NULL, + `struct_varchar` STRUCT + ) ENGINE=OLAP + DUPLICATE KEY(`k`) + DISTRIBUTED BY HASH(`k`) BUCKETS 1 + PROPERTIES ( + "replication_num" = "1" + ); + + -- 在末尾增加一个子列 + ALTER TABLE struct_table MODIFY COLUMN struct_varchar STRUCT; + + -- 查看列类型 + DESC struct_table; + +----------------+-----------------------------------------+------+-------+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +----------------+-----------------------------------------+------+-------+---------+-------+ + | k | int | No | true | NULL | | + | struct_varchar | struct | Yes | false | NULL | NONE | + +----------------+-----------------------------------------+------+-------+---------+-------+ +``` \ No newline at end of file