
Commit f960cf0

[fix](complex type) array map struct data type (#2615)
## Versions

- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [ ] Checked by AI
- [ ] Test Cases Built
1 parent dfc8a00 commit f960cf0

12 files changed: +1295 -830 lines changed


docs/data-operate/import/complex-types/array.md

Lines changed: 1 addition & 11 deletions
@@ -5,17 +5,7 @@
 }
 ---
 
-`ARRAY<T>` An array of T-type items, it cannot be used as a key column.
-
-- Before version 2.0, it was only supported in the Duplicate model table.
-- Starting from version 2.0, it is supported in the non-key columns of the Unique model table.
-
-T-type could be any of:
-
-```sql
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
-DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
-```
+`ARRAY<T>` An array of T-type items. Click [ARRAY](../../../sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md) to learn more.
 
 ## CSV format import
 

docs/data-operate/import/complex-types/map.md

Lines changed: 1 addition & 8 deletions
@@ -5,14 +5,7 @@
 }
 ---
 
-`MAP<K, V>` A Map of K, V items, it cannot be used as a key column. Now MAP can only be used in Duplicate and Unique Model Tables.
-
-K,V could be any of:
-
-```sql
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
-DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
-```
+`MAP<K, V>` A Map of K, V items. Click [MAP](../../../sql-manual/basic-element/sql-data-types/semi-structured/MAP.md) to learn more.
 
 ## CSV format import
 

docs/data-operate/import/complex-types/struct.md

Lines changed: 1 addition & 16 deletions
@@ -5,22 +5,7 @@
 }
 ---
 
-`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns.
-
-- It cannot be used as a Key column. Now STRUCT can only be used in Duplicate Model Tables.
-
-- The names and number of Fields in a Struct are fixed and always Nullable, and a Field typically consists of the following parts.
-
-- field_name: Identifier naming the field, non repeatable.
-- field_type: A data type.
-- COMMENT: An optional string describing the field. (currently not supported)
-
-The currently supported types are:
-
-```sql
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
-DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
-```
+`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents a value with a structure described by multiple fields, which can be viewed as a collection of multiple columns. Click [STRUCT](../../../sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md) to learn more.
 
 ## CSV format import
 

docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md

Lines changed: 248 additions & 46 deletions
@@ -5,65 +5,267 @@
 }
 ---
 
-## ARRAY
+# ARRAY Documentation
 
-ARRAY
+## Type Description
 
-### description
+The `ARRAY<T>` type is used to represent an ordered collection of elements, where each element has the same data type. For example, an array of integers can be represented as `[1, 2, 3]`, and an array of strings as `["a", "b", "c"]`.
 
-`ARRAY<T>`
+- `ARRAY<T>` represents an array composed of elements of type T, where T is nullable. Supported types for T include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY<T>`.
+  - Note: Among the above T types, `JSONB` and `VARIANT` are only supported in the computation layer of Doris; **`ARRAY<JSONB>` and `ARRAY<VARIANT>` cannot be used when creating tables in Doris**.
 
-An array of T-type items, it cannot be used as a key column. Now ARRAY can only used in Duplicate Model Tables.
+## Type Constraints
 
-After version 2.0, it supports the use of non-key columns in Unique model tables.
+- The maximum nesting depth supported by the `ARRAY<T>` type is 9.
+- Conversion between `ARRAY<T>` types depends on whether T can be converted. The `ARRAY<T>` type cannot be converted to other types.
+  - For example: `ARRAY<INT>` can be converted to `ARRAY<BIGINT>` because `INT` can be converted to `BIGINT`.
+- The `VARIANT` type can be converted to the `ARRAY<T>` type.
+- A string type can be converted to the `ARRAY<T>` type (through parsing, returning NULL if parsing fails).
+- In the `AGGREGATE` table model, the `ARRAY<T>` type only supports `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, it cannot be used as a KEY column, nor as a partition or bucket column**.
+- Columns of `ARRAY<T>` type **support `ORDER BY` and `GROUP BY` operations**.
+  - T types that support `ORDER BY` and `GROUP BY` include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`.
+- Columns of `ARRAY<T>` type cannot be used as a `JOIN KEY` or in `DELETE` statements.
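
The conversion rules in the constraints above can be sketched with `CAST`. This is a minimal illustration, not output captured from a cluster: the literal values are hypothetical, and it assumes the conversions are written with the usual `CAST(expr AS ARRAY<T>)` form.

```SQL
-- ARRAY<INT> to ARRAY<BIGINT>: allowed because INT can be converted to BIGINT
SELECT CAST([1, 2, 3] AS ARRAY<BIGINT>);

-- String to ARRAY<INT>: the string is parsed as an array literal
SELECT CAST('[1, 2, 3]' AS ARRAY<INT>);

-- Per the constraint above, a string that cannot be parsed should yield NULL
SELECT CAST('not an array' AS ARRAY<INT>);
```

The exact behavior on parse failure may vary with the Doris version and session settings, so treat the last statement as something to verify on your own deployment.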
 
-T-type could be any of:
+## Constant Construction
 
-```
-BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
-DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
-```
+- Use the `ARRAY()` function to construct a value of type `ARRAY<T>`, where T is the common type of the parameters.
+
+```SQL
+-- [1, 2, 3], T is INT
+SELECT ARRAY(1, 2, 3);
 
-### example
+-- ["1", "2", "abc"], T is STRING
+SELECT ARRAY(1, 2, 'abc');
+```
+- Use `[]` to construct a value of type `ARRAY<T>`, where T is the common type of the parameters.
+
+```SQL
+-- ["abc", "def", "efg"], T is STRING
+SELECT ["abc", "def", "efg"];
 
-Create table example:
+-- ["1", "2", "abc"], T is STRING
+SELECT [1, 2, 'abc'];
+```
 
-```
-mysql> CREATE TABLE `array_test` (
-    `id` int(11) NULL COMMENT "",
-    `c_array` ARRAY<int(11)> NULL COMMENT ""
-) ENGINE=OLAP
-DUPLICATE KEY(`id`)
-COMMENT "OLAP"
-DISTRIBUTED BY HASH(`id`) BUCKETS 1
-PROPERTIES (
-    "replication_allocation" = "tag.location.default: 1",
-    "in_memory" = "false",
-    "storage_format" = "V2"
-);
-```
+## Modifying Type
 
-Insert data example:
+- Modification is only allowed when the element type inside `ARRAY` is `VARCHAR`.
+- The `VARCHAR` length can only be increased, not decreased.
 
-```
-mysql> INSERT INTO `array_test` VALUES (1, [1,2,3,4,5]);
-mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null);
-```
+```SQL
+CREATE TABLE `array_table` (
+    `k` INT NOT NULL,
+    `array_column` ARRAY<VARCHAR(10)>
+) ENGINE=OLAP
+DUPLICATE KEY(`k`)
+DISTRIBUTED BY HASH(`k`) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1"
+);
 
-Select data example:
+ALTER TABLE array_table MODIFY COLUMN array_column ARRAY<VARCHAR(20)>;
+```
+- The default value for columns of type `ARRAY<T>` can only be specified as NULL, and once specified, it cannot be modified.
 
-```
-mysql> SELECT * FROM `array_test`;
-+------+-----------------+
-| id   | c_array         |
-+------+-----------------+
-|    1 | [1, 2, 3, 4, 5] |
-|    2 | [6, 7, 8]       |
-|    3 | []              |
-|    4 | NULL            |
-+------+-----------------+
-```
+## Element Access
 
-### keywords
+- Use `[k]` to access the k-th element of `ARRAY<T>`, where k starts from 1. If k is out of bounds, NULL is returned.
 
-ARRAY
+```SQL
+SELECT [1, 2, 3][1];
++--------------+
+| [1, 2, 3][1] |
++--------------+
+|            1 |
++--------------+
+
+SELECT ARRAY(1, 2, 3)[2];
++-------------------+
+| ARRAY(1, 2, 3)[2] |
++-------------------+
+|                 2 |
++-------------------+
+
+SELECT [[1,2,3],[2,3,4]][1][3];
++-------------------------+
+| [[1,2,3],[2,3,4]][1][3] |
++-------------------------+
+|                       3 |
++-------------------------+
+```
+
+- Use `ELEMENT_AT(ARRAY, k)` to access the k-th element of `ARRAY<T>`, where k starts from 1. If k is out of bounds, NULL is returned.
+
+```SQL
+SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2);
++--------------------------------+
+| ELEMENT_AT(ARRAY(1, 2, 3) , 2) |
++--------------------------------+
+|                              2 |
++--------------------------------+
+
+SELECT ELEMENT_AT([1, 2, 3] , 3);
++---------------------------+
+| ELEMENT_AT([1, 2, 3] , 3) |
++---------------------------+
+|                         3 |
++---------------------------+
+
+SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3);
++-------------------------------------------------------+
+| ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) |
++-------------------------------------------------------+
+| ["3"]                                                 |
++-------------------------------------------------------+
+```
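
The element-access examples in this section only use in-range indexes. As a small sketch of the out-of-bounds rule stated there, the two queries below use a hypothetical index of 4 on a three-element array; according to that rule, both forms should return NULL rather than raise an error.

```SQL
-- Index 4 is past the end of a three-element array
SELECT [1, 2, 3][4];

-- The function form follows the same rule
SELECT ELEMENT_AT([1, 2, 3], 4);
```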
+
+## Query Acceleration
+
+- Columns of type `ARRAY<T>` in Doris tables support adding inverted indexes to accelerate computations involving `ARRAY` functions on this column.
+  - T types supported by inverted indexes: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`.
+  - Accelerated `ARRAY` functions: `ARRAY_CONTAINS` and `ARRAYS_OVERLAP`. When the function parameters include NULL, execution falls back to regular vectorized computation.
+
+## Examples
+
+- Multidimensional Arrays
+
+```SQL
+-- Create table
+CREATE TABLE IF NOT EXISTS array_table (
+    id INT,
+    two_dim_array ARRAY<ARRAY<INT>>,
+    three_dim_array ARRAY<ARRAY<ARRAY<STRING>>>
+) ENGINE=OLAP
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1"
+);
+
+-- Insert
+INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]);
+
+INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx'))));
+
+-- Query
+SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id;
++---------------------+--------------------------+
+| two_dim_array[1][2] | three_dim_array[1][1][2] |
++---------------------+--------------------------+
+|                   2 | cd                       |
+|                   2 | cd                       |
++---------------------+--------------------------+
+```
+
+- Nested Complex Types
+
+```SQL
+-- Create table
+CREATE TABLE IF NOT EXISTS array_map_table (
+    id INT,
+    array_map ARRAY<MAP<STRING, INT>>
+) ENGINE=OLAP
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1"
+);
+
+-- Insert
+INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2)));
+INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2)));
+
+-- Query
+SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id;
++--------------+--------------+
+| array_map[1] | array_map[2] |
++--------------+--------------+
+| {"key1":1}   | {"key2":2}   |
+| {"key1":1}   | {"key2":2}   |
++--------------+--------------+
+
+-- Create table
+CREATE TABLE IF NOT EXISTS array_table (
+    id INT,
+    array_struct ARRAY<STRUCT<id: INT, name: STRING>>
+) ENGINE=OLAP
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1"
+);
+
+INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
+INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
+
+SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id;
++-------------------------+-------------------------+
+| array_struct[1]         | array_struct[2]         |
++-------------------------+-------------------------+
+| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
+| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
++-------------------------+-------------------------+
+```
+
+- Modifying Type
+
+```SQL
+-- Create table
+CREATE TABLE array_table (
+    id INT,
+    array_varchar ARRAY<VARCHAR(10)>
+) ENGINE=OLAP
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+    "replication_allocation" = "tag.location.default: 1"
+);
+
+-- Modify ARRAY type
+ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY<VARCHAR(20)>;
+
+-- Check column type
+DESC array_table;
++---------------+--------------------+------+-------+---------+-------+
+| Field         | Type               | Null | Key   | Default | Extra |
++---------------+--------------------+------+-------+---------+-------+
+| id            | int                | Yes  | true  | NULL    |       |
+| array_varchar | array<varchar(20)> | Yes  | false | NULL    | NONE  |
++---------------+--------------------+------+-------+---------+-------+
+```
+
+- Inverted Index
+
+```SQL
+-- Create table statement
+CREATE TABLE `array_table` (
+    `k` int NOT NULL,
+    `array_column` ARRAY<INT>,
+    INDEX idx_array_column (array_column) USING INVERTED
+) ENGINE=OLAP
+DUPLICATE KEY(`k`)
+DISTRIBUTED BY HASH(`k`) BUCKETS 1
+PROPERTIES (
+    "replication_num" = "1"
+);
+
+-- Insert
+INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]);
+
+-- The inverted index accelerates the execution of the ARRAY_CONTAINS function
+SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5);
++------+--------------+
+| k    | array_column |
++------+--------------+
+|    2 | [4, 5, 6]    |
++------+--------------+
+
+-- The inverted index accelerates the execution of the ARRAYS_OVERLAP function
+SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]);
++------+--------------+
+| k    | array_column |
++------+--------------+
+|    2 | [4, 5, 6]    |
+|    3 | [7, 8, 9]    |
++------+--------------+
+```
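
The Type Constraints section states that `ARRAY<T>` columns support `ORDER BY` and `GROUP BY`, but none of the examples exercise that. The following is a hedged sketch that assumes the `array_table` (`k`, `array_column ARRAY<INT>`) from the Inverted Index example above; the alias `cnt` is hypothetical, and the statements have not been run against a cluster.

```SQL
-- Sort rows by the array column itself
SELECT k, array_column FROM array_table ORDER BY array_column;

-- Group rows that hold identical arrays
SELECT array_column, COUNT(*) AS cnt FROM array_table GROUP BY array_column;
```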
