|
5 | 5 | } |
6 | 6 | --- |
7 | 7 |
|
8 | | -## ARRAY |
| 8 | +# ARRAY Documentation |
9 | 9 |
|
10 | | -ARRAY |
| 10 | +## Type Description |
11 | 11 |
|
12 | | -### description |
| 12 | +The `ARRAY<T>` type represents an ordered collection of elements that all share the same data type. For example, an array of integers can be written as `[1, 2, 3]`, and an array of strings as `["a", "b", "c"]`. |
13 | 13 |
|
14 | | -`ARRAY<T>` |
| 14 | +- `ARRAY<T>` represents an array composed of elements of type T, where T is nullable. Supported types for T include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY<T>`. |
| 15 | + - Note: `JSONB` and `VARIANT` are supported as T only in the Doris computation layer. **`ARRAY<JSONB>` and `ARRAY<VARIANT>` cannot be used as column types when creating a table in Doris**. |
15 | 16 |
|
16 | | -An array of T-type items, it cannot be used as a key column. Now ARRAY can only used in Duplicate Model Tables. |
| 17 | +## Type Constraints |
17 | 18 |
|
18 | | -After version 2.0, it supports the use of non-key columns in Unique model tables. |
| 19 | +- The maximum nesting depth supported by `ARRAY<T>` type is 9. |
| 20 | +- Conversion between `ARRAY<T>` types depends on whether the element types can be converted. An `ARRAY<T>` value cannot be converted to any other type. |
| 21 | + - For example: `ARRAY<INT>` can be converted to `ARRAY<BIGINT>` because `INT` can be converted to `BIGINT`. |
| 22 | + - The `VARIANT` type can be converted to `ARRAY<T>`. |
| 23 | + - String types can be converted to `ARRAY<T>` (by parsing; NULL is returned if parsing fails). |
| 24 | +- In the `AGGREGATE` table model, `ARRAY<T>` type only supports `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, it cannot be used as a KEY column, nor as a partition or bucket column**. |
| 25 | +- Columns of `ARRAY<T>` type **support `ORDER BY` and `GROUP BY` operations**. |
| 26 | + - T types that support `ORDER BY` and `GROUP BY` include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`. |
| 27 | +- Columns of `ARRAY<T>` type cannot be used as a `JOIN` key or in `DELETE` statements. |
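| | + |
| | +The conversion, aggregation, and grouping rules above can be illustrated with a minimal sketch. The table `array_agg_table` and its columns are hypothetical names chosen for illustration, and the exact output may differ between Doris versions. |
| | + |
| | + ```SQL |
| | + -- ARRAY<INT> to ARRAY<BIGINT>: allowed because INT can be converted to BIGINT |
| | + SELECT CAST([1, 2, 3] AS ARRAY<BIGINT>); |
| | + |
| | + -- String to ARRAY<INT>: the string is parsed; NULL is expected if parsing fails |
| | + SELECT CAST('[1, 2, 3]' AS ARRAY<INT>); |
| | + |
| | + -- AGGREGATE model: the ARRAY column is a value column aggregated with REPLACE |
| | + CREATE TABLE IF NOT EXISTS array_agg_table ( |
| | + id INT, |
| | + tags ARRAY<STRING> REPLACE |
| | + ) ENGINE=OLAP |
| | + AGGREGATE KEY(id) |
| | + DISTRIBUTED BY HASH(id) BUCKETS 1 |
| | + PROPERTIES ( |
| | + "replication_num" = "1" |
| | + ); |
| | + |
| | + -- GROUP BY and ORDER BY on an ARRAY column whose element type supports them |
| | + SELECT tags, COUNT(*) FROM array_agg_table GROUP BY tags ORDER BY tags; |
| | + ``` |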
19 | 28 |
|
20 | | -T-type could be any of: |
| 29 | +## Constant Construction |
21 | 30 |
|
22 | | -``` |
23 | | -BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, |
24 | | -DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING |
25 | | -``` |
| 31 | +- Use the `ARRAY()` function to construct a value of type `ARRAY<T>`, where T is the common type of the parameters. |
| 32 | + |
| 33 | + ```SQL |
| 34 | + -- [1, 2, 3], T is INT |
| 35 | + SELECT ARRAY(1, 2, 3); |
26 | 36 |
|
27 | | -### example |
| 37 | + -- ["1", "2", "abc"], T is STRING |
| 38 | + SELECT ARRAY(1, 2, 'abc'); |
| 39 | + ``` |
| 40 | +- Use `[]` to construct a value of type `ARRAY<T>`, where T is the common type of the parameters. |
| 41 | + |
| 42 | + ```SQL |
| 43 | + -- ["abc", "def", "efg"], T is STRING |
| 44 | + SELECT ["abc", "def", "efg"]; |
28 | 45 |
|
29 | | -Create table example: |
| 46 | + -- ["1", "2", "abc"], T is STRING |
| 47 | + SELECT [1, 2, 'abc']; |
| 48 | + ``` |
30 | 49 |
|
31 | | -``` |
32 | | -mysql> CREATE TABLE `array_test` ( |
33 | | - `id` int(11) NULL COMMENT "", |
34 | | - `c_array` ARRAY<int(11)> NULL COMMENT "" |
35 | | -) ENGINE=OLAP |
36 | | -DUPLICATE KEY(`id`) |
37 | | -COMMENT "OLAP" |
38 | | -DISTRIBUTED BY HASH(`id`) BUCKETS 1 |
39 | | -PROPERTIES ( |
40 | | -"replication_allocation" = "tag.location.default: 1", |
41 | | -"in_memory" = "false", |
42 | | -"storage_format" = "V2" |
43 | | -); |
44 | | -``` |
| 50 | +## Modifying Type |
45 | 51 |
|
46 | | -Insert data example: |
| 52 | +- The type of an `ARRAY` column can be modified only when its element type is `VARCHAR`. |
| 53 | + - The `VARCHAR` length can only be increased, never decreased. |
47 | 54 |
|
48 | | -``` |
49 | | -mysql> INSERT INTO `array_test` VALUES (1, [1,2,3,4,5]); |
50 | | -mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null); |
51 | | -``` |
| 55 | + ```SQL |
| 56 | + CREATE TABLE `array_table` ( |
| 57 | + `k` INT NOT NULL, |
| 58 | + `array_column` ARRAY<VARCHAR(10)> |
| 59 | + ) ENGINE=OLAP |
| 60 | + DUPLICATE KEY(`k`) |
| 61 | + DISTRIBUTED BY HASH(`k`) BUCKETS 1 |
| 62 | + PROPERTIES ( |
| 63 | + "replication_num" = "1" |
| 64 | + ); |
52 | 65 |
|
53 | | -Select data example: |
| 66 | + ALTER TABLE array_table MODIFY COLUMN array_column ARRAY<VARCHAR(20)>; |
| 67 | + ``` |
| 68 | +- The default value for columns of type `ARRAY<T>` can only be specified as NULL, and once specified, it cannot be modified. |
54 | 69 |
|
55 | | -``` |
56 | | -mysql> SELECT * FROM `array_test`; |
57 | | -+------+-----------------+ |
58 | | -| id | c_array | |
59 | | -+------+-----------------+ |
60 | | -| 1 | [1, 2, 3, 4, 5] | |
61 | | -| 2 | [6, 7, 8] | |
62 | | -| 3 | [] | |
63 | | -| 4 | NULL | |
64 | | -+------+-----------------+ |
65 | | -``` |
| 70 | +## Element Access |
66 | 71 |
|
67 | | -### keywords |
| 72 | +- Use `[k]` to access the k-th element of `ARRAY<T>`, where k starts from 1. If k is out of bounds, NULL is returned. |
68 | 73 |
|
69 | | - ARRAY |
| 74 | + ```SQL |
| 75 | + SELECT [1, 2, 3][1]; |
| 76 | + +--------------+ |
| 77 | + | [1, 2, 3][1] | |
| 78 | + +--------------+ |
| 79 | + | 1 | |
| 80 | + +--------------+ |
| 81 | +
|
| 82 | + SELECT ARRAY(1, 2, 3)[2]; |
| 83 | + +-------------------+ |
| 84 | + | ARRAY(1, 2, 3)[2] | |
| 85 | + +-------------------+ |
| 86 | + | 2 | |
| 87 | + +-------------------+ |
| 88 | +
|
| 89 | + SELECT [[1,2,3],[2,3,4]][1][3]; |
| 90 | + +-------------------------+ |
| 91 | + | [[1,2,3],[2,3,4]][1][3] | |
| 92 | + +-------------------------+ |
| 93 | + | 3 | |
| 94 | + +-------------------------+ |
| 95 | + ``` |
| 96 | + |
| 97 | +- Use `ELEMENT_AT(ARRAY, k)` to access the k-th element of `ARRAY<T>`, where k starts from 1. If k is out of bounds, NULL is returned. |
| 98 | + |
| 99 | + ```SQL |
| 100 | + SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2); |
| 101 | + +--------------------------------+ |
| 102 | + | ELEMENT_AT(ARRAY(1, 2, 3) , 2) | |
| 103 | + +--------------------------------+ |
| 104 | + | 2 | |
| 105 | + +--------------------------------+ |
| 106 | +
|
| 107 | + SELECT ELEMENT_AT([1, 2, 3] , 3); |
| 108 | + +---------------------------+ |
| 109 | + | ELEMENT_AT([1, 2, 3] , 3) | |
| 110 | + +---------------------------+ |
| 111 | + | 3 | |
| 112 | + +---------------------------+ |
| 113 | +
|
| 114 | + SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3); |
| 115 | + +-------------------------------------------------------+ |
| 116 | + | ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) | |
| 117 | + +-------------------------------------------------------+ |
| 118 | + | ["3"] | |
| 119 | + +-------------------------------------------------------+ |
| 120 | + ``` |
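| | + |
| | +As a minimal sketch of the out-of-bounds rule stated above, both access forms can be tried with an index beyond the array length. According to that rule, each query is expected to return NULL rather than raise an error. |
| | + |
| | + ```SQL |
| | + -- The array has 3 elements, so index 4 is out of bounds |
| | + SELECT [1, 2, 3][4]; |
| | + |
| | + -- The same access through ELEMENT_AT |
| | + SELECT ELEMENT_AT([1, 2, 3], 4); |
| | + ``` |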
| 121 | + |
| 122 | +## Query Acceleration |
| 123 | + |
| 124 | +- Columns of type `ARRAY<T>` in Doris tables support adding inverted indexes to accelerate computations involving `ARRAY` functions on this column. |
| 125 | + - T types supported by inverted indexes: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`. |
| 126 | + - Accelerated `ARRAY` functions: `ARRAY_CONTAINS` and `ARRAYS_OVERLAP`. When the function parameters include NULL, execution falls back to regular vectorized computation. |
| 127 | + |
| 128 | +## Examples |
| 129 | + |
| 130 | +- Multidimensional Arrays |
| 131 | + |
| 132 | + ```SQL |
| 133 | + -- Create table |
| 134 | + CREATE TABLE IF NOT EXISTS array_table ( |
| 135 | + id INT, |
| 136 | + two_dim_array ARRAY<ARRAY<INT>>, |
| 137 | + three_dim_array ARRAY<ARRAY<ARRAY<STRING>>> |
| 138 | + ) ENGINE=OLAP |
| 139 | + DUPLICATE KEY(id) |
| 140 | + DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 141 | + PROPERTIES ( |
| 142 | + "replication_num" = "1" |
| 143 | + ); |
| 144 | +
|
| 145 | + -- Insert |
| 146 | + INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]); |
| 147 | +
|
| 148 | + INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx')))); |
| 149 | +
|
| 150 | + -- Query |
| 151 | + SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id; |
| 152 | + +---------------------+--------------------------+ |
| 153 | + | two_dim_array[1][2] | three_dim_array[1][1][2] | |
| 154 | + +---------------------+--------------------------+ |
| 155 | + | 2 | cd | |
| 156 | + | 2 | cd | |
| 157 | + +---------------------+--------------------------+ |
| 158 | + ``` |
| 159 | + |
| 160 | +- Nested Complex Types |
| 161 | + |
| 162 | + ```SQL |
| 163 | + -- Create table |
| 164 | + CREATE TABLE IF NOT EXISTS array_map_table ( |
| 165 | + id INT, |
| 166 | + array_map ARRAY<MAP<STRING, INT>> |
| 167 | + ) ENGINE=OLAP |
| 168 | + DUPLICATE KEY(id) |
| 169 | + DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 170 | + PROPERTIES ( |
| 171 | + "replication_num" = "1" |
| 172 | + ); |
| 173 | +
|
| 174 | + -- Insert |
| 175 | + INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2))); |
| 176 | + INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2))); |
| 177 | +
|
| 178 | + -- Query |
| 179 | + SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id; |
| 180 | + +--------------+--------------+ |
| 181 | + | array_map[1] | array_map[2] | |
| 182 | + +--------------+--------------+ |
| 183 | + | {"key1":1} | {"key2":2} | |
| 184 | + | {"key1":1} | {"key2":2} | |
| 185 | + +--------------+--------------+ |
| 186 | +
|
| 187 | + -- Create table |
| 188 | + CREATE TABLE IF NOT EXISTS array_struct_table ( |
| 189 | + id INT, |
| 190 | + array_struct ARRAY<STRUCT<id: INT, name: STRING>> |
| 191 | + ) ENGINE=OLAP |
| 192 | + DUPLICATE KEY(id) |
| 193 | + DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 194 | + PROPERTIES ( |
| 195 | + "replication_num" = "1" |
| 196 | + ); |
| 197 | +
|
| 198 | + INSERT INTO array_struct_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); |
| 199 | + INSERT INTO array_struct_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane'))); |
| 200 | +
|
| 201 | + SELECT array_struct[1], array_struct[2] FROM array_struct_table ORDER BY id; |
| 202 | + +-------------------------+-------------------------+ |
| 203 | + | array_struct[1] | array_struct[2] | |
| 204 | + +-------------------------+-------------------------+ |
| 205 | + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | |
| 206 | + | {"id":1, "name":"John"} | {"id":2, "name":"Jane"} | |
| 207 | + +-------------------------+-------------------------+ |
| 208 | + ``` |
| 209 | + |
| 210 | +- Modifying Type |
| 211 | + |
| 212 | + ```SQL |
| 213 | + -- Create table |
| 214 | + CREATE TABLE array_table ( |
| 215 | + id INT, |
| 216 | + array_varchar ARRAY<VARCHAR(10)> |
| 217 | + ) ENGINE=OLAP |
| 218 | + DUPLICATE KEY(id) |
| 219 | + DISTRIBUTED BY HASH(id) BUCKETS 1 |
| 220 | + PROPERTIES ( |
| 221 | + "replication_allocation" = "tag.location.default: 1" |
| 222 | + ); |
| 223 | +
|
| 224 | + -- Modify ARRAY type |
| 225 | + ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY<VARCHAR(20)>; |
| 226 | +
|
| 227 | + -- Check column type |
| 228 | + DESC array_table; |
| 229 | + +---------------+--------------------+------+-------+---------+-------+ |
| 230 | + | Field | Type | Null | Key | Default | Extra | |
| 231 | + +---------------+--------------------+------+-------+---------+-------+ |
| 232 | + | id | int | Yes | true | NULL | | |
| 233 | + | array_varchar | array<varchar(20)> | Yes | false | NULL | NONE | |
| 234 | + +---------------+--------------------+------+-------+---------+-------+ |
| 235 | + ``` |
| 236 | + |
| 237 | +- Inverted Index |
| 238 | + |
| 239 | + ```SQL |
| 240 | + -- Create table statement |
| 241 | + CREATE TABLE `array_table` ( |
| 242 | + `k` int NOT NULL, |
| 243 | + `array_column` ARRAY<INT>, |
| 244 | + INDEX idx_array_column (array_column) USING INVERTED |
| 245 | + ) ENGINE=OLAP |
| 246 | + DUPLICATE KEY(`k`) |
| 247 | + DISTRIBUTED BY HASH(`k`) BUCKETS 1 |
| 248 | + PROPERTIES ( |
| 249 | + "replication_num" = "1" |
| 250 | + ); |
| 251 | +
|
| 252 | + -- Insert |
| 253 | + INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]); |
| 254 | +
|
| 255 | + -- The inverted index accelerates the execution of the ARRAY_CONTAINS function |
| 256 | + SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5); |
| 257 | + +------+--------------+ |
| 258 | + | k | array_column | |
| 259 | + +------+--------------+ |
| 260 | + | 2 | [4, 5, 6] | |
| 261 | + +------+--------------+ |
| 262 | +
|
| 263 | + -- The inverted index accelerates the execution of the ARRAYS_OVERLAP function |
| 264 | + SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]); |
| 265 | + +------+--------------+ |
| 266 | + | k | array_column | |
| 267 | + +------+--------------+ |
| 268 | + | 2 | [4, 5, 6] | |
| 269 | + | 3 | [7, 8, 9] | |
| 270 | + +------+--------------+ |
| 271 | + ``` |