Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 1 addition & 11 deletions docs/data-operate/import/complex-types/array.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,7 @@
}
---

`ARRAY<T>` An array of T-type items, it cannot be used as a key column.

- Before version 2.0, it was only supported in the Duplicate model table.
- Starting from version 2.0, it is supported in the non-key columns of the Unique model table.

T-type could be any of:

```sql
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
`ARRAY<T>` An array of T-type items. Click [ARRAY](../../../sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md) to learn more.

## CSV format import

Expand Down
9 changes: 1 addition & 8 deletions docs/data-operate/import/complex-types/map.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,7 @@
}
---

`MAP<K, V>` A Map of K, V items, it cannot be used as a key column. Now MAP can only be used in Duplicate and Unique Model Tables.

K,V could be any of:

```sql
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
`MAP<K, V>` A Map of K, V items。 Click [MAP](../../../sql-manual/basic-element/sql-data-types/semi-structured/MAP.md) to learn more.

## CSV format import

Expand Down
17 changes: 1 addition & 16 deletions docs/data-operate/import/complex-types/struct.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,22 +5,7 @@
}
---

`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns.

- It cannot be used as a Key column. Now STRUCT can only be used in Duplicate Model Tables.

- The names and number of Fields in a Struct are fixed and always Nullable, and a Field typically consists of the following parts.

- field_name: Identifier naming the field, non repeatable.
- field_type: A data type.
- COMMENT: An optional string describing the field. (currently not supported)

The currently supported types are:

```sql
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DECIMALV3, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
`STRUCT<field_name:field_type [COMMENT 'comment_string'], ... >` Represents value with structure described by multiple fields, which can be viewed as a collection of multiple columns.Click [STRUCT](../../../sql-manual/basic-element/sql-data-types/semi-structured/STRUCT.md) to learn more.

## CSV format import

Expand Down
294 changes: 248 additions & 46 deletions docs/sql-manual/basic-element/sql-data-types/semi-structured/ARRAY.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,65 +5,267 @@
}
---

## ARRAY
# ARRAY Documentation

ARRAY
## Type Description

### description
The `ARRAY<T>` type is used to represent an ordered collection of elements, where each element has the same data type. For example, an array of integers can be represented as `[1, 2, 3]`, and an array of strings as `["a", "b", "c"]`.

`ARRAY<T>`
- `ARRAY<T>` represents an array composed of elements of type T, where T is nullable. Supported types for T include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY<T>`.
- Note: Among the above T types, `JSONB` and `VARIANT` are only supported in the computation layer of Doris and **do not support using `ARRAY<JSONB>` and `ARRAY<VARIANT>` in table creation in Doris**.

An array of T-type items, it cannot be used as a key column. Now ARRAY can only used in Duplicate Model Tables.
## Type Constraints

After version 2.0, it supports the use of non-key columns in Unique model tables.
- The maximum nesting depth supported by `ARRAY<T>` type is 9.
- Conversion between `ARRAY<T>` types depends on whether T can be converted. `Array<T>` type cannot be converted to other types.
- For example: `ARRAY<INT>` can be converted to `ARRAY<BIGINT>` because `INT` and `BIGINT` can be converted.
- `Variant` type can be converted to `Array<T>` type.
- String type can be converted to `ARRAY<T>` type (through parsing, returning NULL if parsing fails).
- In the `AGGREGATE` table model, `ARRAY<T>` type only supports `REPLACE` and `REPLACE_IF_NOT_NULL`. **In any table model, it cannot be used as a KEY column, nor as a partition or bucket column**.
- Columns of `ARRAY<T>` type **support `ORDER BY` and `GROUP BY` operations**.
- T types that support `ORDER BY` and `GROUP BY` include: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`.
- Columns of `ARRAY<T>` type do not support being used as `JOIN KEY` and do not support being used in `DELETE` statements.

T-type could be any of:
## Constant Construction

```
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE,
DATEV2, DATETIME, DATETIMEV2, CHAR, VARCHAR, STRING
```
- Use the `ARRAY()` function to construct a value of type `ARRAY<T>`, where T is the common type of the parameters.

```SQL
-- [1, 2, 3] T is INT
SELECT ARRAY(1, 2, 3);

### example
-- ["1", "2", "abc"] , T is STRING
SELECT ARRAY(1, 2, 'abc');
```
- Use `[]` to construct a value of type `ARRAY<T>`, where T is the common type of the parameters.

```SQL
-- ["abc", "def", "efg"] T is STRING
SELECT ["abc", "def", "efg"];

Create table example:
-- ["1", "2", "abc"] , T is STRING
SELECT [1, 2, 'abc'];
```

```
mysql> CREATE TABLE `array_test` (
`id` int(11) NULL COMMENT "",
`c_array` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```
## Modifying Type

Insert data example:
- Modification is only allowed when the element type inside `ARRAY` is `VARCHAR`.
- Only allows changing the parameter of `VARCHAR` from smaller to larger, not the other way around.

```
mysql> INSERT INTO `array_test` VALUES (1, [1,2,3,4,5]);
mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null);
```
```SQL
CREATE TABLE `array_table` (
`k` INT NOT NULL,
`array_column` ARRAY<VARCHAR(10)>
) ENGINE=OLAP
DUPLICATE KEY(`k`)
DISTRIBUTED BY HASH(`k`) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

Select data example:
ALTER TABLE array_table MODIFY COLUMN array_column ARRAY<VARCHAR(20)>;
```
- The default value for columns of type `ARRAY<T>` can only be specified as NULL, and once specified, it cannot be modified.

```
mysql> SELECT * FROM `array_test`;
+------+-----------------+
| id | c_array |
+------+-----------------+
| 1 | [1, 2, 3, 4, 5] |
| 2 | [6, 7, 8] |
| 3 | [] |
| 4 | NULL |
+------+-----------------+
```
## Element Access

### keywords
- Use `[k]` to access the k-th element of `ARRAY<T>`, where k starts from 1. If out of bounds, returns NULL.

ARRAY
```SQL
SELECT [1, 2, 3][1];
+--------------+
| [1, 2, 3][1] |
+--------------+
| 1 |
+--------------+

SELECT ARRAY(1, 2, 3)[2];
+-------------------+
| ARRAY(1, 2, 3)[2] |
+-------------------+
| 2 |
+-------------------+

SELECT [[1,2,3],[2,3,4]][1][3];
+-------------------------+
| [[1,2,3],[2,3,4]][1][3] |
+-------------------------+
| 3 |
+-------------------------+
```

- Use `ELEMENT_AT(ARRAY, k)` to access the k-th element of `ARRAY<T>`, where k starts from 1. If out of bounds, returns NULL.

```SQL
SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2);
+--------------------------------+
| ELEMENT_AT(ARRAY(1, 2, 3) , 2) |
+--------------------------------+
| 2 |
+--------------------------------+

SELECT ELEMENT_AT([1, 2, 3] , 3);
+---------------------------+
| ELEMENT_AT([1, 2, 3] , 3) |
+---------------------------+
| 3 |
+---------------------------+

SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3);
+-------------------------------------------------------+
| ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) |
+-------------------------------------------------------+
| ["3"] |
+-------------------------------------------------------+
```

## Query Acceleration

- Columns of type `ARRAY<T>` in Doris tables support adding inverted indexes to accelerate computations involving `ARRAY` functions on this column.
- T types supported by inverted indexes: `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6`.
- Accelerated `ARRAY` functions: `ARRAY_CONTAINS`, `ARRAYS_OVERLAP`, but when the function parameters include NULL, it falls back to regular vectorized computation.

## Examples

- Multidimensional Arrays

```SQL
-- Create table
CREATE TABLE IF NOT EXISTS array_table (
id INT,
two_dim_array ARRAY<ARRAY<INT>>,
three_dim_array ARRAY<ARRAY<ARRAY<STRING>>>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

-- Insert
INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]);

INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx'))));

-- Query
SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id;
+---------------------+--------------------------+
| two_dim_array[1][2] | three_dim_array[1][1][2] |
+---------------------+--------------------------+
| 2 | cd |
| 2 | cd |
+---------------------+--------------------------+
```

- Nested Complex Types

```SQL
-- Create table
CREATE TABLE IF NOT EXISTS array_map_table (
id INT,
array_map ARRAY<MAP<STRING, INT>>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

-- Insert
INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2)));
INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2)));

-- Query
SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id;
+--------------+--------------+
| array_map[1] | array_map[2] |
+--------------+--------------+
| {"key1":1} | {"key2":2} |
| {"key1":1} | {"key2":2} |
+--------------+--------------+

-- Create table
CREATE TABLE IF NOT EXISTS array_table (
id INT,
array_struct ARRAY<STRUCT<id: INT, name: STRING>>,
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));

SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id;
+-------------------------+-------------------------+
| array_struct[1] | array_struct[2] |
+-------------------------+-------------------------+
| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
+-------------------------+-------------------------+
```

- Modifying Type

```SQL
-- Create table
CREATE TABLE array_table (
id INT,
array_varchar ARRAY<VARCHAR(10)>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

-- Modify ARRAY type
ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY<VARCHAR(20)>;

-- Check column type
DESC array_table;
+---------------+--------------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------------+------+-------+---------+-------+
| id | int | Yes | true | NULL | |
| array_varchar | array<varchar(20)> | Yes | false | NULL | NONE |
+---------------+--------------------+------+-------+---------+-------+
```

- Inverted Index

```SQL
-- Create table statement
CREATE TABLE `array_table` (
`k` int NOT NULL,
`array_column` ARRAY<INT>,
INDEX idx_array_column (array_column) USING INVERTED
) ENGINE=OLAP
DUPLICATE KEY(`k`)
DISTRIBUTED BY HASH(`k`) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

-- Insert
INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]);

-- The inverted index accelerates the execution of the ARRAY_CONTAINS function
SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5);
+------+--------------+
| k | array_column |
+------+--------------+
| 2 | [4, 5, 6] |
+------+--------------+

-- The inverted index accelerates the execution of the ARRAYS_OVERLAP function
SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]);
+------+--------------+
| k | array_column |
+------+--------------+
| 2 | [4, 5, 6] |
| 3 | [7, 8, 9] |
+------+--------------+
```
Loading
Loading