Merged
16 changes: 8 additions & 8 deletions docs/cn/guides/40-load-data/04-transform/04-querying-metadata.md
@@ -17,11 +17,11 @@ sidebar_label: Metadata

## Detailed Guides for Querying Metadata

| File Format | Guide |
| ----------- | ------------------------------------------------------------------------------------ |
| Parquet | [Query Parquet Files with Metadata](./00-querying-parquet.md#query-with-metadata) |
| CSV | [Query CSV Files with Metadata](./01-querying-csv.md#query-with-metadata) |
| TSV | [Query TSV Files with Metadata](./02-querying-tsv.md#query-with-metadata) |
| NDJSON | [Query NDJSON Files with Metadata](./03-querying-ndjson.md#query-with-metadata) |
| ORC | [Query ORC Files with Metadata](./03-querying-orc.md#query-with-metadata) |
| Avro | [Query Avro Files with Metadata](./04-querying-avro.md#query-with-metadata) |
| File Format | Guide |
| ----------- |--------------------------------------------------------------------|
| Parquet | [Query Parquet Files with Metadata](./00-querying-parquet.md#query-with-metadata) |
| CSV | [Query CSV Files with Metadata](./01-querying-csv.md#query-with-metadata) |
| TSV | [Query TSV Files with Metadata](./02-querying-tsv.md#query-with-metadata) |
| NDJSON | [Query NDJSON Files with Metadata](./03-querying-ndjson.md#query-with-metadata) |
| ORC | [Query ORC Files with Metadata](./05-querying-orc.md#query-with-metadata) |
| Avro | [Query Avro Files with Metadata](./04-querying-avro.md#query-with-metadata) |
14 changes: 7 additions & 7 deletions docs/cn/guides/40-load-data/04-transform/index.md
@@ -36,11 +36,11 @@ FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}

## Supported File Formats

| File Format | Return Format | Access Method | Example | Guide |
| ----------- | ------------ | ------------- | ------- | ----- |
| File Format | Return Format | Access Method | Example | Guide |
| ----------- | ------------ | ------------- | ------- |-------------------------------------------|
| Parquet | Native data types | Direct column names | `SELECT id, name FROM` | [Query Parquet Files](./00-querying-parquet.md) |
| ORC | Native data types | Direct column names | `SELECT id, name FROM` | [Query ORC Files](./03-querying-orc.md) |
| CSV | String values | Positional references `$<position>` | `SELECT $1, $2 FROM` | [Query CSV Files](./01-querying-csv.md) |
| TSV | String values | Positional references `$<position>` | `SELECT $1, $2 FROM` | [Query TSV Files](./02-querying-tsv.md) |
| NDJSON | Variant object | Path expressions `$1:<field>` | `SELECT $1:id, $1:name FROM` | [Query NDJSON Files](./03-querying-ndjson.md) |
| Avro | Variant object | Path expressions `$1:<field>` | `SELECT $1:id, $1:name FROM` | [Query Avro Files](./04-querying-avro.md) |
| ORC | Native data types | Direct column names | `SELECT id, name FROM` | [Query ORC Files](./05-querying-orc.md) |
| CSV | String values | Positional references `$<position>` | `SELECT $1, $2 FROM` | [Query CSV Files](./01-querying-csv.md) |
| TSV | String values | Positional references `$<position>` | `SELECT $1, $2 FROM` | [Query TSV Files](./02-querying-tsv.md) |
| NDJSON | Variant object | Path expressions `$1:<field>` | `SELECT $1:id, $1:name FROM` | [Query NDJSON Files](./03-querying-ndjson.md) |
| Avro | Variant object | Path expressions `$1:<field>` | `SELECT $1:id, $1:name FROM` | [Query Avro Files](./04-querying-avro.md) |
2 changes: 1 addition & 1 deletion docs/cn/guides/40-load-data/index.md
@@ -52,7 +52,7 @@ Databend's powerful ETL capabilities support efficient data loading from a variety of data sources and formats
<summary> ORC </summary>

- [Load ORC Data into a Table](./03-load-semistructured/04-load-orc.md)
- [Query ORC Files Directly](./04-transform/03-querying-orc.md)
- [Query ORC Files Directly](./04-transform/05-querying-orc.md)

</details>

50 changes: 22 additions & 28 deletions docs/en/guides/40-load-data/04-transform/00-querying-parquet.md
@@ -3,32 +3,12 @@ title: Querying Parquet Files in Stage
sidebar_label: Parquet
---

## Query Parquet Files in Stage

Syntax:
```sql
SELECT [<alias>.]<column> [, <column> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'PARQUET | <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ],
[ CASE_SENSITIVE => true | false ]
)]
```

:::info Tips
**Query Return Content Explanation:**
## Syntax:

* **Return Format**: Column values in their native data types (not variants)
* **Access Method**: Directly use column names `column_name`
* **Example**: `SELECT id, name, age FROM @stage_name`
* **Key Features**:
* No need for path expressions (like `$1:name`)
* No type casting required
* Parquet files contain embedded schema information
:::
- [Query rows as Variants](./index.md#query-rows-as-variants)
- [Query columns by name](./index.md#query-columns-by-name)
- [Query Metadata](./index.md#query-metadata)

## Tutorial

@@ -47,14 +27,14 @@ CONNECTION = (
### Step 2. Create Custom Parquet File Format

```sql
CREATE FILE FORMAT parquet_query_format
TYPE = PARQUET
;
CREATE FILE FORMAT parquet_query_format TYPE = PARQUET;
```
- For more Parquet file format options, see [Parquet File Format Options](/sql/sql-reference/file-format-options#parquet-options)

### Step 3. Query Parquet Files

Query with column names:

```sql
SELECT *
FROM @parquet_query_stage
@@ -63,6 +43,20 @@ FROM @parquet_query_stage
PATTERN => '.*[.]parquet'
);
```

Query with path expressions:


```sql
SELECT $1
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);
```
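
Because `$1` in the query above holds each row as a single Variant, individual fields can presumably be extracted from it with the `$1:<field>` path notation and cast where a specific type is needed. The following is a minimal sketch, assuming the Parquet files contain hypothetical `id` and `name` columns (neither is named in the original tutorial):

```sql
-- Hypothetical columns `id` and `name`; replace them with fields present in your files.
SELECT
    CAST($1:id AS INT) AS id,   -- path access yields a Variant, so cast for typed use
    $1:name AS name
FROM @parquet_query_stage
(
    FILE_FORMAT => 'parquet_query_format',
    PATTERN => '.*[.]parquet'
);
```

Querying by column name is usually the simpler choice for Parquet, since the embedded schema already supplies native types without any casting.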


### Query with Metadata

Query Parquet files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`:
@@ -77,4 +71,4 @@ FROM @parquet_query_stage
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);
```
```
29 changes: 3 additions & 26 deletions docs/en/guides/40-load-data/04-transform/01-querying-csv.md
@@ -3,33 +3,10 @@ title: Querying CSV Files in Stage
sidebar_label: CSV
---

## Query CSV Files in Stage
## Syntax:

Syntax:
```sql
SELECT [<alias>.]$<col_position> [, $<col_position> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'CSV| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
```


:::info Tips
**Query Return Content Explanation:**

* **Return Format**: Individual column values as strings by default
* **Access Method**: Use positional references `$<col_position>` (e.g., `$1`, `$2`, `$3`)
* **Example**: `SELECT $1, $2, $3 FROM @stage_name`
* **Key Features**:
* Columns accessed by position, not by name
* Each `$<col_position>` refers to a single column, not the whole row
* Type casting required for non-string operations (e.g., `CAST($1 AS INT)`)
* No embedded schema information in CSV files
:::
- [Query columns by position](./index.md#query-columns-by-position)
- [Query Metadata](./index.md#query-metadata)

## Tutorial

28 changes: 3 additions & 25 deletions docs/en/guides/40-load-data/04-transform/02-querying-tsv.md
@@ -3,33 +3,11 @@ title: Querying TSV Files in Stage
sidebar_label: TSV
---

## Query TSV Files in Stage
## Syntax:

Syntax:
```sql
SELECT [<alias>.]$<col_position> [, $<col_position> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'TSV| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
```


:::info Tips
**Query Return Content Explanation:**
- [Query columns by position](./index.md#query-columns-by-position)
- [Query Metadata](./index.md#query-metadata)

* **Return Format**: Individual column values as strings by default
* **Access Method**: Use positional references `$<col_position>` (e.g., `$1`, `$2`, `$3`)
* **Example**: `SELECT $1, $2, $3 FROM @stage_name`
* **Key Features**:
* Columns accessed by position, not by name
* Each `$<col_position>` refers to a single column, not the whole row
* Type casting required for non-string operations (e.g., `CAST($1 AS INT)`)
* No embedded schema information in TSV files
:::

## Tutorial

56 changes: 4 additions & 52 deletions docs/en/guides/40-load-data/04-transform/03-querying-ndjson.md
@@ -21,33 +21,10 @@ NDJSON (Newline Delimited JSON) is a JSON-based file format where each line cont
- **Big data compatible**: Widely used in log files, data exports, and ETL pipelines
- **Easy to process**: Each line is an independent JSON object, enabling parallel processing

## Query NDJSON Files in Stage
## Syntax

Syntax:
```sql
SELECT [<alias>.]$1:<column> [, $1:<column> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'NDJSON| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
```


:::info Tips
**Query Return Content Explanation:**

* **Return Format**: Each row as a single variant object (referenced as `$1`)
* **Access Method**: Use path expressions `$1:column_name`
* **Example**: `SELECT $1:title, $1:author FROM @stage_name`
* **Key Features**:
* Must use path notation to access specific fields
* Type casting required for type-specific operations (e.g., `CAST($1:id AS INT)`)
* Each NDJSON line is parsed as a complete JSON object
* Whole row is represented as a single variant object
:::
- [Query rows as Variants](./index.md#query-rows-as-variants)
- [Query Metadata](./index.md#query-metadata)

## Tutorial

@@ -106,34 +83,9 @@ FROM @ndjson_query_stage
```

**Key difference:** The pattern `.*[.]ndjson[.]gz` matches files ending with `.ndjson.gz`. Databend automatically decompresses gzip files during query execution thanks to the `COMPRESSION = AUTO` setting in the file format.
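
If a stage holds both plain and gzip-compressed NDJSON files, a single pattern might cover both. The sketch below reuses `ndjson_query_stage` and `ndjson_query_format` from this tutorial and assumes that `PATTERN` accepts an optional group such as `([.]gz)?`; this is standard regex syntax, but the text above does not confirm it:

```sql
SELECT $1:title, $1:author
FROM @ndjson_query_stage
(
    FILE_FORMAT => 'ndjson_query_format',
    -- Intended to match names ending in .ndjson or .ndjson.gz
    PATTERN => '.*[.]ndjson([.]gz)?'
);
```

This relies on the same `COMPRESSION = AUTO` setting in the file format, so compressed and uncompressed files can be read by one query.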
### Query with Metadata

You can also include file metadata in your queries, which is useful for tracking data lineage and debugging:

```sql
SELECT
METADATA$FILENAME,
METADATA$FILE_ROW_NUMBER,
$1:title, $1:author
FROM @ndjson_query_stage
(
FILE_FORMAT => 'ndjson_query_format',
PATTERN => '.*[.]ndjson'
);
```

**Metadata columns explained:**
- `METADATA$FILENAME`: Shows which file each row came from - helpful when querying multiple files
- `METADATA$FILE_ROW_NUMBER`: Shows the line number within the source file - useful for tracking specific records

**Use cases:**
- **Data lineage**: Track which source file contributed each record
- **Debugging**: Identify problematic records by file and line number
- **Incremental processing**: Process only specific files or ranges within files

## Related Documentation

- [Loading NDJSON Files](../03-load-semistructured/03-load-ndjson.md) - How to load NDJSON data into tables
- [NDJSON File Format Options](/sql/sql-reference/file-format-options#ndjson-options) - Complete NDJSON format configuration
- [CREATE STAGE](/sql/sql-commands/ddl/stage/ddl-create-stage) - Managing external and internal stages
- [Querying Metadata](./04-querying-metadata.md) - More details about metadata columns
- [CREATE STAGE](/sql/sql-commands/ddl/stage/ddl-create-stage) - Managing external and internal stages
28 changes: 3 additions & 25 deletions docs/en/guides/40-load-data/04-transform/04-querying-avro.md
@@ -3,32 +3,10 @@ title: Querying Avro Files in Stage
sidebar_label: Avro
---

## Query Avro Files in Stage
## Syntax:

Syntax:
```sql
SELECT [<alias>.]$1:<column> [, $1:<column> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'AVRO'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
```

:::info Tips
**Query Return Content Explanation:**

* **Return Format**: Each row as a single variant object (referenced as `$1`)
* **Access Method**: Use path expressions `$1:column_name`
* **Example**: `SELECT $1:id, $1:name FROM @stage_name`
* **Key Features**:
* Must use path notation to access specific fields
* Type casting required for type-specific operations (e.g., `CAST($1:id AS INT)`)
* Avro schema is mapped to variant structure
* Whole row is represented as a single variant object
:::
- [Query rows as Variants](./index.md#query-rows-as-variants)
- [Query Metadata](./index.md#query-metadata)

## Avro Querying Features Overview

27 changes: 0 additions & 27 deletions docs/en/guides/40-load-data/04-transform/04-querying-metadata.md

This file was deleted.
