Commit 481e292 ("Unity catalog", 1 parent: 53b3317)

File tree

6 files changed: +196 -12 lines changed


docs/architecture/cluster-deployment.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ slug: /architecture/cluster-deployment
 sidebar_label: 'Cluster Deployment'
 sidebar_position: 100
 title: 'Cluster Deployment'
-description: 'By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster.'
+description: 'By going through this tutorial, you will learn how to set up a simple ClickHouse cluster.'
 ---
 
 This tutorial assumes you've already set up a [local ClickHouse server](../getting-started/install.md)

docs/use-cases/data_lake/glue_catalog.md

Lines changed: 2 additions & 3 deletions

@@ -4,9 +4,8 @@ sidebar_label: 'AWS Glue Catalog'
 title: 'AWS Glue Catalog'
 pagination_prev: null
 pagination_next: null
-description: 'ClickHouse supports integration with multiple catalogs (Unity,
-Glue, Polaris, etc.). In this guide, we will walk you through the steps to query
-your data in S3 buckets using ClickHouse and the Glue Data Catalog.'
+description: 'In this guide, we will walk you through the steps to query
+your data in S3 buckets using ClickHouse and the AWS Glue Data Catalog.'
 ---
 
 import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

docs/use-cases/data_lake/index.md

Lines changed: 4 additions & 5 deletions

@@ -6,10 +6,9 @@ slug: /use-cases/data-lake
 title: 'Data Lake'
 ---
 
-<!-- the table of contents below is autogenerated by the following script
-https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/autogenerate-table-of-contents.sh
-from the YAML frontmatter of the pages themselves. If you've spotted an error,
-please edit the titles and descriptions of the pages themselves. -->
+ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.).
+
 | Page | Description |
 |-----|-----|
-| [Querying data in S3 using ClickHouse and the Glue Data Catalog](/use-cases/data-lake/glue-catalog) | ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). In this guide, we will walk you through the steps to query your data in S3 buckets using ClickHouse and the Glue Data Catalog. |
+| [Querying data in S3 using ClickHouse and the Glue Data Catalog](/use-cases/data-lake/glue-catalog) | Query your data in S3 buckets using ClickHouse and the Glue Data Catalog. |
+| [Querying data in S3 using ClickHouse and the Unity Data Catalog](/use-cases/data-lake/unity-catalog) | Query your data in S3 buckets using ClickHouse and the Unity Catalog. |
docs/use-cases/data_lake/unity_catalog.md

Lines changed: 180 additions & 0 deletions

@@ -0,0 +1,180 @@
---
slug: /use-cases/data-lake/unity-catalog
sidebar_label: 'Unity Catalog'
title: 'Unity Catalog'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query
your data in S3 buckets using ClickHouse and the Unity Catalog.'
---

import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

<ExperimentalBadge/>

:::note
The integration with the Unity Catalog works for managed and external tables.
:::

:::note
This integration is currently only supported on AWS.
:::

ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your data managed by Databricks using ClickHouse and the [Unity Catalog](https://www.databricks.com/product/unity-catalog).

Databricks supports multiple data formats for its lakehouse. With ClickHouse, you can query Unity Catalog tables as both Delta and Iceberg.

## Configuring Unity in Databricks

To allow ClickHouse to interact with the Unity Catalog, you need to make sure the Unity Catalog is configured to allow interaction with an external reader. This can be achieved by following the ["Enable external data access to Unity Catalog"](https://docs.databricks.com/aws/en/external-access/admin) guide.

In addition to enabling external access, ensure the principal configuring the integration has the `EXTERNAL USE SCHEMA` [privilege](https://docs.databricks.com/aws/en/external-access/admin#external-schema) on the schema containing the tables.

Once your catalog is configured, you must generate credentials for ClickHouse. Two different methods can be used, depending on your interaction mode with Unity:

* For Iceberg clients, use authentication as a [service principal](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m).

* For Delta clients, use a Personal Access Token ([PAT](https://docs.databricks.com/aws/en/dev-tools/auth/pat)).
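
For Iceberg clients, the service principal exchanges its client ID and secret for a short-lived OAuth token. As a minimal sketch of what that exchange looks like (the `/oidc/v1/token` endpoint and `all-apis` scope come from the Iceberg connection settings later in this guide; `build_token_request` is a hypothetical helper, not an official Databricks API):

```python
# Sketch only: build the OAuth client-credentials request a Databricks
# service principal would send. The endpoint and scope mirror the
# oauth_server_uri / auth_scope settings used for the Iceberg connection;
# this helper is illustrative, not a real library call.
from urllib.parse import urlencode


def build_token_request(workspace_id: str, client_id: str, client_secret: str):
    """Return (url, form_body, basic_auth) for the token exchange."""
    url = f"https://{workspace_id}.cloud.databricks.com/oidc/v1/token"
    form_body = urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
    })
    # client_id / client_secret are typically sent as HTTP Basic auth
    return url, form_body, (client_id, client_secret)
```

ClickHouse performs an equivalent exchange internally when you supply `catalog_credential` and `oauth_server_uri`, so a request like this is mainly useful for verifying the service principal's credentials independently.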

## Creating a connection between Unity Catalog and ClickHouse

With your Unity Catalog configured and authentication in place, establish a connection between ClickHouse and Unity Catalog.

### Read Delta

```sql
CREATE DATABASE unity
ENGINE = DataLakeCatalog('https://<workspace-id>.cloud.databricks.com/api/2.1/unity-catalog')
SETTINGS warehouse = 'CATALOG_NAME', catalog_credential = '<PAT>', catalog_type = 'unity'
```

### Read Iceberg

```sql
CREATE DATABASE unity
ENGINE = DataLakeCatalog('https://<workspace-id>.cloud.databricks.com/api/2.1/unity-catalog/iceberg')
SETTINGS catalog_type = 'rest', catalog_credential = '<client-id>:<client-secret>', warehouse = 'workspace',
oauth_server_uri = 'https://<workspace-id>.cloud.databricks.com/oidc/v1/token', auth_scope = 'all-apis,sql'
```
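
The Delta and Iceberg statements above differ only in the endpoint suffix, the credential format, and the extra OAuth settings. A sketch of templating both from one place (a hypothetical helper, not part of any ClickHouse client library):

```python
# Sketch only: render the CREATE DATABASE statement for either mode.
# Endpoints and settings mirror the two examples above; the helper
# itself is illustrative, not an official API.
def unity_catalog_ddl(workspace_id: str, mode: str, credential: str,
                      warehouse: str = "workspace") -> str:
    base = f"https://{workspace_id}.cloud.databricks.com/api/2.1/unity-catalog"
    if mode == "delta":  # PAT-based, native Unity endpoint
        return (
            f"CREATE DATABASE unity\n"
            f"ENGINE = DataLakeCatalog('{base}')\n"
            f"SETTINGS warehouse = '{warehouse}', "
            f"catalog_credential = '{credential}', catalog_type = 'unity'"
        )
    if mode == "iceberg":  # OAuth client-id:secret, Iceberg REST endpoint
        return (
            f"CREATE DATABASE unity\n"
            f"ENGINE = DataLakeCatalog('{base}/iceberg')\n"
            f"SETTINGS catalog_type = 'rest', catalog_credential = '{credential}', "
            f"warehouse = '{warehouse}',\n"
            f"oauth_server_uri = 'https://{workspace_id}.cloud.databricks.com/oidc/v1/token', "
            f"auth_scope = 'all-apis,sql'"
        )
    raise ValueError(f"unknown mode: {mode}")
```

For example, `unity_catalog_ddl("my-workspace", "delta", "<PAT>", warehouse="CATALOG_NAME")` reproduces the Delta statement above.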

## Querying Unity Catalog tables using ClickHouse

Now that the connection is in place, you can start querying via the Unity Catalog. For example:

```sql
USE unity;

SHOW TABLES;

┌─name───────────────────────────────────────────────┐
│ clickbench.delta_hits                              │
│ demo.fake_user                                     │
│ information_schema.catalog_privileges              │
│ information_schema.catalog_tags                    │
│ information_schema.catalogs                        │
│ information_schema.check_constraints               │
│ information_schema.column_masks                    │
│ information_schema.column_tags                     │
│ information_schema.columns                         │
│ information_schema.constraint_column_usage         │
│ information_schema.constraint_table_usage          │
│ information_schema.information_schema_catalog_name │
│ information_schema.key_column_usage                │
│ information_schema.parameters                      │
│ information_schema.referential_constraints         │
│ information_schema.routine_columns                 │
│ information_schema.routine_privileges              │
│ information_schema.routines                        │
│ information_schema.row_filters                     │
│ information_schema.schema_privileges               │
│ information_schema.schema_tags                     │
│ information_schema.schemata                        │
│ information_schema.table_constraints               │
│ information_schema.table_privileges                │
│ information_schema.table_tags                      │
│ information_schema.tables                          │
│ information_schema.views                           │
│ information_schema.volume_privileges               │
│ information_schema.volume_tags                     │
│ information_schema.volumes                         │
│ uniform.delta_hits                                 │
└────────────────────────────────────────────────────┘
```
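
Most of that listing is `information_schema` metadata. If you post-process the output of `SHOW TABLES` programmatically, a small filter (illustrative only) separates out the user tables:

```python
# Sketch only: keep user tables from a SHOW TABLES listing, dropping
# the information_schema entries Unity also exposes.
def user_tables(names):
    return [n for n in names if not n.startswith("information_schema.")]
```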

If you're using the Iceberg client, only Delta tables with Uniform enabled will be shown:

```sql
SHOW TABLES

┌─name───────────────┐
│ uniform.delta_hits │
└────────────────────┘
```

To query a table:

```sql
SELECT count(*) FROM `uniform.delta_hits`
```

:::note Backticks required
Backticks are required because ClickHouse doesn't support more than one namespace.
:::
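
If you build such queries programmatically, a tiny helper (hypothetical, shown only to make the quoting rule explicit) keeps the convention in one place:

```python
# Sketch only: ClickHouse exposes each Unity table under a single
# "schema.table" identifier, so the whole name must be backtick-quoted.
# Assumes the schema/table names contain no backticks themselves.
def quote_unity_table(schema: str, table: str) -> str:
    return f"`{schema}.{table}`"
```

For example, `f"SELECT count(*) FROM {quote_unity_table('uniform', 'delta_hits')}"` yields the query shown above.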

To inspect the table DDL:

```sql
SHOW CREATE TABLE `uniform.delta_hits`

CREATE TABLE unity_uniform.`uniform.delta_hits`
(
    `WatchID` Int64,
    `JavaEnable` Int32,
    `Title` String,
    `GoodEvent` Int32,
    `EventTime` DateTime64(6, 'UTC'),
    `EventDate` Date,
    `CounterID` Int32,
    `ClientIP` Int32,
    ...
    `FromTag` String,
    `HasGCLID` Int32,
    `RefererHash` Int64,
    `URLHash` Int64,
    `CLID` Int32
)
ENGINE = Iceberg('s3://<path>')
```

## Loading data from your Data Lake into ClickHouse

If you need to load data from Databricks into ClickHouse, start by creating a local ClickHouse table:

```sql
CREATE TABLE hits
(
    `WatchID` Int64,
    `JavaEnable` Int32,
    `Title` String,
    `GoodEvent` Int32,
    `EventTime` DateTime64(6, 'UTC'),
    `EventDate` Date,
    `CounterID` Int32,
    `ClientIP` Int32,
    ...
    `FromTag` String,
    `HasGCLID` Int32,
    `RefererHash` Int64,
    `URLHash` Int64,
    `CLID` Int32
)
PRIMARY KEY (CounterID, EventDate, UserID, EventTime, WatchID);
```

Then load the data from your Unity Catalog table via an `INSERT INTO SELECT`:

```sql
INSERT INTO hits SELECT * FROM unity_uniform.`uniform.delta_hits`;
```

scripts/autogenerate-table-of-contents.sh

Lines changed: 0 additions & 1 deletion

@@ -35,6 +35,5 @@ python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/cloud/changelogs" --md="docs/cloud/reference/release-notes-index.md"
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/cloud/manage/api" --md="docs/cloud/manage/api/api-reference-index.md"
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/development" --md="docs/development/index.md" --ignore images
-python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/use-cases/data_lake" --md="docs/use-cases/data_lake/index.md"
 deactivate
 rm -r venv

sidebars.js

Lines changed: 9 additions & 2 deletions

@@ -117,8 +117,15 @@ const sidebars = {
       ]
     },
     {
-      type: "doc",
-      id: "use-cases/data_lake/glue_catalog"
+      type: "category",
+      label: "Data lake",
+      collapsed: true,
+      collapsible: true,
+      link: { type: "doc", id: "use-cases/data_lake/index" },
+      items: [
+        "use-cases/data_lake/glue_catalog",
+        "use-cases/data_lake/unity_catalog"
+      ]
     }
   ]
 },
