Commit 1f10253

Merge pull request #4031 from somratdutta/main
Add REST catalog support in docs (#1)
2 parents 6be5783 + 2c61a75

File tree

4 files changed: +234 −3 lines changed

docs/integrations/index.mdx

Lines changed: 1 addition & 0 deletions

@@ -245,6 +245,7 @@ We are actively compiling this list of ClickHouse integrations below, so it's no

  |RabbitMQ|<Rabbitmqsvg alt="RabbitMQ logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to connect to [RabbitMQ](https://www.rabbitmq.com/).|[Documentation](/engines/table-engines/integrations/rabbitmq)|
  |Redis|<Redissvg alt="Redis logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to use [Redis](https://redis.io/) as a dictionary source.|[Documentation](/sql-reference/dictionaries/index.md#redis)|
  |Redpanda|<Image img={redpanda} alt="Redpanda logo" size="logo"/>|Data ingestion|Redpanda is the streaming data platform for developers. It's API-compatible with Apache Kafka, but 10x faster, much easier to use, and more cost-effective.|[Blog](https://redpanda.com/blog/real-time-olap-database-clickhouse-redpanda)|
+ |REST Catalog||Data ingestion|Integration with the REST Catalog specification for Iceberg tables, supporting multiple catalog providers, including Tabular.io.|[Documentation](/use-cases/data-lake/rest-catalog)|
  |Rust|<Image img={rust} size="logo" alt="Rust logo"/>|Language client|A typed client for ClickHouse.|[Documentation](/integrations/language-clients/rust.md)|
  |SQLite|<Sqlitesvg alt="Sqlite logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows importing and exporting data to SQLite, and supports queries to SQLite tables directly from ClickHouse.|[Documentation](/engines/table-engines/integrations/sqlite)|
  |Superset|<Supersetsvg alt="Superset logo" style={{width: '3rem'}}/>|Data visualization|Explore and visualize your ClickHouse data with Apache Superset.|[Documentation](/integrations/data-visualization/superset-and-clickhouse.md)|

docs/use-cases/data_lake/index.md

Lines changed: 3 additions & 2 deletions

@@ -4,12 +4,13 @@ pagination_prev: null

  pagination_next: null
  slug: /use-cases/data-lake
  title: 'Data Lake'
- keywords: ['data lake', 'glue', 'unity']
+ keywords: ['data lake', 'glue', 'unity', 'rest']
  ---

- ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.).
+ ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.).

  | Page | Description |
  |-----|-----|
  | [Querying data in S3 using ClickHouse and the Glue Data Catalog](/use-cases/data-lake/glue-catalog) | Query your data in S3 buckets using ClickHouse and the Glue Data Catalog. |
  | [Querying data in S3 using ClickHouse and the Unity Data Catalog](/use-cases/data-lake/unity-catalog) | Query your data using the Unity Catalog. |
+ | [Querying data in S3 using ClickHouse and the REST Catalog](/use-cases/data-lake/rest-catalog) | Query your data using the REST Catalog (Tabular.io). |

docs/use-cases/data_lake/rest_catalog.md

Lines changed: 228 additions & 0 deletions

@@ -0,0 +1,228 @@
---
slug: /use-cases/data-lake/rest-catalog
sidebar_label: 'REST Catalog'
title: 'REST Catalog'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query
your data using ClickHouse and the REST Catalog.'
keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg']
show_related_blogs: true
---

import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

<ExperimentalBadge/>

:::note
Integration with the REST Catalog works with Iceberg tables only.
This integration supports AWS S3 as well as other cloud storage providers.
:::

ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.). This guide will walk you through the steps to query your data using ClickHouse and the [REST Catalog](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml/) specification.

The REST Catalog is a standardized API specification for Iceberg catalogs, supported by various platforms including:

- **Local development environments** (using docker-compose setups)
- **Managed services** like Tabular.io
- **Self-hosted** REST catalog implementations

:::note
As this feature is experimental, you will need to enable it using:
`SET allow_experimental_database_iceberg = 1;`
:::
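
Connecting to any of these providers uses the same `DataLakeCatalog` database engine shown in the local walkthrough below. As a minimal sketch, a connection to a hosted REST catalog might look like the following; the endpoint URL, credentials, and warehouse name are hypothetical placeholders, and your provider's documentation will give the exact values and any extra settings (such as a storage endpoint):

```sql
SET allow_experimental_database_iceberg = 1;

-- Hypothetical hosted catalog: substitute the endpoint, credentials,
-- and warehouse your catalog provider gives you.
CREATE DATABASE hosted_catalog
ENGINE = DataLakeCatalog('https://catalog.example.com/v1', '<access-key>', '<secret-key>')
SETTINGS
    catalog_type = 'rest',
    warehouse = 'my_warehouse'
```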

## Local Development Setup {#local-development-setup}

For local development and testing, you can use a containerized REST catalog setup. This approach is ideal for learning, prototyping, and development environments.

### Prerequisites {#local-prerequisites}

1. **Docker and Docker Compose**: Ensure Docker is installed and running
2. **Sample Setup**: You can use a docker-compose setup that bundles a REST catalog (see the setup described below)

### Setting up Local REST Catalog {#setting-up-local-rest-catalog}

You can use various containerized REST catalog implementations, such as **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml?ref=blog.min.io)**, which provides a complete Spark + Iceberg + REST catalog environment with docker-compose, making it ideal for testing Iceberg integrations.

**Step 1:** Create a new folder in which to run the example, then create a file `docker-compose.yml` with the configuration from [Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml?ref=blog.min.io).

**Step 2:** Next, create a file `docker-compose.override.yml` and place the following ClickHouse container configuration into it:

```yaml
version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.5.6
    container_name: clickhouse
    user: '0:0'  # Ensures root permissions
    ports:
      - "8123:8123"
      - "9002:9000"
    volumes:
      - ./clickhouse:/var/lib/clickhouse
      - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder
    networks:
      - iceberg_net
    environment:
      - CLICKHOUSE_DB=default
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_DO_NOT_CHOWN=1
      - CLICKHOUSE_PASSWORD=
```

**Step 3:** Run the following command to start the services:

```bash
docker compose up
```

**Step 4:** Wait for all services to be ready. You can check the logs:

```bash
docker compose logs -f
```

:::note
The REST catalog setup requires that sample data be loaded into the Iceberg tables first. Make sure the Spark environment has created and populated the tables before attempting to query them through ClickHouse. The availability of tables depends on the specific docker-compose setup and sample data loading scripts.
:::

### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog}

Connect to your ClickHouse container:

```bash
docker exec -it clickhouse clickhouse-client
```

Then create the database connection to the REST catalog:

```sql
SET allow_experimental_database_iceberg = 1;

CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/lakehouse',
    warehouse = 'demo'
```
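
Before moving on, you can confirm that the catalog-backed database is visible. A quick sanity check, assuming the `demo` database created above:

```sql
-- The new database should be listed with the DataLakeCatalog engine
SELECT name, engine FROM system.databases WHERE name = 'demo';
```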

## Querying REST catalog tables using ClickHouse {#querying-rest-catalog-tables-using-clickhouse}

Now that the connection is in place, you can start querying via the REST catalog. For example:

```sql
USE demo;

SHOW TABLES;
```

If your setup includes sample data (such as the taxi dataset), you should see tables like:

```sql title="Response"
┌─name──────────┐
│ default.taxis │
└───────────────┘
```

:::note
If you don't see any tables, this usually means:
1. The Spark environment hasn't created the sample tables yet
2. The REST catalog service isn't fully initialized
3. The sample data loading process hasn't completed

You can check the Spark logs to see the table creation progress:
```bash
docker compose logs spark
```
:::

To query a table (if available):

```sql
SELECT count(*) FROM `default.taxis`;
```

```sql title="Response"
┌─count()─┐
│ 2171187 │
└─────────┘
```

:::note Backticks required
Backticks are required because ClickHouse doesn't support more than one namespace, so the catalog namespace and table name must be quoted together as a single identifier.
:::

To inspect the table DDL:

```sql
SHOW CREATE TABLE `default.taxis`;
```

```sql title="Response"
┌─statement─────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE demo.`default.taxis`                                                              │
│ (                                                                                              │
│     `VendorID` Nullable(Int64),                                                                │
│     `tpep_pickup_datetime` Nullable(DateTime64(6)),                                            │
│     `tpep_dropoff_datetime` Nullable(DateTime64(6)),                                           │
│     `passenger_count` Nullable(Float64),                                                       │
│     `trip_distance` Nullable(Float64),                                                         │
│     `RatecodeID` Nullable(Float64),                                                            │
│     `store_and_fwd_flag` Nullable(String),                                                     │
│     `PULocationID` Nullable(Int64),                                                            │
│     `DOLocationID` Nullable(Int64),                                                            │
│     `payment_type` Nullable(Int64),                                                            │
│     `fare_amount` Nullable(Float64),                                                           │
│     `extra` Nullable(Float64),                                                                 │
│     `mta_tax` Nullable(Float64),                                                               │
│     `tip_amount` Nullable(Float64),                                                            │
│     `tolls_amount` Nullable(Float64),                                                          │
│     `improvement_surcharge` Nullable(Float64),                                                 │
│     `total_amount` Nullable(Float64),                                                          │
│     `congestion_surcharge` Nullable(Float64),                                                  │
│     `airport_fee` Nullable(Float64)                                                            │
│ )                                                                                              │
│ ENGINE = Iceberg('http://minio:9000/lakehouse/warehouse/default/taxis/', 'admin', '[HIDDEN]')  │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```
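
If you only need the column types rather than the full DDL, a plain `DESCRIBE` should also work here:

```sql
DESCRIBE TABLE `default.taxis`;
```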

## Loading data from your Data Lake into ClickHouse {#loading-data-from-your-data-lake-into-clickhouse}

If you need to load data from the REST catalog into ClickHouse, start by creating a local ClickHouse table:

```sql
CREATE TABLE taxis
(
    `VendorID` Int64,
    `tpep_pickup_datetime` DateTime64(6),
    `tpep_dropoff_datetime` DateTime64(6),
    `passenger_count` Float64,
    `trip_distance` Float64,
    `RatecodeID` Float64,
    `store_and_fwd_flag` String,
    `PULocationID` Int64,
    `DOLocationID` Int64,
    `payment_type` Int64,
    `fare_amount` Float64,
    `extra` Float64,
    `mta_tax` Float64,
    `tip_amount` Float64,
    `tolls_amount` Float64,
    `improvement_surcharge` Float64,
    `total_amount` Float64,
    `congestion_surcharge` Float64,
    `airport_fee` Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(tpep_pickup_datetime)
ORDER BY (VendorID, tpep_pickup_datetime, PULocationID, DOLocationID);
```

Then load the data from your REST catalog table via an `INSERT INTO SELECT`:

```sql
INSERT INTO taxis
SELECT * FROM demo.`default.taxis`;
```
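
After the load completes, a quick row-count comparison against the catalog table confirms everything arrived, and you can restrict the `SELECT` if you only want a slice of the data. A short sketch; the date cutoff below is a hypothetical example:

```sql
-- Row count should match the earlier count from demo.`default.taxis`
SELECT count(*) FROM taxis;

-- Incremental-style load: copy only rows after a hypothetical cutoff
INSERT INTO taxis
SELECT * FROM demo.`default.taxis`
WHERE tpep_pickup_datetime >= '2021-01-01 00:00:00';
```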

sidebars.js

Lines changed: 2 additions & 1 deletion

@@ -167,7 +167,8 @@ const sidebars = {

        link: { type: "doc", id: "use-cases/data_lake/index" },
        items: [
          "use-cases/data_lake/glue_catalog",
-         "use-cases/data_lake/unity_catalog"
+         "use-cases/data_lake/unity_catalog",
+         "use-cases/data_lake/rest_catalog"
        ]
      },
      {
