
Commit 89d0d7c

committed
Address PR review comments for REST catalog documentation
- Fix description to remove S3 buckets reference (not relevant for this guide)
- Fix docker-compose YAML network configuration to avoid duplication
- Add step-by-step setup instructions for better clarity
- Add troubleshooting guidance for users who don't see expected tables
- Include note about sample data loading requirements

Addresses feedback from PR #4031 review comments
1 parent 5f60f2c commit 89d0d7c

File tree

1 file changed: +44, -6 lines


docs/use-cases/data_lake/rest_catalog.md

Lines changed: 44 additions & 6 deletions
````diff
@@ -5,7 +5,7 @@ title: 'REST Catalog'
 pagination_prev: null
 pagination_next: null
 description: 'In this guide, we will walk you through the steps to query
-  your data in S3 buckets using ClickHouse and the REST Catalog.'
+  your data using ClickHouse and the REST Catalog.'
 keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg']
 show_related_blogs: true
 ---
````
````diff
@@ -44,15 +44,15 @@ For local development and testing, you can use a containerized REST catalog setu
 
 You can use various containerized REST catalog implementations such as **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml?ref=blog.min.io)** which provides a complete Spark + Iceberg + REST catalog environment with docker-compose, making it ideal for testing Iceberg integrations.
 
-You'll need to add ClickHouse as a dependency in your docker-compose setup:
+**Step 1:** Clone or download the docker-compose setup from the Databricks repository.
+
+**Step 2:** Add ClickHouse as a service to your docker-compose.yml file:
 
 ```yaml
 clickhouse:
   image: clickhouse/clickhouse-server:main
   container_name: clickhouse
   user: '0:0' # Ensures root permissions
-  networks:
-    iceberg_net:
   ports:
     - "8123:8123"
     - "9002:9000"
````
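For context, docker-compose wires a service to a named network in two places: the service lists the network it joins, and a top-level `networks:` key defines it (the duplication this commit removes was a second, service-level definition). A minimal sketch of the conventional pattern, using the `iceberg_net` name from this diff; the list-form attachment under the service is an assumption about how the final compose file joins the network, not something shown in the hunks:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:main
    networks:
      - iceberg_net   # attach this service to the named network (list form)

networks:
  iceberg_net:        # top-level definition, added in Step 3 of this diff
    driver: bridge
```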
````diff
@@ -68,6 +68,30 @@ clickhouse:
     - CLICKHOUSE_PASSWORD=
 ```
 
+**Step 3:** Ensure your docker-compose.yml includes the necessary network configuration:
+
+```yaml
+networks:
+  iceberg_net:
+    driver: bridge
+```
+
+**Step 4:** Start the entire stack:
+
+```bash
+docker-compose up -d
+```
+
+**Step 5:** Wait for all services to be ready. You can check the logs:
+
+```bash
+docker-compose logs -f
+```
+
+:::note
+The REST catalog setup requires that sample data be loaded into the Iceberg tables first. Make sure the Spark environment has created and populated the tables before attempting to query them through ClickHouse. The availability of tables depends on the specific docker-compose setup and sample data loading scripts.
+:::
+
 ### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog}
 
 Connect to your ClickHouse container:
````
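The exact connect command falls outside the hunks shown in this diff. Given the `container_name: clickhouse` set in Step 2, it would typically look like the following sketch (`clickhouse-client` ships inside the `clickhouse/clickhouse-server` image; the empty-password default matches the `CLICKHOUSE_PASSWORD=` line above):

```shell
# Open an interactive clickhouse-client session inside the running container.
docker exec -it clickhouse clickhouse-client
```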
````diff
@@ -97,13 +121,27 @@ USE demo;
 SHOW TABLES;
 ```
 
+If your setup includes sample data (such as the taxi dataset), you should see tables like:
+
 ```sql title="Response"
 ┌─name──────────┐
 │ default.taxis │
 └───────────────┘
 ```
 
-To query a table:
+:::note
+If you don't see any tables, this usually means:
+1. The Spark environment hasn't created the sample tables yet
+2. The REST catalog service isn't fully initialized
+3. The sample data loading process hasn't completed
+
+You can check the Spark logs to see the table creation progress:
+```bash
+docker-compose logs spark
+```
+:::
+
+To query a table (if available):
 
 ```sql
 SELECT count(*) FROM `default.taxis`;
````
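The troubleshooting note above condenses into two quick checks. A sketch, assuming the compose stack is running locally: the `spark` service name comes from this diff, port 8123 from the Step 2 mapping, and `/ping` is ClickHouse's standard HTTP health-check endpoint:

```shell
# Confirm ClickHouse's HTTP interface is up; prints "Ok." once the server is ready.
curl -s http://localhost:8123/ping

# Inspect recent Spark logs to see whether the sample tables have been created yet.
docker-compose logs --tail=50 spark
```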
````diff
@@ -190,4 +228,4 @@ Then load the data from your REST catalog table via an `INSERT INTO SELECT`:
 ```sql
 INSERT INTO taxis
 SELECT * FROM demo.`default.taxis`;
-```
+```
````
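After the copy completes, a simple sanity check is to compare row counts between the catalog table and the local table. A sketch using only the names that appear in this diff:

```sql
-- Row count in the REST catalog table, read through the demo database
SELECT count(*) FROM demo.`default.taxis`;

-- Row count in the local ClickHouse table; should match the number above
SELECT count(*) FROM taxis;
```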
