---
slug: /use-cases/data-lake/rest-catalog
sidebar_label: 'REST Catalog'
title: 'REST Catalog'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query your data using ClickHouse and the REST Catalog.'
keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg']
show_related_blogs: true
---

import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

<ExperimentalBadge/>

:::note
Integration with the REST Catalog works with Iceberg tables only.
The integration supports AWS S3 as well as other cloud storage providers.
:::

ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.). This guide walks you through the steps to query your data using ClickHouse and the [REST Catalog](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml/) specification.

The REST Catalog is a standardized API specification for Iceberg catalogs, supported by a variety of platforms, including:
- **Local development environments** (using docker-compose setups)
- **Managed services** such as Tabular.io
- **Self-hosted** REST catalog implementations

:::note
As this feature is experimental, you will need to enable it using:
`SET allow_experimental_database_iceberg = 1;`
:::

## Local Development Setup {#local-development-setup}

For local development and testing, you can use a containerized REST catalog setup. This approach is ideal for learning, prototyping, and development environments.

### Prerequisites {#local-prerequisites}

1. **Docker and Docker Compose**: Ensure Docker is installed and running
2. **Sample Setup**: You can use various docker-compose setups (see Alternative Docker Images below)

### Setting up Local REST Catalog {#setting-up-local-rest-catalog}

You can use various containerized REST catalog implementations, such as **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml)**, which provides a complete Spark + Iceberg + REST catalog environment with docker-compose, making it ideal for testing Iceberg integrations.

**Step 1:** Create a new folder in which to run the example, then create a file `docker-compose.yml` with the configuration from [Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml).

**Step 2:** Next, create a file `docker-compose.override.yml` and place the following ClickHouse container configuration into it:

```yaml
version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.5.6
    container_name: clickhouse
    user: '0:0' # Ensures root permissions
    ports:
      - "8123:8123"
      - "9002:9000"
    volumes:
      - ./clickhouse:/var/lib/clickhouse
      - ./clickhouse/data_import:/var/lib/clickhouse/data_import # Mount dataset folder
    networks:
      - iceberg_net
    environment:
      - CLICKHOUSE_DB=default
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_DO_NOT_CHOWN=1
      - CLICKHOUSE_PASSWORD=
```

**Step 3:** Run the following command to start the services:

```bash
docker compose up
```

**Step 4:** Wait for all services to be ready. You can check the logs:

```bash
docker compose logs -f
```

:::note
The REST catalog setup requires that sample data be loaded into the Iceberg tables first. Make sure the Spark environment has created and populated the tables before attempting to query them through ClickHouse. The availability of tables depends on the specific docker-compose setup and sample data loading scripts.
:::

### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog}

Connect to your ClickHouse container:

```bash
docker exec -it clickhouse clickhouse-client
```

Then create the database connection to the REST catalog:

```sql
SET allow_experimental_database_iceberg = 1;

CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/lakehouse',
    warehouse = 'demo'
```
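
To verify that the catalog database was created, you can check `system.databases`; the engine shown should match the `DataLakeCatalog` definition above:

```sql
SELECT name, engine
FROM system.databases
WHERE name = 'demo';
```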

## Querying REST catalog tables using ClickHouse {#querying-rest-catalog-tables-using-clickhouse}

Now that the connection is in place, you can start querying via the REST catalog. For example:

```sql
USE demo;

SHOW TABLES;
```

If your setup includes sample data (such as the taxi dataset), you should see tables like:

```sql title="Response"
┌─name──────────┐
│ default.taxis │
└───────────────┘
```

:::note
If you don't see any tables, this usually means:
1. The Spark environment hasn't created the sample tables yet
2. The REST catalog service isn't fully initialized
3. The sample data loading process hasn't completed

You can check the Spark logs to see the table creation progress:

```bash
docker compose logs spark
```
:::

To query a table (if available):

```sql
SELECT count(*) FROM `default.taxis`;
```

```sql title="Response"
┌─count()─┐
│ 2171187 │
└─────────┘
```
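
Ordinary analytical SQL works against catalog tables as well. As an illustrative sketch, assuming the NYC taxi sample schema (which includes `payment_type`, `fare_amount`, and `tip_amount` columns):

```sql
-- Average fare and tip per payment type in the taxi sample data
SELECT
    payment_type,
    count() AS trips,
    round(avg(fare_amount), 2) AS avg_fare,
    round(avg(tip_amount), 2) AS avg_tip
FROM `default.taxis`
GROUP BY payment_type
ORDER BY trips DESC;
```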

:::note Backticks required
Backticks are required because ClickHouse supports only a single level of namespaces, so the Iceberg namespace and table name are combined into one identifier such as `default.taxis`.
:::

To inspect the table DDL:

```sql
SHOW CREATE TABLE `default.taxis`;
```

```sql title="Response"
┌─statement─────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE demo.`default.taxis`                                                             │
│ (                                                                                             │
│     `VendorID` Nullable(Int64),                                                               │
│     `tpep_pickup_datetime` Nullable(DateTime64(6)),                                           │
│     `tpep_dropoff_datetime` Nullable(DateTime64(6)),                                          │
│     `passenger_count` Nullable(Float64),                                                      │
│     `trip_distance` Nullable(Float64),                                                        │
│     `RatecodeID` Nullable(Float64),                                                           │
│     `store_and_fwd_flag` Nullable(String),                                                    │
│     `PULocationID` Nullable(Int64),                                                           │
│     `DOLocationID` Nullable(Int64),                                                           │
│     `payment_type` Nullable(Int64),                                                           │
│     `fare_amount` Nullable(Float64),                                                          │
│     `extra` Nullable(Float64),                                                                │
│     `mta_tax` Nullable(Float64),                                                              │
│     `tip_amount` Nullable(Float64),                                                           │
│     `tolls_amount` Nullable(Float64),                                                         │
│     `improvement_surcharge` Nullable(Float64),                                                │
│     `total_amount` Nullable(Float64),                                                         │
│     `congestion_surcharge` Nullable(Float64),                                                 │
│     `airport_fee` Nullable(Float64)                                                           │
│ )                                                                                             │
│ ENGINE = Iceberg('http://minio:9000/lakehouse/warehouse/default/taxis/', 'admin', '[HIDDEN]') │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
```
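
If you only need the column list rather than the full DDL, `DESCRIBE TABLE` also works:

```sql
DESCRIBE TABLE `default.taxis`;
```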

## Loading data from your Data Lake into ClickHouse {#loading-data-from-your-data-lake-into-clickhouse}

If you need to load data from the REST catalog into ClickHouse, start by creating a local ClickHouse table:

```sql
CREATE TABLE taxis
(
    `VendorID` Int64,
    `tpep_pickup_datetime` DateTime64(6),
    `tpep_dropoff_datetime` DateTime64(6),
    `passenger_count` Float64,
    `trip_distance` Float64,
    `RatecodeID` Float64,
    `store_and_fwd_flag` String,
    `PULocationID` Int64,
    `DOLocationID` Int64,
    `payment_type` Int64,
    `fare_amount` Float64,
    `extra` Float64,
    `mta_tax` Float64,
    `tip_amount` Float64,
    `tolls_amount` Float64,
    `improvement_surcharge` Float64,
    `total_amount` Float64,
    `congestion_surcharge` Float64,
    `airport_fee` Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(tpep_pickup_datetime)
ORDER BY (VendorID, tpep_pickup_datetime, PULocationID, DOLocationID);
```

Then load the data from your REST catalog table via an `INSERT INTO SELECT`:

```sql
INSERT INTO taxis
SELECT * FROM demo.`default.taxis`;
```
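
After the insert completes, a quick sanity check is to compare row counts between the local copy and the catalog table; the two values should match:

```sql
SELECT
    (SELECT count() FROM taxis)                AS local_rows,
    (SELECT count() FROM demo.`default.taxis`) AS catalog_rows;
```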