From 8f82e5f33082c3b4e86b07e0c3f5a2ec2ff8788e Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 16:06:48 +0100
Subject: [PATCH 01/12] docs: update version

---
 website/src/install.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/website/src/install.md b/website/src/install.md
index bcda751a3f..3528bd74fb 100644
--- a/website/src/install.md
+++ b/website/src/install.md
@@ -26,8 +26,14 @@ Cargo 1.75.0 or later is required to build.
 Add `iceberg` and `iceberg-catalog-rest` into `Cargo.toml` dependencies:
 
 ```toml
-iceberg = "0.2.0"
-iceberg-catalog-rest = "0.2.0"
+iceberg = "0.6.0"
+iceberg-catalog-rest = "0.6.0"
+```
+
+using `cargo add`:
+
+```bash
+$ cargo add iceberg iceberg-catalog-rest
 ```
 
 iceberg is under active development, you may want to use the git version instead:

From 3d1ef60191ef2c52cc17a3b3ba01c7575ef82404 Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 16:07:10 +0100
Subject: [PATCH 02/12] docs: describe different catalogues supported

---
 website/src/api.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/website/src/api.md b/website/src/api.md
index eaf81ee661..6000ec5515 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -24,7 +24,21 @@
 * Create and list namespaces.
 * Create, load, and drop tables
 
-Currently only rest catalog has been implemented, and other catalogs are under active development. Here is an
+There is support for the following catalogs:
+
+* `RestCatalog` - the Iceberg REST catalog
+
+* `Glue` - the AWS Glue Data Catalog
+
+* `HMS` - Apache Iceberg HiveMetaStore catalog
+
+* `S3Tables` - AWS S3 Tables
+
+* `SQL` - SQL-based catalog
+
+## `RestCatalog`
+
+Here is an
 example of how to create a `RestCatalog`:
 
 ```rust,no_run,noplayground

From 212828c1917dd409251819a08544def751af2bfd Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 16:15:16 +0100
Subject: [PATCH 03/12] docs: fix case

---
 website/src/api.md          | 3 +--
 website/src/introduction.md | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/website/src/api.md b/website/src/api.md
index 6000ec5515..18513aca82 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -38,8 +38,7 @@ There is support for the following catalogs:
 
 ## `RestCatalog`
 
-Here is an
-example of how to create a `RestCatalog`:
+Here is an example of how to create a `RestCatalog`:
 
 ```rust,no_run,noplayground
 {{#rustdoc_include ../../crates/examples/src/rest_catalog_namespace.rs:create_catalog}}
diff --git a/website/src/introduction.md b/website/src/introduction.md
index 260ec690ed..f4a33988fe 100644
--- a/website/src/introduction.md
+++ b/website/src/introduction.md
@@ -19,4 +19,4 @@
 
 # Iceberg Rust
 
-Iceberg Rust is a rust implementation for accessing iceberg tables.
+Iceberg Rust is a Rust implementation for accessing Apache Iceberg tables.

From 6b1a7e89594a46348db39448848fea4bb2ec82b5 Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 16:58:58 +0100
Subject: [PATCH 04/12] docs: add longer intro

---
 website/src/introduction.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/website/src/introduction.md b/website/src/introduction.md
index f4a33988fe..9edf5d2ee6 100644
--- a/website/src/introduction.md
+++ b/website/src/introduction.md
@@ -19,4 +19,19 @@
 
 # Iceberg Rust
 
-Iceberg Rust is a Rust implementation for accessing Apache Iceberg tables.
+`iceberg-rust` is a Rust implementation for managing Apache Iceberg tables.
+
+## What is Apache Iceberg?
+
+[Apache Iceberg](https://iceberg.apache.org/docs/nightly/) is a modern, high-performance open table format
+for huge analytic datasets that brings SQL-like tables to processing engines including Spark, Trino, PrestoDB, Flink, Hive and Impala.
+
+Rather than being a new file type, Iceberg provides a metadata layer that sits on top of formats like Parquet
+and ORC, ensuring data is organized, accessible, and safe to work with at scale. It introduces features long
+expected in databases such as transactional consistency, schema evolution, and time travel into environments
+where files are stored directly on systems like Amazon S3 or HDFS.
+
+Originally developed at Netflix, it was designed as a response to the limitations of early Hive tables, which were
+essentially directories of files with only loose conventions for schema and partitioning. While this approach
+enabled cheap storage of large datasets, it struggled with schema changes, concurrent writes, and efficient query
+planning.

From 5c5f7ea7acf69e8770df0103d5253957bd4d8ed9 Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 17:02:13 +0100
Subject: [PATCH 05/12] docs: describe more catalogues in table

---
 website/src/api.md | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/website/src/api.md b/website/src/api.md
index 18513aca82..cc8f97c133 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -26,15 +26,13 @@
 
 There is support for the following catalogs:
 
-* `RestCatalog` - the Iceberg REST catalog
-
-* `Glue` - the AWS Glue Data Catalog
-
-* `HMS` - Apache Iceberg HiveMetaStore catalog
-
-* `S3Tables` - AWS S3 Tables
-
-* `SQL` - SQL-based catalog
+| Catalog | Decription | Implementation Status |
+|---------|------------|-----------------------|
+| `RestCatalog` | the Iceberg REST catalog | ✅ |
+| `Glue` | the AWS Glue Data Catalog | ✅ |
+| `HMS` | Apache Iceberg HiveMetaStore catalog | ✅ |
+| `S3Tables` | AWS S3 Tables | 🚧 |
+| `SQL` | SQL-based catalog | 🚧 |
 
 ## `RestCatalog`
 

From 26442036bc6aa594394610f8503451502dce0cd8 Mon Sep 17 00:00:00 2001
From: Alex
Date: Mon, 18 Aug 2025 17:06:59 +0100
Subject: [PATCH 06/12] fix(typo): description

---
 website/src/api.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/src/api.md b/website/src/api.md
index cc8f97c133..a726d97eb6 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -26,8 +26,8 @@
 
 There is support for the following catalogs:
 
-| Catalog | Decription | Implementation Status |
-|---------|------------|-----------------------|
+| Catalog | Description | Implementation Status |
+|---------|------------|------------------------|
 | `RestCatalog` | the Iceberg REST catalog | ✅ |
 | `Glue` | the AWS Glue Data Catalog | ✅ |
 | `HMS` | Apache Iceberg HiveMetaStore catalog | ✅ |

From 717ccf68384ac492b537edd5e2d913685f366035 Mon Sep 17 00:00:00 2001
From: Alex
Date: Tue, 19 Aug 2025 09:41:38 +0100
Subject: [PATCH 07/12] docs: remove implementation status

---
 website/src/api.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/website/src/api.md b/website/src/api.md
index a726d97eb6..00cfb5ba64 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -26,13 +26,16 @@
 
 There is support for the following catalogs:
 
-| Catalog | Description | Implementation Status |
-|---------|------------|------------------------|
-| `RestCatalog` | the Iceberg REST catalog | ✅ |
-| `Glue` | the AWS Glue Data Catalog | ✅ |
-| `HMS` | Apache Iceberg HiveMetaStore catalog | ✅ |
-| `S3Tables` | AWS S3 Tables | 🚧 |
-| `SQL` | SQL-based catalog | 🚧 |
+| Catalog | Description |
+|---------|------------|
+| `RestCatalog` | the Iceberg REST catalog |
+| `Glue` | the AWS Glue Data Catalog |
+| `MemoryCatalog` | a memory-based Catalog |
+| `HMS` | Apache Iceberg HiveMetaStore catalog |
+| `S3Tables` | AWS S3 Tables |
+| `SQL` | SQL-based catalog |
+
+Not all catalog implementations are complete.
 
 ## `RestCatalog`
 

From 900e9572e0da1f69fd47439a5086250f8eb3c70c Mon Sep 17 00:00:00 2001
From: Alex
Date: Tue, 19 Aug 2025 09:42:20 +0100
Subject: [PATCH 08/12] docs: fix typo

---
 crates/iceberg/src/catalog/mod.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crates/iceberg/src/catalog/mod.rs b/crates/iceberg/src/catalog/mod.rs
index a468edc475..26d65d1c84 100644
--- a/crates/iceberg/src/catalog/mod.rs
+++ b/crates/iceberg/src/catalog/mod.rs
@@ -125,7 +125,7 @@ pub trait CatalogBuilder: Default + Debug + Send + Sync {
 /// NamespaceIdent represents the identifier of a namespace in the catalog.
 ///
 /// The namespace identifier is a list of strings, where each string is a
-/// component of the namespace. It's catalog implementer's responsibility to
+/// component of the namespace. It's the catalog implementer's responsibility to
 /// handle the namespace identifier correctly.
 #[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
 pub struct NamespaceIdent(Vec<String>);

From 342c78f63312e5682f5aaa9c9e7737f6e4d3e861 Mon Sep 17 00:00:00 2001
From: Alex <1221721+atcol@users.noreply.github.com>
Date: Tue, 19 Aug 2025 15:27:37 +0100
Subject: [PATCH 09/12] docs: reword

Co-authored-by: Renjie Liu
---
 website/src/api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/src/api.md b/website/src/api.md
index 00cfb5ba64..9ef3a33b35 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -28,7 +28,7 @@ There is support for the following catalogs:
 
 | Catalog | Description |
 |---------|------------|
-| `RestCatalog` | the Iceberg REST catalog |
+| `Rest` | the Iceberg REST catalog |
 | `Glue` | the AWS Glue Data Catalog |
 | `MemoryCatalog` | a memory-based Catalog |
 | `HMS` | Apache Iceberg HiveMetaStore catalog |

From 42416a4bea5eb67914ef2a2d2d05ef15d16f448c Mon Sep 17 00:00:00 2001
From: Alex <1221721+atcol@users.noreply.github.com>
Date: Tue, 19 Aug 2025 15:27:45 +0100
Subject: [PATCH 10/12] docs: reword

Co-authored-by: Renjie Liu
---
 website/src/api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/src/api.md b/website/src/api.md
index 9ef3a33b35..27a68cbdcf 100644
--- a/website/src/api.md
+++ b/website/src/api.md
@@ -30,7 +30,7 @@ There is support for the following catalogs:
 |---------|------------|
 | `Rest` | the Iceberg REST catalog |
 | `Glue` | the AWS Glue Data Catalog |
-| `MemoryCatalog` | a memory-based Catalog |
+| `Memory` | a memory-based Catalog |
 | `HMS` | Apache Iceberg HiveMetaStore catalog |
 | `S3Tables` | AWS S3 Tables |
 | `SQL` | SQL-based catalog |

From 53ed027d1c4a7f3d05159d803de8e6fd5019bfe2 Mon Sep 17 00:00:00 2001
From: Alex <1221721+atcol@users.noreply.github.com>
Date: Tue, 19 Aug 2025 15:28:13 +0100
Subject: [PATCH 11/12] docs: non-nightly link for Iceberg

Co-authored-by: Renjie Liu
---
 website/src/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/src/introduction.md b/website/src/introduction.md
index 9edf5d2ee6..aad3f16747 100644
--- a/website/src/introduction.md
+++ b/website/src/introduction.md
@@ -23,7 +23,7 @@
 
 ## What is Apache Iceberg?
 
-[Apache Iceberg](https://iceberg.apache.org/docs/nightly/) is a modern, high-performance open table format
+[Apache Iceberg](https://iceberg.apache.org) is a modern, high-performance open table format
 for huge analytic datasets that brings SQL-like tables to processing engines including Spark, Trino, PrestoDB, Flink, Hive and Impala.
 
 Rather than being a new file type, Iceberg provides a metadata layer that sits on top of formats like Parquet

From dd06d7ffecb52800adc80b0072fa9faaa5382df8 Mon Sep 17 00:00:00 2001
From: Alex <1221721+atcol@users.noreply.github.com>
Date: Tue, 19 Aug 2025 19:23:40 +0100
Subject: [PATCH 12/12] docs: remove section

---
 website/src/introduction.md | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/website/src/introduction.md b/website/src/introduction.md
index aad3f16747..769e50857c 100644
--- a/website/src/introduction.md
+++ b/website/src/introduction.md
@@ -26,12 +26,7 @@
 [Apache Iceberg](https://iceberg.apache.org) is a modern, high-performance open table format
 for huge analytic datasets that brings SQL-like tables to processing engines including Spark, Trino, PrestoDB, Flink, Hive and Impala.
 
-Rather than being a new file type, Iceberg provides a metadata layer that sits on top of formats like Parquet
+Iceberg provides a metadata layer that sits on top of formats like Parquet
 and ORC, ensuring data is organized, accessible, and safe to work with at scale. It introduces features long
 expected in databases such as transactional consistency, schema evolution, and time travel into environments
-where files are stored directly on systems like Amazon S3 or HDFS.
-
-Originally developed at Netflix, it was designed as a response to the limitations of early Hive tables, which were
-essentially directories of files with only loose conventions for schema and partitioning. While this approach
-enabled cheap storage of large datasets, it struggled with schema changes, concurrent writes, and efficient query
-planning.
+where files are stored directly on systems like Amazon S3.