feat(catalog): implement catalog loader for s3tables #1598

Open
wants to merge 5 commits into base: main

fvaleye
Contributor

@fvaleye fvaleye commented Aug 12, 2025

Which issue does this PR close?

What changes are included in this PR?

Catalog Registry Refactor

  • Introduced a static CATALOG_REGISTRY mapping catalog type strings (e.g., "rest", "s3tables") to their corresponding builder factory closures.
  • Eliminated hard-coded catalog type handling in load() and centralized catalog registration logic.
  • New catalog implementations can now be added without touching the core loader logic.
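
The registry idea described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the code from this PR: the `CatalogBuilder` trait, the `RestBuilder` and `S3TablesBuilder` types, the `registry()` accessor, and the error wording are all hypothetical stand-ins. Only the "rest" and "s3tables" type strings and the `supported_types()` name come from the description and commit notes.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the real catalog builder abstraction.
trait CatalogBuilder {
    fn kind(&self) -> &'static str;
}

struct RestBuilder;
struct S3TablesBuilder;

impl CatalogBuilder for RestBuilder {
    fn kind(&self) -> &'static str {
        "rest"
    }
}

impl CatalogBuilder for S3TablesBuilder {
    fn kind(&self) -> &'static str {
        "s3tables"
    }
}

// A factory produces a fresh boxed builder for one catalog type.
type BuilderFactory = fn() -> Box<dyn CatalogBuilder>;

// Lazily initialized static registry: catalog type string -> factory.
fn registry() -> &'static HashMap<&'static str, BuilderFactory> {
    static REGISTRY: OnceLock<HashMap<&'static str, BuilderFactory>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        let mut m: HashMap<&'static str, BuilderFactory> = HashMap::new();
        m.insert("rest", || Box::new(RestBuilder) as Box<dyn CatalogBuilder>);
        m.insert("s3tables", || Box::new(S3TablesBuilder) as Box<dyn CatalogBuilder>);
        m
    })
}

// Sorted list of registered catalog types, used in error messages.
fn supported_types() -> Vec<&'static str> {
    let mut types: Vec<&'static str> = registry().keys().copied().collect();
    types.sort_unstable();
    types
}

// Look up a builder by type string; unknown types list the supported ones.
fn load(kind: &str) -> Result<Box<dyn CatalogBuilder>, String> {
    match registry().get(kind) {
        Some(factory) => Ok(factory()),
        None => Err(format!(
            "Unsupported catalog type: {kind}. Supported types: {}",
            supported_types().join(", ")
        )),
    }
}
```

With this shape, adding a catalog means adding one `insert` line to the registry, and `load("glue")` fails with a message that names the registered types.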

New S3 Tables Catalog Support

  • Implemented an S3 Tables Catalog Builder.
  • Added the "s3tables" catalog type to the registry in crates/catalog/loader.
  • Updated workspace configuration to include the new iceberg-catalog-s3tables crate.
  • Added comprehensive example usage in the crate documentation.

Hive metastore

  • hive_metastore was upgraded from 0.1.0 to 0.2.0 to fix build failures that appeared after rebasing the S3Tables feature branch. The HMS catalog and the S3Tables catalog share common Rust dependencies (pilota, volo-thrift, etc.).
  • These shared dependencies created version conflicts that manifested as "derivative attribute compilation errors."
    Upgrading to hive_metastore 0.2.0 resolved these transitive dependency conflicts.

Are these changes tested?

Yes.

  • Added new unit tests for the S3 Tables catalog.

  • Expanded loader tests to:

    • Verify loading of supported catalogs via the registry.
    • Validate improved error messages listing supported catalog types.
    • Test the new ergonomic CatalogLoader API for both "rest" and "s3tables".

@fvaleye fvaleye force-pushed the feature/implement-catalog-for-s3table branch 4 times, most recently from 8df8266 to a39a467 Compare August 12, 2025 09:25
@fvaleye fvaleye changed the title Feature/implement catalog for s3table feat(catalog): implement catalog loader for s3tables Aug 12, 2025
- Add supported_types() to list registry entries
- Include supported types in unsupported-type error
@fvaleye fvaleye force-pushed the feature/implement-catalog-for-s3table branch 7 times, most recently from 19b9534 to 648a0be Compare August 12, 2025 10:34
- Update hive_metastore dependency from 0.1 to 0.2.0
- Fixes derivative attribute compilation errors after rebase
- Ensures compatibility with updated dependency tree
@fvaleye fvaleye force-pushed the feature/implement-catalog-for-s3table branch from 648a0be to 4123e4d Compare August 12, 2025 10:38
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @fvaleye for this PR, it generally looks good! I left some comments.

pub const S3TABLES_CATALOG_PROP_TABLE_BUCKET_ARN: &str = "table_bucket_arn";
/// S3Tables endpoint URL property
pub const S3TABLES_CATALOG_PROP_ENDPOINT_URL: &str = "endpoint_url";

/// S3Tables catalog configuration.
#[derive(Debug, TypedBuilder)]
pub struct S3TablesCatalogConfig {
Contributor

Make this crate private, and remove the builder.

Contributor Author

I will make the change.

.collect();

async move {
if self.0.table_bucket_arn.is_empty() {
Contributor

We should also check that the name is not an empty string.

Contributor Author

That's right, I'll add a dedicated test condition.
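
For illustration, the requested check could look like the sketch below. It assumes a free-standing helper named `validate_catalog_name` and a plain `String` error, both hypothetical; the PR itself would report the failure through the crate's own error type. Whitespace-only names are rejected as well, matching the follow-up commit note.

```rust
/// Hypothetical helper: rejects empty or whitespace-only catalog names.
/// Returns the original name unchanged when it is acceptable.
fn validate_catalog_name(name: &str) -> Result<&str, String> {
    if name.trim().is_empty() {
        return Err("Catalog name cannot be empty".to_string());
    }
    Ok(name)
}
```

A builder would call this before constructing the catalog, next to the existing `table_bucket_arn.is_empty()` check.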

/// Builder methods for [`S3TablesCatalog`].
impl S3TablesCatalogBuilder {
/// Configure the catalog with a custom endpoint URL (useful for local testing/mocking).
pub fn with_endpoint_url(mut self, endpoint_url: impl Into<String>) -> Self {
Contributor

This allows users to configure the URL either through a property or through this method. I'm not totally against this design, but please add docs explaining the behavior when both are present, and add tests for it.

Contributor Author

Ok, I'll add more documentation and proper unit tests.

}

/// Configure the catalog with a table bucket ARN.
pub fn with_table_bucket_arn(mut self, table_bucket_arn: impl Into<String>) -> Self {
Contributor

Similar to above.
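
The ambiguity raised for both builder methods can be made concrete with a small sketch. The thread does not say which source wins when a value is set both ways, so the rule here is an assumption: a property passed at load time overrides the value set through the builder method. `BuilderSketch` and `resolve` are hypothetical names; only the `endpoint_url` and `table_bucket_arn` property keys and the two `with_*` method names come from the PR.

```rust
use std::collections::HashMap;

// Property keys as they appear in the PR.
const PROP_ENDPOINT_URL: &str = "endpoint_url";
const PROP_TABLE_BUCKET_ARN: &str = "table_bucket_arn";

// Hypothetical, simplified builder holding only the two contested fields.
#[derive(Debug, Default)]
struct BuilderSketch {
    endpoint_url: Option<String>,
    table_bucket_arn: Option<String>,
}

impl BuilderSketch {
    fn with_endpoint_url(mut self, endpoint_url: impl Into<String>) -> Self {
        self.endpoint_url = Some(endpoint_url.into());
        self
    }

    fn with_table_bucket_arn(mut self, table_bucket_arn: impl Into<String>) -> Self {
        self.table_bucket_arn = Some(table_bucket_arn.into());
        self
    }

    /// ASSUMED precedence: a property, when present, overrides the value
    /// set through the builder method; otherwise the builder value is used.
    fn resolve(self, props: &HashMap<String, String>) -> (Option<String>, Option<String>) {
        let endpoint = props.get(PROP_ENDPOINT_URL).cloned().or(self.endpoint_url);
        let arn = props
            .get(PROP_TABLE_BUCKET_ARN)
            .cloned()
            .or(self.table_bucket_arn);
        (endpoint, arn)
    }
}
```

Whichever rule is chosen, a test that sets the same field through both paths and asserts the winner pins the behavior down, which is what the review asks for.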

@@ -73,10 +74,11 @@ faststr = "0.2.31"
fnv = "1.0.7"
fs-err = "3.1.0"
futures = "0.3"
-hive_metastore = "0.1"
+hive_metastore = "0.2.0"
Contributor

Why do we need to update these?

Contributor Author

hive_metastore was upgraded from 0.1.0 to 0.2.0 to address an issue that appeared after adding S3 Tables to the workspace: the HMS catalog and the S3Tables catalog share common Rust dependencies (pilota, volo-thrift, etc.).
These shared dependencies produced numerous version conflicts that manifested as "derivative attribute compilation errors".

Just one example among many others:

error: cannot find attribute `derivative` in this scope
      --> /Users/florian.valeye/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hive_metastore-0.1.0/src/hms.rs:201644:15
       |
201644 |             #[derivative(Default)]

Upgrading to hive_metastore 0.2.0 resolved these transitive dependency conflicts.

…tion

- Add comprehensive documentation for with_endpoint_url and with_table_bucket_arn
  methods explaining property precedence behavior
- Add validation to prevent empty or whitespace-only catalog names
- Remove unused typed-builder dependency and refactor to manual struct construction
@fvaleye fvaleye requested a review from liurenjie1024 August 15, 2025 12:51
Development

Successfully merging this pull request may close these issues.

Implement catalog loader for s3table.
2 participants