Skip to content

Support HMS Federation #2355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 18, 2025
Merged

Conversation

poojanilangekar
Copy link
Contributor

@poojanilangekar poojanilangekar commented Aug 14, 2025

Supports federating to HiveCatalog using the Iceberg REST library.
All hive dependencies are added in an independent module, i.e., polaris-extensions-federation-hive and can be removed/converted to a compile time flag if necessary.
Similar to HadoopCatalog, HMS federation support is currently restricted to IMPLICIT auth. The underlying authentication can be any form that Hive supports, however Polaris will not store and manage any of these credentials. Again, similar to HadoopCatalog, this version supports federating to a single Hive instance.

This PR relies on Polaris discovering the hive-site.xml file to get the configuration options from the classpath (including HADOOP_CONF_DIR).

The spec change has been discussed in the dev mailing list, followed by a discussion in the Polaris community sync on Aug 7, 2025.

Testing:
Modified the regression test to locally test that Hive federation works as expected. The next step would be to add a regression test once the change is baked into the Polaris docker image (for CI builds).

This PR primarily builds on #1305 and #1466. Thank you @dennishuo and @eric-maynard for helping out with this!

@poojanilangekar
Copy link
Contributor Author

CC: @dennishuo @eric-maynard

Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a regression test like the one in #2286?

@poojanilangekar
Copy link
Contributor Author

Yes, I have a test that I used locally. Unfortunately, I just can't add it to the test until we get this PR baked into the Docker image. :(

@poojanilangekar
Copy link
Contributor Author

Here is a jupyter notebook I've used in the past to demo the feature. (in lieu of the regtests that I will add after this PR is merged).

eric-maynard
eric-maynard previously approved these changes Aug 15, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 15, 2025
dennishuo
dennishuo previously approved these changes Aug 15, 2025
Copy link
Contributor

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though is this the case where we wanted compile-time flags to allow the Polaris user to choose to exclude the whole org.apache.hive:hive-metastore dependency, even if we chose not to use a compile-time flag for HadoopCatalog due to that being a breaking change?

String warehouse = ((HiveConnectionConfigInfoDpo) connectionConfigInfoDpo).getWarehouse();
// Unlike Hadoop, HiveCatalog does not require us to create a Configuration object, the iceberg
// rest library find the default configuration by reading hive-site.xml in the classpath
// (including HADOOP_CONF_DIR classpath).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth adding a TODO here to better qualify the assertion about not requiring Configuration and requiring hive-site.xml (and so we don't forget), that if we do want Hadoop Configuration to come from runtime Polaris properties instead, that we need to call hiveCatalog.setConf before the call to hiveCatalog.initialize and our constructed Configuration should begin with new Configuration(false) to avoid loading the default hive-site.xml?

AFAICT doing those two actions should be sufficient in potential multi-catalog environments to at least prevent basic conf-leakage between HiveCatalog instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@poojanilangekar poojanilangekar dismissed stale reviews from dennishuo and eric-maynard via 99d0867 August 15, 2025 23:50
@eric-maynard eric-maynard merged commit 7f50667 into apache:main Aug 18, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Aug 18, 2025
eric-maynard added a commit that referenced this pull request Aug 18, 2025
In #2369 Iceberg table federation was refactored around the new `IcebergRESTExternalCatalogFactory` type based on discussion in the community sync. This has unblocked the ability to federate to more non-Iceberg catalogs, such as in #2355. This PR refactors generic table federation to go through the same mechanism. After this, we can go through and implement generic table federation for the existing `IcebergRESTExternalCatalogFactory` implementations.
bmlyr pushed a commit to bmlyr/polaris that referenced this pull request Aug 19, 2025
In apache#2369 Iceberg table federation was refactored around the new `IcebergRESTExternalCatalogFactory` type based on discussion in the community sync. This has unblocked the ability to federate to more non-Iceberg catalogs, such as in apache#2355. This PR refactors generic table federation to go through the same mechanism. After this, we can go through and implement generic table federation for the existing `IcebergRESTExternalCatalogFactory` implementations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants