feat: adding unauthenticated HDFS storage support to catalogs (#85) #2322

Draft
wants to merge 1 commit into
base: main
from

Conversation

rdsarvar

@rdsarvar rdsarvar commented Aug 11, 2025

TODO - Description

Initially contains per-catalog resources and username configurations that enable Polaris to configure and impersonate against an HDFS cluster. This initial unauthenticated workflow supports #85.

Draft MR opened to showcase potential functionality for supporting HDFS, using internal catalog properties as the store of information. Please feel free to critique or suggest better implementation workflows.

@snazy
Member

snazy commented Aug 12, 2025

We have been (more precisely: are) discussing support for Hive, and implicitly HDFS, in Polaris.
There are a couple of concerns about adding (more) Hadoop stuff to Polaris.

Hadoop types such as org.apache.hadoop.security.UserGroupInformation rely on functionality that has been deprecated since Java 17 (JEP-411) and removed as of Java 24 (JEP-486). Although we technically do not need Java 24 (or the upcoming Java 25) yet, we may eventually have to.

As we want to support these use cases, the approach would be to have a Polaris instance that federates to a separate instance, which can then have hard "max Java version" restrictions but also do other "restrictive" things like Kerberos.

This all falls into the "catalog federation" bucket. We'd appreciate your input on the dev-mailing-list and in-person discussions in our biweekly community sync!

@rdsarvar
Author

We have been (more precisely: are) discussing support for Hive, and implicitly HDFS, in Polaris. There are a couple of concerns about adding (more) Hadoop stuff to Polaris.

Hadoop types such as org.apache.hadoop.security.UserGroupInformation rely on functionality that has been deprecated since Java 17 (JEP-411) and removed as of Java 24 (JEP-486). Although we technically do not need Java 24 (or the upcoming Java 25) yet, we may eventually have to.

As we want to support these use cases, the approach would be to have a Polaris instance that federates to a separate instance, which can then have hard "max Java version" restrictions but also do other "restrictive" things like Kerberos.

This all falls into the "catalog federation" bucket. We'd appreciate your input on the dev-mailing-list and in-person discussions in our biweekly community sync!

Hi @snazy thanks for the response --

Interesting solution. Just to recap and make sure I'm on the same page: the idea is to run a separate sidecar / service that can be pinned to a lower Java version and handle any HDFS-related operations there? In essence, the workflow would then look like:

  1. User makes request to Polaris catalog (ex: new table)
  2. Polaris determines the storage type is HDFS and proxies the request to the HDFS catalog service (REST API I assume)
  3. The HDFS catalog service would perform the required actions on behalf of Polaris, returning the result as a structured JSON response
  4. Polaris would respond to the user and make any required persistence changes

This would confine the potentially unsafe (or lower-Java-version) Hadoop usage to users who explicitly require it.
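
To make the four steps above concrete, here is a minimal sketch of the flow. It is an illustration under stated assumptions, not anything taken from Polaris or from this PR: the class names `PolarisCatalog` and `HdfsCatalogService`, the JSON fields, and the `create_table` action are all hypothetical, and a direct in-process call stands in for the REST hop assumed in step 2.

```python
import json
from dataclasses import dataclass, field


@dataclass
class HdfsCatalogService:
    """Hypothetical stand-in for the separate HDFS service (step 3).

    In the proposed setup this would run as its own process on an older
    Java version; here it is a plain object so the sketch is runnable.
    """
    created: list = field(default_factory=list)

    def handle(self, request_json: str) -> str:
        request = json.loads(request_json)
        # Pretend to perform the HDFS-side work and record what was done.
        self.created.append(request["location"])
        return json.dumps({
            "status": "ok",
            "action": request["action"],
            "location": request["location"],
        })


@dataclass
class PolarisCatalog:
    """Hypothetical front-end catalog that delegates HDFS work."""
    storage_type: str
    hdfs_service: HdfsCatalogService
    tables: dict = field(default_factory=dict)

    def create_table(self, name: str, location: str) -> dict:
        # Step 1: the user request arrives here.
        if self.storage_type != "HDFS":
            raise NotImplementedError("only the HDFS path is sketched")
        # Step 2: proxy the request (a REST call in the proposed design).
        payload = json.dumps(
            {"action": "create_table", "table": name, "location": location}
        )
        # Step 3: the service acts and answers with structured JSON.
        response = json.loads(self.hdfs_service.handle(payload))
        if response["status"] != "ok":
            raise RuntimeError(f"HDFS service failed: {response}")
        # Step 4: persist the change, then respond to the user.
        self.tables[name] = response["location"]
        return {"table": name, "location": response["location"]}
```

The point of the split is that only `HdfsCatalogService` would need Hadoop on its classpath; the front-end catalog sees nothing but JSON.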

I'll join the Slack workspace. Do you have a reference to the channel (or thread) where this is being discussed, or is it mostly in the syncs / mailing list?

Initially contains per catalog resources and username configurations
that enable Polaris to configure and impersonate against an HDFS cluster
@rdsarvar rdsarvar force-pushed the add_hdfs_storage_support branch from 8b02458 to 040a855 Compare August 12, 2025 11:54
@snazy
Member

snazy commented Aug 12, 2025

That's (pretty much) correct.
But let's keep the discussion on the dev-ML.

@rdsarvar
Author

rdsarvar commented Aug 12, 2025

Sounds good. For anyone coming into this PR, the discussion can be found here: https://lists.apache.org/thread/5qktjv6rzd8pghcl6f4oohko798o2p2g
