-
Notifications
You must be signed in to change notification settings - Fork 340
feat: update pyiceberg/catalog/hive.py to support hive 4.x.x #2206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM this change seems to be backwards compatible. We dont currently have a way to test out hive 4.x.x though, but we can address that separately
pyiceberg/catalog/hive.py
Outdated
@@ -389,8 +389,8 @@ def _create_hive_table(self, open_client: Client, hive_table: HiveTable) -> None | |||
|
|||
def _get_hive_table(self, open_client: Client, database_name: str, table_name: str) -> HiveTable: | |||
try: | |||
return open_client.get_table(dbname=database_name, tbl_name=table_name) | |||
except NoSuchObjectException as e: | |||
return open_client.get_tables(db_name=database_name, pattern=table_name).pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like get_tables
is available in previous versions of thrift clients
i see
def get_tables(self, db_name, pattern):
"""
Parameters:
- db_name
- pattern
"""
self.send_get_tables(db_name, pattern)
return self.recv_get_tables()
in vendor/hive_metastore/ThriftHiveMetastore.py
ah looks like CI failed because we mock need to switch those to |
unfortunately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, looks like get_table_objects_by_name
is backwards compatible too.
Could you try this PR out against a hive 4.x.x deployment?
It seems like we are using hive 4.0.0 in the integration tests. We are using the hive client session_catalog_hive
in tests
https://grep.app/search?f.repo=apache%2Ficeberg-python&q=session_catalog_hive
So im confused why the tests were not failing before
oh interesting. so see the 4.0.1 changelog |
@igorvoltaic lets run the integration tests against 4.0.1! iceberg-python/dev/hive/Dockerfile Line 26 in ad8263b
|
576b70c
to
f97f978
Compare
ah we cannot do that yet. kind of a chicken and egg problem we use pyiceberg 0.9.1 to provision the hive catalog Line 66 in ad8263b
thats ok, i'll verify locally |
Looks like I found this out after getting the hive 4.0.1 run with local pyiceberg, see #2217 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cant use get_table_objects_by_name
its also removed in hive 4.0.1
ok this the expected replacement also feel free to cherry pick over #2217 |
Great catch @kevinjqliu:
To catch this, we probably want to regenerate the vendor package against 4.0.1: https://github.com/apache/iceberg-python/tree/main/vendor#vendor-packages |
Hey @Fokko, there is one already https://raw.githubusercontent.com/apache/hive/refs/heads/master/standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
@kevinjqliu thanks for your help with testing. will try to find a way to keep it backwards compatible, but not so sure yet |
ah this is the gift that keeps on giving... I've made a couple changes in #2217 to
Using hive 4.0.1 is blocked now due to apache/iceberg#12878 because spark hms connector is not compatible |
I think I got this. Made some more changes and we've tested those in our stage v4 server, moving on to production. Do we want to keep backwards compatibility with v3? |
I was not able to get our integration tests working with hive 4.0.1, see #2217. The main issue is with spark's hive catalog using deprecate APIs, apache/iceberg#12878
As long as these changes are backwards compatible with older hive clients/servers, we can make this change. But I'd need your help to ensure these changes work against hive 4.0.1 |
So the idea is to keep both vendor module versions for hive_metastore, since we'd like to keep the lib backwards compatible. Still waiting for some tests to be run thou |
resolves #1222
Rationale for this change
Starting at version 4.0.1, Hive metastore removed deprecated thrift APIs that py-iceberg is currently using. When trying to create a table with catalog.create_table_transaction using Hive metastore 4.0.1, py-iceberg raise an unexpected thrift.Thrift.TApplicationException: Invalid method name: 'get_table' error
Are there any user-facing changes?
not expected