feat(plugin-iceberg): Change iceberg default compression codec to ZSTD #26399
Reviewer's Guide
This PR updates the Iceberg connector’s default Parquet compression codec from GZIP to ZSTD to align with Iceberg 1.8.1’s new default and leverage its improved performance.
Class diagram for updated IcebergConfig compression codec
classDiagram
class IcebergConfig {
IcebergFileFormat fileFormat = PARQUET
HiveCompressionCodec compressionCodec = ZSTD
CatalogType catalogType = HIVE
String catalogWarehouse
String catalogWarehouseDataDir
}
Please update the doc for the default of |
Thanks for this change. The failing tests seem to be related.
Yes, I'm checking. |
8edce0c to 6166e2e
Thanks. |
6166e2e to 03f4c3b
Thanks @PingLiuPing
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
Description
Iceberg used GZIP as the default compression codec for Parquet before version 1.4. See info here
When the Iceberg connector was introduced to Presto, the Iceberg version was 0.9.0, so the connector used GZIP as its default compression codec at that time.
Iceberg has since changed its default compression codec to ZSTD, and the Iceberg version in Presto has been upgraded to 1.8.1. We should change the connector's default compression codec to ZSTD to align with Iceberg.
Moreover, in my performance tests ZSTD performed much better than GZIP. See results.
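The change described above amounts to flipping a single default value. A minimal sketch follows, using stand-in types rather than the real Presto classes: the class name `IcebergConfigSketch`, the nested `CompressionCodec` enum, and the accessor names are illustrative assumptions, and only the `compressionCodec` field and its ZSTD default are taken from this PR.

```java
// Illustrative stand-in only; this is not the actual Presto source.
public class IcebergConfigSketch {
    // Hypothetical subset of codecs, defined here so the sketch is self-contained.
    enum CompressionCodec { NONE, SNAPPY, GZIP, ZSTD }

    // Before this PR the default was GZIP; after it, the default is ZSTD.
    private CompressionCodec compressionCodec = CompressionCodec.ZSTD;

    public CompressionCodec getCompressionCodec() {
        return compressionCodec;
    }

    // Fluent setter: an explicitly configured codec still overrides the default.
    public IcebergConfigSketch setCompressionCodec(CompressionCodec codec) {
        this.compressionCodec = codec;
        return this;
    }

    public static void main(String[] args) {
        // No explicit setting: the new default applies.
        System.out.println(new IcebergConfigSketch().getCompressionCodec());
        // Explicit setting: the previous codec can still be selected.
        System.out.println(new IcebergConfigSketch()
                .setCompressionCodec(CompressionCodec.GZIP)
                .getCompressionCodec());
    }
}
```

In this sketch, only callers that never set a codec see different behavior; a caller that sets GZIP explicitly keeps GZIP.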
Motivation and Context
Impact
Test Plan
show session output:
The actual data file metadata:
$ parquet-tools inspect 127dbe88-372d-4872-a00f-02669277732e.parquet
############ file meta data ############
created_by:
num_columns: 2
num_rows: 2
num_row_groups: 1
format_version: 1.0
serialized_size: 175
############ Columns ############
c1
c2
############ Column(c1) ############
name: c1
path: c1
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: -42%)
############ Column(c2) ############
name: c2
path: c2
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: -33%)
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.