feat(plugin-iceberg): Change iceberg default compression codec to ZSTD #26399
Reviewer's Guide
This PR updates the Iceberg connector’s default Parquet compression codec from GZIP to ZSTD to align with Iceberg 1.8.1’s new default and leverage its improved performance.
Class diagram for updated IcebergConfig compression codec
classDiagram
    class IcebergConfig {
        IcebergFileFormat fileFormat = PARQUET
        HiveCompressionCodec compressionCodec = ZSTD
        CatalogType catalogType = HIVE
        String catalogWarehouse
        String catalogWarehouseDataDir
    }
Please update the doc for the default of
Thanks for this change. The test failures seem to be related.
Yes, I'm checking.
8edce0c to 6166e2e
Thanks.
6166e2e to 03f4c3b
Thanks @PingLiuPing
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
Description
Iceberg used GZIP as the default compression codec for Parquet before version 1.4. See info here
When the Iceberg connector was introduced to Presto, the Iceberg version was 0.9.0, so the connector used GZIP as its default compression codec at that time.
Iceberg has since changed its default compression codec to ZSTD, and the Iceberg version in Presto has been upgraded to 1.8.1. We should change the connector's default compression codec to ZSTD to align with Iceberg.
Moreover, the performance test results show that ZSTD performs much better than GZIP. See results.
Motivation and Context
Impact
Test Plan
show session output:
The actual data file metadata:
$ parquet-tools inspect 127dbe88-372d-4872-a00f-02669277732e.parquet
############ file meta data ############
created_by:
num_columns: 2
num_rows: 2
num_row_groups: 1
format_version: 1.0
serialized_size: 175
############ Columns ############
c1
c2
############ Column(c1) ############
name: c1
path: c1
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: -42%)
############ Column(c2) ############
name: c2
path: c2
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: -33%)
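The check above can also be scripted. The following is a minimal sketch, assuming the text layout that `parquet-tools inspect` prints above; `column_compression` is a hypothetical helper written for illustration and is not part of parquet-tools or Presto. It maps each column to the codec named on its `compression:` line.

```python
import re


def column_compression(inspect_output):
    """Map each column name to the codec reported by `parquet-tools inspect`.

    Assumes the layout shown above: a "#### Column(<name>) ####" header
    followed by "key: value" lines, one of which is "compression: ...".
    """
    codecs = {}
    current = None
    for raw in inspect_output.splitlines():
        line = raw.strip()
        # Per-column header, e.g. "############ Column(c1) ############".
        # The "Columns" summary header has no parentheses and is skipped.
        header = re.fullmatch(r"#+ Column\((.+)\) #+", line)
        if header:
            current = header.group(1)
            continue
        if current is not None and line.startswith("compression:"):
            # "compression: ZSTD (space_saved: -42%)" -> "ZSTD"
            codecs[current] = line.split(":", 1)[1].split()[0]
    return codecs


# Trimmed copy of the output shown above.
sample = """\
############ Columns ############
c1
c2
############ Column(c1) ############
name: c1
compression: ZSTD (space_saved: -42%)
############ Column(c2) ############
name: c2
compression: ZSTD (space_saved: -33%)
"""

codecs = column_compression(sample)
assert all(codec == "ZSTD" for codec in codecs.values())
```

Parsing the tool's text output keeps the sketch dependency-free. Reading the Parquet footer through a Parquet library would be more robust if the output format differs between tool versions.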
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.