These are notes from an email with @elphastori
I am creating this issue so we do not lose this knowledge before we are able to properly update the notes.
Do you know if Parquet is the default for Iceberg?
In short, yes. If you create an Iceberg table without specifying a format, you'll almost always get Parquet, but that is not a hard-coded rule of Iceberg itself; the format is chosen by the engine. Spark + Iceberg defaults to Parquet, as do Flink + Iceberg and Trino + Iceberg.
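As a quick sanity check (not from the original email), a minimal PySpark sketch like the one below could be used to create a table without naming a format and then inspect its properties. The catalog name `demo`, the warehouse path, and the table name are made up for illustration, and the Iceberg Spark runtime jar needs to be on the classpath:

```python
from pyspark.sql import SparkSession

# Minimal local Spark session with a Hadoop-type Iceberg catalog.
# "demo" and the warehouse path are illustrative placeholders.
spark = (
    SparkSession.builder
    .appName("iceberg-default-format-check")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Create an Iceberg table without specifying any file format.
spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.db")
spark.sql("CREATE TABLE demo.db.events (id BIGINT, payload STRING) USING iceberg")

# Nothing overrides the documented default, so data written to this table
# should come out as Parquet files; the table properties can be inspected here.
spark.sql("SHOW TBLPROPERTIES demo.db.events").show(truncate=False)
```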
I have a few references, but please take my explanation with a grain of salt. I might be misinterpreting them:
- When you create an Iceberg table, "write.format.default" is "parquet" by default: https://iceberg.apache.org/docs/nightly/configuration/#write-properties
- When you write to the table using Spark, "write-format" defaults to the table's "write.format.default" above: https://iceberg.apache.org/docs/nightly/configuration/#write-properties
- You have a separate config on the table and on the writer, so you can support use cases like writing in a format such as Avro to improve write performance while compacting to Parquet to improve read performance (see the sketch after this list): https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/best-practices-write.html#write-file-format
- These write formats apply to the data files, i.e. the actual rows and values, so those end up as Parquet. However, I understand that Iceberg still uses Avro for the manifest files, which are essentially metadata about these Parquet files: https://iceberg.apache.org/spec/#manifests
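To make the table-level vs. writer-level distinction concrete, here is a hedged sketch (reusing the assumed `demo` catalog and table from the snippet above) that sets the table default to Avro and then overrides a single write back to Parquet via the `write-format` option:

```python
# Table-level default: subsequent writes use Avro unless told otherwise.
spark.sql(
    "ALTER TABLE demo.db.events SET TBLPROPERTIES ('write.format.default' = 'avro')"
)

# Writer-level override: this particular append produces Parquet data files
# regardless of the table-level default.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "payload"])
df.writeTo("demo.db.events").option("write-format", "parquet").append()
```

In the Avro-for-writes, Parquet-for-reads pattern from the AWS link, it is compaction (for example Iceberg's rewrite_data_files procedure) that rewrites the Avro data files as Parquet; the manifest files stay Avro either way, per the spec link above.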