
Remove partial_file_info from ducklake_data_file, replace it with partial_max #708

Merged
pdet merged 43 commits into duckdb:main from pdet:partial_info_2
Jan 21, 2026


@pdet (Collaborator) commented Jan 20, 2026

This PR removes the partial_file_info column from ducklake_data_file.

This column used to store the last snapshot present in a compacted file, together with the rows that belonged to each snapshot. We already store row-to-snapshot ownership in the _ducklake_internal_snapshot_id column inside the file itself, so we can safely remove this information from the catalog. This is especially worthwhile because partial_file_info grew with the number of snapshots compacted into a file.

We still need to store the last snapshot. For this, we add a new BIGINT column called partial_max to ducklake_data_file.
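To illustrate why a single integer can stand in for the old per-snapshot bookkeeping, here is a minimal sketch. It assumes (this is not taken from the DuckLake source) that every row of a compacted file carries its snapshot in `_ducklake_internal_snapshot_id`, and that a reader at snapshot `s` should only see rows whose id is `<= s`. Under that assumption, `partial_max` tells the reader whether per-row filtering is needed at all. The function `visible_rows` and the row layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical row layout: (internal_snapshot_id, value), standing in for a
# compacted file that merged data written by several snapshots.
Row = Tuple[int, str]


def visible_rows(rows: List[Row], partial_max: int, read_snapshot: int) -> List[Row]:
    """Return the rows a reader at `read_snapshot` should see (sketch only)."""
    if read_snapshot >= partial_max:
        # Every row was written at or before partial_max, so nothing to filter.
        return list(rows)
    # Otherwise, fall back to the per-row snapshot ids stored in the file.
    return [row for row in rows if row[0] <= read_snapshot]


compacted = [(3, "a"), (3, "b"), (5, "c"), (7, "d")]
# The catalog keeps only this one value, however many snapshots were merged.
partial_max = max(snapshot for snapshot, _ in compacted)

print(visible_rows(compacted, partial_max, 5))  # [(3, 'a'), (3, 'b'), (5, 'c')]
print(visible_rows(compacted, partial_max, 9))  # all four rows
```

The catalog entry stays the same size no matter how many snapshots a file absorbs; the detailed ownership information lives in the file.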

qsliu2017 and others added 27 commits December 23, 2025 10:07
When multiple tables have inlined data at the same schema_version,
GetCatalogIdForSchema returned the first match's begin_snapshot,
which could belong to a different table. This caused "Cannot open file"
errors when reading inlined data from a fresh connection.

Added a table_id parameter to GetCatalogIdForSchema and GetSnapshotForSchema
to ensure each table gets its correct begin_snapshot.
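The lookup bug and its fix can be modeled with plain Python objects. The `InlinedTable` record and both lookup functions are hypothetical stand-ins for GetCatalogIdForSchema; they only mirror the logic described in the commit message, not the actual C++ code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InlinedTable:
    # Hypothetical catalog entry for a table that has inlined data.
    table_id: int
    schema_version: int
    begin_snapshot: int


ENTRIES = [
    InlinedTable(table_id=1, schema_version=4, begin_snapshot=10),
    InlinedTable(table_id=2, schema_version=4, begin_snapshot=12),
]


def lookup_by_schema(entries: List[InlinedTable], schema_version: int) -> Optional[int]:
    # Buggy shape: the first match wins, even if it belongs to another table.
    for entry in entries:
        if entry.schema_version == schema_version:
            return entry.begin_snapshot
    return None


def lookup_by_table(
    entries: List[InlinedTable], table_id: int, schema_version: int
) -> Optional[int]:
    # Fixed shape: table_id restricts the match to the requested table.
    for entry in entries:
        if entry.table_id == table_id and entry.schema_version == schema_version:
            return entry.begin_snapshot
    return None


print(lookup_by_schema(ENTRIES, 4))    # 10, regardless of which table was meant
print(lookup_by_table(ENTRIES, 2, 4))  # 12
```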
Move all queries on metadata to DuckLakeMetadataManager
Instead of joining ducklake_inlined_data_tables with ducklake_table and
filtering by schema_version, directly query ducklake_table by table_id
since table_ids are unique.
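A minimal sqlite3 sketch of the query simplification follows. The two tables are a heavily reduced, hypothetical version of the DuckLake catalog (only the columns needed here), and the statements show the shape of the change rather than the actual SQL issued by DuckLakeMetadataManager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced, hypothetical catalog tables (not the full DuckLake schema).
    CREATE TABLE ducklake_table (table_id INTEGER, begin_snapshot INTEGER);
    CREATE TABLE ducklake_inlined_data_tables (table_id INTEGER, schema_version INTEGER);
    INSERT INTO ducklake_table VALUES (1, 10), (2, 12);
    INSERT INTO ducklake_inlined_data_tables VALUES (1, 4), (2, 4);
    """
)

# Before: join and filter on schema_version, which matches both tables.
joined = conn.execute(
    """
    SELECT t.begin_snapshot
    FROM ducklake_inlined_data_tables AS i
    JOIN ducklake_table AS t ON i.table_id = t.table_id
    WHERE i.schema_version = ?
    ORDER BY t.table_id
    """,
    (4,),
).fetchall()

# After: query ducklake_table directly by the table's id.
direct = conn.execute(
    "SELECT begin_snapshot FROM ducklake_table WHERE table_id = ?", (2,)
).fetchone()

print(joined)  # [(10,), (12,)]: two candidates, so "first match" is ambiguous
print(direct)  # (12,)
conn.close()
```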
…ine-data-fix

Fix table snapshot lookup for inlined data reader
Revert duckdb#638 + run postgres/sqlite tests again
@djouallah

@pdet, could you clarify this? I could not find a definitive answer. Is there a way to compact (or force, etc.) a table so that all snapshots referring to a partial data file are expired? Otherwise, this will create a breaking point with Iceberg data files, assuming one doesn't care about table history.

@pdet pdet merged commit fa6ebcd into duckdb:main Jan 21, 2026
27 checks passed
@pdet pdet deleted the partial_info_2 branch January 23, 2026 05:03
4 participants