Remove partial_file_info from ducklake_data_file replace it with partial_max #708
Merged
pdet merged 43 commits into duckdb:main on Jan 21, 2026
When multiple tables have inlined data at the same schema_version, GetCatalogIdForSchema returned the first match's begin_snapshot, which could belong to a different table. This caused "Cannot open file" errors when reading inlined data from a fresh connection. Added table_id parameter to GetCatalogIdForSchema and GetSnapshotForSchema to ensure each table gets its correct begin_snapshot.
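The lookup problem described in this commit can be illustrated with a minimal sketch. The row layout and function names below are hypothetical stand-ins, not the actual DuckLake catalog schema or C++ signatures; the sketch only assumes that two tables share a schema_version and differ in begin_snapshot.

```python
# Hypothetical, simplified catalog rows (not the real DuckLake schema):
# two tables with inlined data at the same schema_version.
INLINED_TABLES = [
    {"table_id": 1, "schema_version": 3, "begin_snapshot": 10},
    {"table_id": 2, "schema_version": 3, "begin_snapshot": 12},
]


def snapshot_for_schema_first_match(rows, schema_version):
    """Behaviour described as buggy: return the first row matching schema_version."""
    for row in rows:
        if row["schema_version"] == schema_version:
            return row["begin_snapshot"]
    return None


def snapshot_for_schema(rows, table_id, schema_version):
    """Behaviour described as the fix: also require the table_id to match."""
    for row in rows:
        if row["table_id"] == table_id and row["schema_version"] == schema_version:
            return row["begin_snapshot"]
    return None


# Looking up table 2 without a table_id filter picks up table 1's snapshot.
wrong = snapshot_for_schema_first_match(INLINED_TABLES, 3)
right = snapshot_for_schema(INLINED_TABLES, 2, 3)
print(wrong, right)
```

In this toy data the unfiltered lookup yields the begin_snapshot of table 1 even when the caller is reading table 2, which is the kind of mismatch the commit message attributes the "Cannot open file" errors to.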
Move all queries on metadata to DuckLakeMetadataManager
Instead of joining ducklake_inlined_data_tables with ducklake_table and filtering by schema_version, directly query ducklake_table by table_id since table_ids are unique.
…ine-data-fix Fix table snapshot lookup for inlined data reader
Revert duckdb#638 + run postgres/sqlite tests again
…ency transaction error
@pdet could you clarify this? I could not find a definitive answer. Is there a way to compact or otherwise force a table to expire all snapshots that refer to a partial data file? Otherwise this will create a breaking point with Iceberg data files, assuming one doesn't care about table history.
This PR removes the partial_file_info column from ducklake_data_file.

This column used to store the last snapshot present in a compacted file and the rows that belonged to each snapshot. We already store information regarding row-to-snapshot ownership in the _ducklake_internal_snapshot_id column in the file. We can safely remove this information from the catalog, especially because partial_file_info would grow in size with the number of snapshots compacted into a file.

We still need to store the last snapshot. For this, we create a new bigint column called partial_max in ducklake_data_file.
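As a rough illustration of why a single partial_max value can be enough, here is a minimal sketch. It assumes that each row in a compacted file carries its own snapshot id (standing in for _ducklake_internal_snapshot_id) and that a reader at an older snapshot filters on that value; the DataFile class and helper names are hypothetical and do not reflect the actual DuckLake reader.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    # Each row is (snapshot_id, payload); snapshot_id plays the role of
    # the per-row _ducklake_internal_snapshot_id column (assumption).
    rows: List[Tuple[int, str]]
    # Highest snapshot compacted into the file; None means the file is
    # not a partial file (assumption for this sketch).
    partial_max: Optional[int]


def compute_partial_max(rows: List[Tuple[int, str]]) -> int:
    """Derive the single catalog value from the per-row snapshot ids."""
    return max(snapshot_id for snapshot_id, _ in rows)


def visible_rows(data_file: DataFile, snapshot_id: int) -> List[str]:
    """Return the payloads visible when reading at snapshot_id."""
    if data_file.partial_max is None or snapshot_id >= data_file.partial_max:
        # Every row in the file is visible; no per-row filtering needed.
        return [payload for _, payload in data_file.rows]
    # Reading an older snapshot: filter on the per-row snapshot id.
    return [payload for sid, payload in data_file.rows if sid <= snapshot_id]


rows = [(5, "a"), (6, "b"), (8, "c")]
compacted = DataFile(rows=rows, partial_max=compute_partial_max(rows))
print(visible_rows(compacted, 6))
print(visible_rows(compacted, 8))
```

Under these assumptions the catalog entry stays a fixed-size bigint regardless of how many snapshots are compacted into the file, while the per-snapshot row ownership is recovered from the file itself.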