Ordered compaction and inlining with full catalog integration #642
pdet merged 66 commits into duckdb:main
Conversation
Sure! The tests that broke are here in this CI run on my fork. They were all running a query like COMMENT ON VIEW ducklake.comment_view IS 'con1'; I can create a view-specific GetEntryById function if that would be better!
Unfortunately, I believe that …
Could we achieve this with multiple connections then? Because that's also possible within the sqltests.
I am not sure! If you have a spot where I can find an example, I can give it a shot. To understand the behavior, I made a Python script that kicks off 2 separate CLI processes. I found that the sort is ignored by the other process if the schema was already cached ahead of time. The good news is that the catalog DB itself continues to have the right values, but the cache does not get invalidated correctly. The flow is: …
If I omit the … What do you recommend I do? I've thought about 3 options, but option 3 would need some help!
ducklake_set_sorted_multiprocess_add_column.py
The logs get printed out for process 1 before printing out process 2, but I logged some timestamps to show the true order (a rough sketch of the flow follows the logs below):
uv run ./test/ducklake_set_sorted_multiprocess_add_column.py
┌─────────┬──────────────────────┐
│ process │ sum("range") │
│ varchar │ int128 │
├─────────┼──────────────────────┤
│ sql_1 │ 12499999997500000000 │
└─────────┴──────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_1 finished SET SORTED BY') │
│ varchar │ varchar │
├─────────┼─────────────────────────────────────────────────────────────┤
│ sql_1 │ 2026-01-19 19:16:39.22008-07 sql_1 finished SET SORTED BY │
└─────────┴─────────────────────────────────────────────────────────────┘
┌─────────┬─────────────────────┐
│ process │ sum("range") │
│ varchar │ int128 │
├─────────┼─────────────────────┤
│ sql_2 │ 1999999999000000000 │
└─────────┴─────────────────────┘
┌─────────┬────────────────────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_2 finished adding column') │
│ varchar │ varchar │
├─────────┼────────────────────────────────────────────────────────────────────┤
│ sql_2 │ 2026-01-19 19:16:35.416129-07 sql_2 finished adding column │
└─────────┴────────────────────────────────────────────────────────────────────┘
┌─────────┬──────────────────────┐
│ process │ sum("range") │
│ varchar │ int128 │
├─────────┼──────────────────────┤
│ sql_2 │ 40499999995500000000 │
└─────────┴──────────────────────┘
┌─────────┬───────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_2 about to compact') │
│ varchar │ varchar │
├─────────┼───────────────────────────────────────────────────────┤
│ sql_2 │ 2026-01-19 19:16:46.373997-07 sql_2 about to compact │
└─────────┴───────────────────────────────────────────────────────┘
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ 0 rows │
└─────────┘
┌─────────┬───────────┬────────────┬────────────┬─────────┐
│ process │ unique_id │ sort_key_1 │ sort_key_2 │ bonus │
│ varchar │ int64 │ int64 │ varchar │ varchar │
├─────────┼───────────┼────────────┼────────────┼─────────┤
│ sql_2 │ 3 │ 1 │ woot3 │ NULL │
│ sql_2 │ 2 │ 0 │ woot2 │ NULL │
│ sql_2 │ 1 │ 1 │ woot1 │ NULL │
│ sql_2 │ 0 │ 0 │ woot0 │ NULL │
│ sql_2 │ 7 │ 1 │ woot7 │ NULL │
│ sql_2 │ 6 │ 0 │ woot6 │ NULL │
│ sql_2 │ 5 │ 1 │ woot5 │ NULL │
│ sql_2 │ 4 │ 0 │ woot4 │ NULL │
└─────────┴───────────┴────────────┴────────────┴─────────┘
┌─────────┬─────────────┬────────────────┬────────────────────────────────────────────┐
│ process │ snapshot_id │ schema_version │ changes │
│ varchar │ int64 │ int64 │ map(varchar, varchar[]) │
├─────────┼─────────────┼────────────────┼────────────────────────────────────────────┤
│ sql_2 │ 0 │ 0 │ {schemas_created=[main]} │
│ sql_2 │ 1 │ 1 │ {tables_created=[main.sort_on_compaction]} │
│ sql_2 │ 2 │ 1 │ {tables_inserted_into=[1]} │
│ sql_2 │ 3 │ 1 │ {tables_inserted_into=[1]} │
│ sql_2 │ 4 │ 2 │ {tables_altered=[1]} │
│ sql_2 │ 5 │ 2 │ {tables_altered=[1]} │
│ sql_2 │ 6 │ 2 │ {} │
└─────────┴─────────────┴────────────────┴────────────────────────────────────────────┘
┌─────────┬──────────┬────────────────┬──────────────┬────────────────┬────────────┬────────────────┬────────────┐
│ process │ table_id │ begin_snapshot │ end_snapshot │ sort_key_index │ expression │ sort_direction │ null_order │
│ varchar │ int64 │ int64 │ int64 │ int64 │ varchar │ varchar │ varchar │
├─────────┼──────────┼────────────────┼──────────────┼────────────────┼────────────┼────────────────┼────────────┤
│ sql_2 │ 1 │ 5 │ NULL │ 0 │ sort_key_1 │ DESC │ NULLS_LAST │
│ sql_2 │ 1 │ 5 │ NULL │ 1 │ sort_key_2 │ DESC │ NULLS_LAST │
└─────────┴──────────┴────────────────┴──────────────┴────────────────┴────────────┴────────────────┴────────────┘
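For readability, here is a rough reconstruction of the flow the logs above show; the table and column names come from the output, but the compaction invocation is an assumption and the attached Python script is the authoritative version:

-- Process 2 (sql_2): attaches first and runs a long query, caching the table's schema.
ALTER TABLE ducklake.sort_on_compaction ADD COLUMN bonus VARCHAR;
-- Process 1 (sql_1): concurrently persists a new sort order for the table.
ALTER TABLE ducklake.sort_on_compaction SET SORTED BY (sort_key_1 DESC NULLS LAST, sort_key_2 DESC NULLS LAST);
-- Process 2 again: compacts after the sort order was set. Hypothetical call;
-- the script may invoke compaction differently.
CALL ducklake_merge_adjacent_files('ducklake');
-- The compacted file comes out unsorted: process 2's cached schema never picked up the
-- ducklake_sort_info / ducklake_sort_expression rows written by process 1.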
Hi @Alex-Monahan, I had another pass on your PR, and it is looking great! I think there was a slight miscommunication issue wrt the snapshot changes. What I meant is that Sort/Comment, etc. should not impact the …
Hmm, so it is ok if they increment the schema_version inside of …?
I think it's fine that they increase the global counter of the schema_version. I'm really sorry for the confusion and added work!
@pdet, I believe I finally understood your advice! I am now incrementing the global schema_version, but not adding a row to the ducklake_schema_versions table if the only change was a comment or a set sorted. My Python test script works correctly now, and I updated the tests to focus on the ducklake_schema_versions table. I believe this is ready for another review (but CI is still running on my fork). Thank you!!
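To illustrate the intended behavior (a sketch only, using the {METADATA_CATALOG} placeholder from the spec tables described in the PR description further down; exact column layouts may differ):

-- A COMMENT or SET SORTED BY bumps the snapshot's global schema_version counter...
SELECT snapshot_id, schema_version FROM {METADATA_CATALOG}.ducklake_snapshot ORDER BY snapshot_id;
-- ...but should not add a new row to the schema-versions history table.
SELECT * FROM {METADATA_CATALOG}.ducklake_schema_versions;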
Ok, the previously failing tests on my CI are now green! All I had to change were the tests: added some … Sorry for being optimistic about CI being a straight shot...
pdet left a comment:
Hey Alex, thanks again for all the great work! I think this is pretty much ready, just had a couple of last comments.
One thing i'm also wondering is, what happens if we set the exact same order twice in a row?
E.g.,
CREATE TABLE ducklake.renamed_columns_test (unique_id INTEGER, sort_key_1 INTEGER, sort_key_2 VARCHAR);
ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);
-- should this error? do nothing? add a new entry?
ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);
Can you also add in the description the schema and a brief comment on the tables you added? That will make @guillesd's work easier for the docs!
(Diff context for the following comment, from struct DuckLakeSortFieldInfo:)
struct DuckLakeSortFieldInfo {
    idx_t sort_key_index = 0;
    // TODO: Validate that expression is case insensitive when stored
    string expression;
We should handle case insensitivity for the column names in this PR already, I think, and add a test:
CREATE TABLE t (MyColumn INT, AnotherCol VARCHAR);
ALTER TABLE t SET SORTED BY (mycolumn ASC);
ALTER TABLE t SET SORTED BY (MyColumn ASC);
Yes, that is handled already! I added some tests and removed that outdated ToDo.
Thanks for the review! I have a test for that over in … I'm happy to update the description, and @guillesd, let me know if I can be helpful on the docs!
Hey @Alex-Monahan, if you can provide an EDIT in the PR description detailing a bit of your final implementation, and adding the syntax (and options, if applicable), that would be great. I tagged the PR, so an issue should now be linked in the ducklake docs page.
Thank you! I've updated the description! No new options, but 2 new DuckLake spec tables.
It looks like CI failed on the Docker Build step. Mind giving it a re-run? CI runs smoothly on my fork now!
Thanks!
Great job @Alex-Monahan @pdet - an impressive amount of work we've been following from the sidelines!
Hi folks!
This is a new PR meant to solve the same use case as #593, but addressing the PR feedback! Thank you for the guidance - it was super helpful. I am still open to any changes you recommend!
PR Overview
The purpose of this PR is to sort data while it is written to speed up selective read queries in the future.
This uses the pre-existing DuckDB SET SORTED BY syntax (from duckdb/duckdb#16714) to sort data when it is compacted or when inlined data is flushed. For example:
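A minimal sketch of the syntax, using illustrative table and column names (the pattern mirrors the SET SORTED BY examples discussed in the review above):

CREATE TABLE ducklake.sorted_table (unique_id INTEGER, sort_key_1 INTEGER, sort_key_2 VARCHAR);
-- Persist a sort order for the table; later compaction/flush operations apply it.
ALTER TABLE ducklake.sorted_table SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);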
Then, when either ducklake_merge_adjacent_files or ducklake_flush_inlined_data is called, those operations will sort the data prior to writing it out as Parquet.

New Tables in DuckLake Spec
This adds 2 new tables to the DuckLake spec:
ducklake_sort_info and ducklake_sort_expression.

ducklake_sort_info keeps a version history of the sort settings for tables over time. It has 1 row per time that a table has a new sort setting applied. (The prior PR used an option for this, but that has been removed based on the feedback.)

CREATE TABLE {METADATA_CATALOG}.ducklake_sort_info (
    sort_id BIGINT,
    table_id BIGINT,
    begin_snapshot BIGINT,
    end_snapshot BIGINT
);

ducklake_sort_expression tracks the details of that sort. For each time a new sort setting is applied, this table includes one row for each expression in the ORDER BY. (If I order by column3 asc, column42 desc, column64 asc, then there will be 3 rows.)

CREATE TABLE {METADATA_CATALOG}.ducklake_sort_expression (
    sort_id BIGINT,
    table_id BIGINT,
    sort_key_index BIGINT, -- The sequence the SORTED BY expressions are evaluated in
    expression VARCHAR,
    dialect VARCHAR,
    sort_direction VARCHAR, -- ASC or DESC
    null_order VARCHAR      -- NULLS_LAST or NULLS_FIRST
);
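As an illustration of how the two tables relate (a sketch only, reusing the {METADATA_CATALOG} placeholder from the definitions above), each per-expression row joins back to its sort-version row on sort_id:

-- List the currently active sort expressions for each table, in evaluation order.
SELECT si.table_id, si.begin_snapshot, si.end_snapshot,
       se.sort_key_index, se.expression, se.sort_direction, se.null_order
FROM {METADATA_CATALOG}.ducklake_sort_info si
JOIN {METADATA_CATALOG}.ducklake_sort_expression se USING (sort_id, table_id)
WHERE si.end_snapshot IS NULL
ORDER BY si.table_id, se.sort_key_index;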
There are still a few limitations with this PR:
- … ORDER BY during an insert, so there is a workaround for the moment.
- … expression, so the intention was to make the spec itself forwards-compatible with expression-oriented sorting.
I believe that I made this fully compatible with the batching code, but I was testing locally on a DuckDB catalog and not on Postgres. Any extra eyes on that side would be great!
If this looks good, I can also do any docs PRs that you recommend - happy to help there.
Thanks folks! CC @philippmd as well as an FYI