
Ordered compaction and inlining with full catalog integration #642

Merged
pdet merged 66 commits into duckdb:main from Alex-Monahan:ordered-compaction-catalog on Jan 27, 2026

Conversation

@Alex-Monahan (Contributor) commented Dec 22, 2025

Hi folks!

This is a new PR meant to solve the same use case as #593, but addressing the PR feedback! Thank you for the guidance - it was super helpful. I am still open to any changes you recommend!

PR Overview

The purpose of this PR is to sort data as it is written, in order to speed up selective read queries later.

This uses the pre-existing DuckDB SET SORTED BY syntax (from duckdb/duckdb#16714) to sort data when it is compacted or when inlined data is flushed. For example:

ALTER TABLE ducklake.my_table SET SORTED BY (sort_key_1 ASC, sort_key_2 DESC);

Then, when either ducklake_merge_adjacent_files or ducklake_flush_inlined_data is called, those operations sort the data prior to writing it out as Parquet.
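
For illustration, a minimal end-to-end sketch (the table and column names here are placeholders, and the catalog-name argument to the two maintenance functions is an assumption about their signatures):

-- Attach a DuckLake catalog and create a table (names are hypothetical)
ATTACH 'ducklake:metadata.ducklake' AS ducklake;
CREATE TABLE ducklake.my_table (sort_key_1 INTEGER, sort_key_2 VARCHAR);
INSERT INTO ducklake.my_table VALUES (2, 'b'), (1, 'a');
-- Declare the sort order used by future rewrites
ALTER TABLE ducklake.my_table SET SORTED BY (sort_key_1 ASC, sort_key_2 DESC);
-- Both maintenance operations now sort rows before writing them out as Parquet
CALL ducklake_flush_inlined_data('ducklake');
CALL ducklake_merge_adjacent_files('ducklake');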

New Tables in DuckLake Spec

This adds 2 new tables to the DuckLake spec: ducklake_sort_info and ducklake_sort_expression.

ducklake_sort_info keeps a version history of the sort settings for tables over time. It has one row for each time a new sort setting is applied to a table. (The prior PR used an option for this, but that has been removed based on the feedback.)

CREATE TABLE {METADATA_CATALOG}.ducklake_sort_info(
  sort_id BIGINT,
  table_id BIGINT,
  begin_snapshot BIGINT,
  end_snapshot BIGINT
);

ducklake_sort_expression tracks the details of that sort. Each time a new sort setting is applied, this table gets one row per expression in the ORDER BY. (If I order by column3 ASC, column42 DESC, column64 ASC, there will be 3 rows.)

CREATE TABLE {METADATA_CATALOG}.ducklake_sort_expression(
  sort_id BIGINT,
  table_id BIGINT,
  sort_key_index BIGINT,     -- The sequence the SORTED BY expressions are evaluated in
  expression VARCHAR,
  dialect VARCHAR,
  sort_direction VARCHAR,    -- ASC or DESC
  null_order VARCHAR         -- NULLS_LAST or NULLS_FIRST
);
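
To see how the two tables combine, here is a hypothetical inspection query (it assumes that end_snapshot IS NULL marks the currently active sort setting, following the usual DuckLake versioning convention, and uses table_id = 1 as a stand-in):

SELECT e.sort_key_index, e.expression, e.sort_direction, e.null_order
FROM {METADATA_CATALOG}.ducklake_sort_info i
JOIN {METADATA_CATALOG}.ducklake_sort_expression e USING (sort_id, table_id)
WHERE i.table_id = 1            -- stand-in table id
  AND i.end_snapshot IS NULL    -- currently active sort setting
ORDER BY e.sort_key_index;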

Future Work / Limitations

There are still a few limitations with this PR:

  1. This does not order during insert (only during compaction and inline flush).
    • I would love to try to do a follow up PR to add this!
    • A user can still specify their own ORDER BY during an insert, so there is a workaround for the moment (see the sketch after this list)
  2. Only explicit column names can be used in the sorting, not expressions.
    • I have plans to add this in a follow up PR
    • There is a friendly error message (and tests) to document this limitation.
    • The spec has an expression column, so the intention was to make the spec itself forward-compatible with expression-oriented sorting.
  3. Files are still selected for compaction based on insertion order. It could be better to sort the list of files by min/max metadata before selecting files for compaction.
    • Let me know if this is desirable and I can work on it after the "order during insert" in number 1!
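
As mentioned in item 1, a user-supplied ORDER BY is the interim workaround for sorting at insert time. A sketch (staging is a hypothetical source table):

-- Sort explicitly at insert time until ordered inserts are supported
INSERT INTO ducklake.my_table
SELECT * FROM staging
ORDER BY sort_key_1 ASC, sort_key_2 DESC;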

I believe that I made this fully compatible with the batching code, but I was only testing locally against a DuckDB catalog, not Postgres. Any extra eyes on that side would be great!

If this looks good, I can also do any docs PRs that you recommend - happy to help there.

Thanks folks! CC @philippmd as an FYI as well.

@Alex-Monahan (Contributor Author):

So, to fix the assertion issues on my fork's CI, I had to relax an assertion. Please let me know if I am off base, but I think the assertion was too strict.
In src/storage/ducklake_catalog.cpp lines 439-461, the table_entry_map can have views added to it. However, there was an assertion in DuckLakeCatalogSet::GetEntryById(TableIndex index) that required a table (and not a view). Allowing a view there appears to solve things.

> Could you point me to which test broke this requirement?

> From what I can tell, there are other parts of the code that require this to return a table, as we dereference it to a DuckLakeTableEntry, e.g.:

unique_ptr<DuckLakeStats> DuckLakeCatalog::ConstructStatsMap(vector<DuckLakeGlobalStatsInfo> &global_stats,
                                                             DuckLakeCatalogSet &schema) {
	auto lake_stats = make_uniq<DuckLakeStats>();
	for (auto &stats : global_stats) {
		// find the referenced table entry
		auto table_entry = schema.GetEntryById(stats.table_id);

Sure! The tests that broke are in this CI run on my fork. They were all running a query like

COMMENT ON VIEW ducklake.comment_view IS 'con1';
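
For context, a minimal reproduction sketch (the view definition is hypothetical; the COMMENT ON VIEW statement is from the failing tests):

-- Creating a view adds a view entry to table_entry_map;
-- the COMMENT then looks the entry up by id, hitting the assertion
CREATE VIEW ducklake.comment_view AS SELECT 1 AS i;
COMMENT ON VIEW ducklake.comment_view IS 'con1';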

I can create a view-specific GetEntryById function if that would be better!

@Alex-Monahan (Contributor Author) commented Jan 19, 2026

> As I am thinking more about this, I'm wondering if the updates I made to the catalog cache would be safe across processes. Is it safe to not update the schema_version? Could other processes use a stale sort order, or reset it back to being unsorted?
>
> Maybe this is something we can test with concurrentloops? (see test/sql/snapshot_info/ducklake_last_commit.test)

Unfortunately, I believe that concurrentloop uses the same DuckDB instance with multiple threads, and the DuckLake catalog is only created once per ATTACH, so it is shared across all threads. I think the issue that might exist from no longer incrementing schema_version would be when two totally separate DuckLakeCatalog instances have different sort information in their cache (with the same schema_version). Is there a multi-process version of concurrentloop? Or maybe a C++ or Python test? Do you want me to remove the schema_version modifications and save them for a later PR?

@pdet (Collaborator) commented Jan 19, 2026

> As I am thinking more about this, I'm wondering if the updates I made to the catalog cache would be safe across processes. Is it safe to not update the schema_version? Could other processes use a stale sort order, or reset it back to being unsorted?
>
> Maybe this is something we can test with concurrentloops? (see test/sql/snapshot_info/ducklake_last_commit.test)
>
> Unfortunately, I believe that concurrentloop uses the same DuckDB instance with multiple threads, and the DuckLake catalog is only created once per ATTACH, so it is shared across all threads. I think the issue that might exist from no longer incrementing schema_version would be when two totally separate DuckLakeCatalog instances have different sort information in their cache (with the same schema_version). Is there a multi-process version of concurrentloop? Or maybe a C++ or Python test? Do you want me to remove the schema_version modifications and save them for a later PR?

Could we achieve this with multiple connections then? Because that's also possible within sqltests

@Alex-Monahan (Contributor Author):

> As I am thinking more about this, I'm wondering if the updates I made to the catalog cache would be safe across processes. Is it safe to not update the schema_version? Could other processes use a stale sort order, or reset it back to being unsorted?
>
> Maybe this is something we can test with concurrentloops? (see test/sql/snapshot_info/ducklake_last_commit.test)
>
> Unfortunately, I believe that concurrentloop uses the same DuckDB instance with multiple threads, and the DuckLake catalog is only created once per ATTACH, so it is shared across all threads. I think the issue that might exist from no longer incrementing schema_version would be when two totally separate DuckLakeCatalog instances have different sort information in their cache (with the same schema_version). Is there a multi-process version of concurrentloop? Or maybe a C++ or Python test? Do you want me to remove the schema_version modifications and save them for a later PR?
>
> Could we achieve this with multiple connections then? Because that's also possible within sqltests

I am not sure! If you have a spot where I can find an example, I can give it a shot.

To understand the behavior, I made a Python script that kicks off 2 separate CLI processes. I found that the sort is ignored by the other process if the schema was already cached ahead of time. The good news is that the catalog DB itself continues to hold the right values, but the cache does not get invalidated correctly.

The flow is (sketched in SQL after this list):

  • Process 1: connects, creates the table and inserts into it
  • Process 2: connects and runs an ALTER TABLE ADD COLUMN, which caches the catalog
  • Process 1: ALTER TABLE SET SORTED BY
  • Process 1: Completes / exits
  • Process 2: Compacts (using the cached catalog)
  • Process 2: Pulls updated table (which does not show the right order, since the cached catalog was used)
  • Process 2: Completes / exits
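
In SQL terms, the flow above looks roughly like this (reconstructed from the output below; the statements are illustrative, not the exact script):

-- Process 1
CREATE TABLE ducklake.sort_on_compaction (unique_id BIGINT, sort_key_1 BIGINT, sort_key_2 VARCHAR);
INSERT INTO ducklake.sort_on_compaction VALUES (0, 0, 'woot0'); -- etc.
-- Process 2: this ALTER caches the catalog in process 2
ALTER TABLE ducklake.sort_on_compaction ADD COLUMN bonus VARCHAR;
-- Process 1
ALTER TABLE ducklake.sort_on_compaction SET SORTED BY (sort_key_1 DESC, sort_key_2 DESC);
-- Process 2: compacts with the stale cached catalog, so the sort is ignored
CALL ducklake_merge_adjacent_files('ducklake');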

If I omit the ALTER TABLE ADD COLUMN step in process 2, then there is no issue and the sort occurs correctly.

What do you recommend I do? I've thought about 3 options, but option 3 would need some help!

  1. Keep the schema_version from incrementing, but accept this concurrency behavior.
  2. Allow the schema_version to increment but accept that a compaction barrier gets put in when sort is changed
  3. Keep the schema_version from incrementing but find some other way to correctly invalidate the cache or use a different key for the cache

ducklake_set_sorted_multiprocess_add_column.py

The logs for process 1 get printed before those for process 2, but I logged some timestamps to show the true order:

uv run ./test/ducklake_set_sorted_multiprocess_add_column.py
┌─────────┬──────────────────────┐
│ process │     sum("range")     │
│ varchar │        int128        │
├─────────┼──────────────────────┤
│ sql_1   │ 12499999997500000000 │
└─────────┴──────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_1 finished SET SORTED BY') │
│ varchar │                           varchar                           │
├─────────┼─────────────────────────────────────────────────────────────┤
│ sql_1   │ 2026-01-19 19:16:39.22008-07 sql_1 finished SET SORTED BY   │
└─────────┴─────────────────────────────────────────────────────────────┘


┌─────────┬─────────────────────┐
│ process │    sum("range")     │
│ varchar │       int128        │
├─────────┼─────────────────────┤
│ sql_2   │ 1999999999000000000 │
└─────────┴─────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_2 finished adding column') │
│ varchar │                           varchar                           │
├─────────┼─────────────────────────────────────────────────────────────┤
│ sql_2   │ 2026-01-19 19:16:35.416129-07 sql_2 finished adding column  │
└─────────┴─────────────────────────────────────────────────────────────┘
┌─────────┬──────────────────────┐
│ process │     sum("range")     │
│ varchar │        int128        │
├─────────┼──────────────────────┤
│ sql_2   │ 40499999995500000000 │
└─────────┴──────────────────────┘
┌─────────┬───────────────────────────────────────────────────────┐
│ process │ (CAST(now() AS VARCHAR) || ' sql_2 about to compact') │
│ varchar │                        varchar                        │
├─────────┼───────────────────────────────────────────────────────┤
│ sql_2   │ 2026-01-19 19:16:46.373997-07 sql_2 about to compact  │
└─────────┴───────────────────────────────────────────────────────┘
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ 0 rows  │
└─────────┘
┌─────────┬───────────┬────────────┬────────────┬─────────┐
│ process │ unique_id │ sort_key_1 │ sort_key_2 │  bonus  │
│ varchar │   int64   │   int64    │  varchar   │ varchar │
├─────────┼───────────┼────────────┼────────────┼─────────┤
│ sql_2   │         3 │          1 │ woot3      │ NULL    │
│ sql_2   │         2 │          0 │ woot2      │ NULL    │
│ sql_2   │         1 │          1 │ woot1      │ NULL    │
│ sql_2   │         0 │          0 │ woot0      │ NULL    │
│ sql_2   │         7 │          1 │ woot7      │ NULL    │
│ sql_2   │         6 │          0 │ woot6      │ NULL    │
│ sql_2   │         5 │          1 │ woot5      │ NULL    │
│ sql_2   │         4 │          0 │ woot4      │ NULL    │
└─────────┴───────────┴────────────┴────────────┴─────────┘
┌─────────┬─────────────┬────────────────┬────────────────────────────────────────────┐
│ process │ snapshot_id │ schema_version │                  changes                   │
│ varchar │    int64    │     int64      │          map(varchar, varchar[])           │
├─────────┼─────────────┼────────────────┼────────────────────────────────────────────┤
│ sql_2   │           0 │              0 │ {schemas_created=[main]}                   │
│ sql_2   │           1 │              1 │ {tables_created=[main.sort_on_compaction]} │
│ sql_2   │           2 │              1 │ {tables_inserted_into=[1]}                 │
│ sql_2   │           3 │              1 │ {tables_inserted_into=[1]}                 │
│ sql_2   │           4 │              2 │ {tables_altered=[1]}                       │
│ sql_2   │           5 │              2 │ {tables_altered=[1]}                       │
│ sql_2   │           6 │              2 │ {}                                         │
└─────────┴─────────────┴────────────────┴────────────────────────────────────────────┘
┌─────────┬──────────┬────────────────┬──────────────┬────────────────┬────────────┬────────────────┬────────────┐
│ process │ table_id │ begin_snapshot │ end_snapshot │ sort_key_index │ expression │ sort_direction │ null_order │
│ varchar │  int64   │     int64      │    int64     │     int64      │  varchar   │    varchar     │  varchar   │
├─────────┼──────────┼────────────────┼──────────────┼────────────────┼────────────┼────────────────┼────────────┤
│ sql_2   │        1 │              5 │ NULL         │              0 │ sort_key_1 │ DESC           │ NULLS_LAST │
│ sql_2   │        1 │              5 │ NULL         │              1 │ sort_key_2 │ DESC           │ NULLS_LAST │
└─────────┴──────────┴────────────────┴──────────────┴────────────────┴────────────┴────────────────┴────────────┘

@pdet (Collaborator) commented Jan 21, 2026

Hi @Alex-Monahan, I had another pass on your PR, and it is looking great! I think there was a slight miscommunication wrt the snapshot changes. What I meant is that Sort/Comment etc. should not impact the ducklake_schema_versions table, because that table is used to ensure we only compact tables that have the same data schema, i.e. the same number and types of columns.

@Alex-Monahan (Contributor Author) commented Jan 21, 2026

> Hi @Alex-Monahan, I had another pass on your PR, and it is looking great! I think there was a slight miscommunication wrt the snapshot changes. What I meant is that Sort/Comment etc. should not impact the ducklake_schema_versions table, because that table is used to ensure we only compact tables that have the same data schema, i.e. the same number and types of columns.

Hmm, so it is ok if they increment the schema_version inside of ducklake_snapshot, just not ducklake_schema_versions? I'm having trouble detangling how to only prevent updating the schema_version selectively.

@pdet (Collaborator) commented Jan 21, 2026

> Hi @Alex-Monahan, I had another pass on your PR, and it is looking great! I think there was a slight miscommunication wrt the snapshot changes. What I meant is that Sort/Comment etc. should not impact the ducklake_schema_versions table, because that table is used to ensure we only compact tables that have the same data schema, i.e. the same number and types of columns.
>
> Hmm, so it is ok if they increment the schema_version inside of ducklake_snapshot, just not ducklake_schema_versions? I'm having trouble detangling how to only prevent updating the schema_version selectively.

I think it's fine that they increase the global counter of the schema_version. I'm really sorry for the confusion and added work!

@Alex-Monahan (Contributor Author) commented Jan 26, 2026

@pdet, I believe I finally understood your advice! I am now incrementing the global schema_version, but not adding to the ducklake_schema_versions table if the only change was a comment or a SET SORTED BY.
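
To illustrate with a hypothetical check in the spirit of the updated tests (table names follow the {METADATA_CATALOG} placeholder from the description):

-- Baseline: number of recorded schema versions
SELECT count(*) FROM {METADATA_CATALOG}.ducklake_schema_versions;
-- A sort-only change...
ALTER TABLE ducklake.my_table SET SORTED BY (sort_key_1 ASC);
-- ...advances the snapshot's global schema_version, but the row count of
-- ducklake_schema_versions should be unchanged
SELECT count(*) FROM {METADATA_CATALOG}.ducklake_schema_versions;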

My Python test script works correctly now, and I updated the tests to focus on the ducklake_schema_versions table.

I believe this is ready for another review (but CI is still running on my fork). Thank you!!

@Alex-Monahan (Contributor Author) commented Jan 26, 2026

Ok, the previously failing tests on my CI are now green! All I had to change were the tests: I added some ORDER BYs for SQLite and used unique folder names so that I get accurate file counts.

Sorry for being optimistic about CI being a straight shot...

@pdet (Collaborator) left a comment

Hey Alex, thanks again for all the great work! I think this is pretty much ready, just had a couple of last comments.

One thing I'm also wondering is: what happens if we set the exact same order twice in a row?

E.g.,

CREATE TABLE ducklake.renamed_columns_test (unique_id INTEGER, sort_key_1 INTEGER, sort_key_2 VARCHAR);

ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);

-- should this error? do nothing? add a new entry?
ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);

Can you also add to the description the schema of the tables you added, plus a brief comment on each? That will make @guillesd's work easier for the docs!

struct DuckLakeSortFieldInfo {
    idx_t sort_key_index = 0;
    // TODO: Validate that expression is case insensitive when stored
    string expression;
@pdet (Collaborator):

We should already handle case insensitivity for the column names in this PR, I think, along with a test:

CREATE TABLE t (MyColumn INT, AnotherCol VARCHAR);

ALTER TABLE t SET SORTED BY (mycolumn ASC);

ALTER TABLE t SET SORTED BY (MyColumn ASC);

@Alex-Monahan (Contributor Author):

Yes, that is handled already! I added some tests and removed that outdated TODO.

@Alex-Monahan (Contributor Author):

> Hey Alex, thanks again for all the great work! I think this is pretty much ready, just had a couple of last comments.
>
> One thing I'm also wondering is: what happens if we set the exact same order twice in a row?
>
> E.g.,
>
> CREATE TABLE ducklake.renamed_columns_test (unique_id INTEGER, sort_key_1 INTEGER, sort_key_2 VARCHAR);
>
> ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);
>
> -- should this error? do nothing? add a new entry?
> ALTER TABLE ducklake.renamed_columns_test SET SORTED BY (sort_key_1 ASC NULLS LAST, sort_key_2 ASC NULLS LAST);
>
> Can you also add to the description the schema of the tables you added, plus a brief comment on each? That will make @guillesd's work easier for the docs!

Thanks for the review! I have a test for that over in test/sql/sorted_table/merge_adjacent_sorted_repeated.test. I chose the "do nothing" path: I do a deduplication check so that we don't add redundant entries to the catalog, as sketched below. Let me know if that is ok!
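
Roughly, the deduplication behaves like this (a sketch; the table name and the table_id filter are stand-ins):

ALTER TABLE ducklake.my_table SET SORTED BY (sort_key_1 ASC NULLS LAST);
-- Setting the identical order again is a no-op for the catalog
ALTER TABLE ducklake.my_table SET SORTED BY (sort_key_1 ASC NULLS LAST);
-- Only one ducklake_sort_info row exists for the table
SELECT count(*) FROM {METADATA_CATALOG}.ducklake_sort_info WHERE table_id = 1;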

I'm happy to update the description, and @guillesd, let me know if I can be helpful on the docs!

@guillesd (Contributor):

Hey @Alex-Monahan, if you can provide an EDIT in the PR description detailing a bit of your final implementation, and adding the syntax (and options, if applicable), that would be great. I tagged the PR, so an issue should now be linked in the ducklake docs page.

@Alex-Monahan (Contributor Author):

> Hey @Alex-Monahan, if you can provide an EDIT in the PR description detailing a bit of your final implementation, and adding the syntax (and options, if applicable), that would be great. I tagged the PR, so an issue should now be linked in the ducklake docs page.

Thank you! I've updated the description! No new options, but 2 new DuckLake spec tables.

@Alex-Monahan (Contributor Author):

It looks like CI failed on the Docker Build step. Mind giving it a re-run? CI runs smoothly on my fork now!

@pdet (Collaborator) commented Jan 27, 2026

Thanks!

pdet merged commit 0b8f1cf into duckdb:main on Jan 27, 2026, with 51 of 63 checks passed.
@redox (Contributor) commented Jan 27, 2026

Great job @Alex-Monahan @pdet - an impressive amount of work we've been following from the sidelines!
