Skip to content

Adding column and renaming inside txn fails when inlining #740

@Alex-Monahan

Description

@Alex-Monahan

What happens?

I found an odd bug with alter table statements within a transaction when doing some testing on the next phase of sorting. This is a fairly artificial example (not a lot of scenarios where it would be needed).

If you are using inlining, adding a column and renaming it within the same transaction will cause the table to not be readable after commit (select * from table will fail).

To Reproduce

# name: test/sql/sorted_table/data_inlining_flush_sorted_transaction_expression_column_changes.test
# description: Test SET SORTED BY with expression when columns are dropped and re-added in same transaction
# group: [sorted_table]

require ducklake

require parquet

test-env DUCKLAKE_CONNECTION __TEST_DIR__/{UUID}.db

test-env DATA_PATH __TEST_DIR__


statement ok
ATTACH 'ducklake:${DUCKLAKE_CONNECTION}' AS ducklake (DATA_PATH '${DATA_PATH}/ducklake_inlining_flush_expression_column_changes', DATA_INLINING_ROW_LIMIT 10)

# Create table with VARCHAR columns suitable for concatenation expression
statement ok
CREATE TABLE ducklake.expression_column_test (unique_id INTEGER, sort_key_1 VARCHAR, sort_key_2 VARCHAR);

# Insert data (will be inlined initially)
statement ok
INSERT INTO ducklake.expression_column_test (unique_id, sort_key_1, sort_key_2)
VALUES
    (0, 'a', 'val0'),
    (1, 'c', 'val1'),
    (2, 'b', 'val2'),
    (3, 'a', 'val3'),
    (4, 'c', 'val4');

statement ok
BEGIN

statement ok
ALTER TABLE ducklake.expression_column_test ADD COLUMN sort_key_2_new VARCHAR;

statement ok
ALTER TABLE ducklake.expression_column_test RENAME COLUMN sort_key_2_new TO sort_key_3;

statement ok
CALL ducklake_flush_inlined_data('ducklake', table_name => 'expression_column_test');

statement ok
COMMIT

# This is currently erroring out with:
# Binder Error: Failed to read inlined data from DuckLake: Referenced column "sort_key_2_new" not found in FROM clause!
# Candidate bindings: "sort_key_2", "sort_key_1", "sort_key_3"

# LINE 2: ... unique_id::INTEGER, sort_key_1::VARCHAR, sort_key_2::VARCHAR, sort_key_2_new::VARCHAR, sort_key_2_new::VARCHAR
#                                                                           ^                         ^
query IIII
FROM ducklake.expression_column_test
SELECT *
----
0	a	NULL	NULL
3	a	NULL	NULL
2	b	NULL	NULL
1	c	NULL	NULL
4	c	NULL	NULL

OS:

MacOS

DuckDB Version:

1.4.4

DuckLake Version:

1fb5027 (plus some unrelated changes on my branch)

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Alex Monahan

Affiliation:

MotherDuck

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions