planner: fix gen-col mistakenly resolved after duplicate expression index substitution by AilinKid · Pull Request #67692 · pingcap/tidb

AilinKid · 2026-04-10T09:13:49Z

What problem does this PR solve?

Issue Number: ref #67552

Problem Summary:

When a table has multiple expression indexes backed by different hidden generated columns but sharing the same virtual expression, virtual-expression index resolution may bind an expression to the wrong hidden column. This can produce an invalid IndexLookUp plan and trigger errors such as Unexpected missing column 12 on real TiKV.

What changed and how does it work?

This PR updates virtual-expression column resolution to prefer the exact column identity before falling back to expression equality.

Column.resolveIndicesByVirtualExpr now checks EqualColumn first, so a column whose UniqueID exists in the selected schema resolves to that exact column even if another schema column with the same VirtualExpr appears earlier. If there is no exact column match, it falls back to expression equality only when that fallback identifies exactly one candidate; ambiguous fallback candidates remain unresolved instead of choosing the first match.
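The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code in `pkg/expression/column.go`: the `column` type, the `resolveByVirtualExpr` function, and the string-valued `VirtualExpr` field are simplified, hypothetical stand-ins, and plain string equality stands in for the real expression-equality check.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for expression.Column.
type column struct {
	UniqueID    int64
	VirtualExpr string // canonical text of the generated-column expression
	Index       int    // resolved offset in the schema, -1 if unresolved
}

// resolveByVirtualExpr binds col to an offset in schema. It prefers an exact
// UniqueID match; otherwise it accepts an expression-equal column only when
// exactly one such candidate exists.
func resolveByVirtualExpr(col *column, schema []column) bool {
	// Pass 1: exact column identity wins, regardless of position.
	for i, c := range schema {
		if c.UniqueID == col.UniqueID {
			col.Index = i
			return true
		}
	}
	// Pass 2: expression equality, accepted only if the candidate is unique.
	fallback := -1
	for i, c := range schema {
		if c.VirtualExpr != "" && c.VirtualExpr == col.VirtualExpr {
			if fallback >= 0 {
				return false // ambiguous: leave the column unresolved
			}
			fallback = i
		}
	}
	if fallback < 0 {
		return false
	}
	col.Index = fallback
	return true
}

func main() {
	// Two hidden columns that share the same virtual expression.
	schema := []column{
		{UniqueID: 11, VirtualExpr: "lower(name)"},
		{UniqueID: 12, VirtualExpr: "lower(name)"},
	}

	exact := &column{UniqueID: 12, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveByVirtualExpr(exact, schema), exact.Index) // true 1

	missing := &column{UniqueID: 99, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveByVirtualExpr(missing, schema), missing.Index) // false -1
}
```

In this sketch a first-match scan on expression equality alone would bind the target with UniqueID 12 to offset 0, which is the kind of mis-binding the PR describes.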

A regression unit test covers the tie-breaking behavior by constructing two columns with the same VirtualExpr and asserting that resolution chooses the exact UniqueID match instead of the earlier expression-equal column. The same test also verifies that a target without an exact match does not resolve through an ambiguous expression-equality fallback. The planner issue test keeps coverage for the reported query shape and verifies that the expression index path is still selected.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Manual test steps:

tiup playground nightly --db.binpath=/private/tmp/tidb-issue67552-clean/bin/tidb-server --tiflash=0


mysql> use test
Database changed
mysql> CREATE DATABASE IF NOT EXISTS test;
Query OK, 0 rows affected, 1 warning (0.001 sec)

mysql> USE test;
Database changed
mysql> DROP TABLE IF EXISTS space;
Query OK, 0 rows affected, 1 warning (0.000 sec)

mysql> 
mysql> CREATE TABLE `space` (
    ->   `workspace_id` BINARY(16) NOT NULL,
    ->   `tenant_id` VARCHAR(100) NOT NULL,
    ->   `id` BINARY(16) NOT NULL,
    ->   `name` VARCHAR(200) NOT NULL,
    ->   `description` VARCHAR(500),
    ->   `created_at` DATETIME(6) NOT NULL,
    ->   `updated_at` DATETIME(6),
    ->   `created_by` VARCHAR(128) NOT NULL,
    ->   `updated_by` VARCHAR(128),
    ->   `status` ENUM('CURRENT', 'ARCHIVED', 'TRASHED') NOT NULL,
    ->   `is_default` BOOLEAN NOT NULL DEFAULT FALSE,
    ->   PRIMARY KEY (`workspace_id`, `id`)
    -> );
Query OK, 0 rows affected (0.024 sec)

mysql> 
mysql> CREATE INDEX `space_default_idx` ON `space`(`is_default`);
Query OK, 0 rows affected (0.314 sec)

mysql> CREATE INDEX `space_workspace_id_lower_name_id_idx` ON `space`(`workspace_id`, (LOWER(`name`)), `id`);
Query OK, 0 rows affected (0.106 sec)

mysql> CREATE INDEX `space_workspace_id_status_id_idx` ON `space`(`workspace_id`, `status`, `id`);
Query OK, 0 rows affected (0.105 sec)

mysql> CREATE INDEX `space_workspace_id_status_lower_name_id_idx` ON `space`(`workspace_id`, `status`, (LOWER(`name`)), `id`);
Query OK, 0 rows affected (0.114 sec)

mysql> INSERT INTO `space` (`tenant_id`,`workspace_id`,`id`,`name`,`description`,`created_at`,`updated_at`,`created_by`,`updated_by`,`status`,`is_default`) VALUES
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'cf579b6fb24742c3a4b05acd826495a7','Backend Team 0','d','2026-04-01 03:16:56.079169',NULL,'u1',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'19741537d8b2425083cea7b328f390e2','Backend Team 1','d','2026-04-01 03:16:56.123272',NULL,'u2',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'e203f36cdbce446cb7d87f079eaa259b','Backend Team 2','d','2026-04-01 03:16:56.128159',NULL,'u3',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'1a0605387e4f4d64806e8aaac75eda64','Backend Team 3','d','2026-04-01 03:16:56.132970',NULL,'u4',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'15a621fcfd984769918c0feeb604dcc8','Backend Team 4','d','2026-04-01 03:16:56.136578',NULL,'u5',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'6b80da36287b440fb4d010b430f712a6','Backend Archive 0','d','2026-04-01 03:16:56.140175',NULL,'u6',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'04f61df02afe48b38ba08aef53b5a670','Backend Archive 1','d','2026-04-01 03:16:56.144300',NULL,'u7',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'6c908281d33e4487ae74b25c34d81d3e','Backend Archive 2','d','2026-04-01 03:16:56.148197',NULL,'u8',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'fb95f9945b8b4a1198a51c29e93edd08','Frontend Team 0','d','2026-04-01 03:16:56.152013',NULL,'u9',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'f63cff86469a4384b93e4960fb9cb242','Frontend Team 1','d','2026-04-01 03:16:56.155897',NULL,'u10',NULL,'CURRENT',0);
Query OK, 10 rows affected (0.002 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_lower_name_id_idx) 
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.003 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_status_lower_name_id_idx) 
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.002 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_lower_name_id_idx)  
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.002 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_status_lower_name_id_idx)  
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.001 sec)

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note


Fix an issue where virtual-expression resolution could choose the wrong hidden generated column when multiple expression indexes shared the same expression.

Summary by CodeRabbit

Bug Fixes
- Improved column index resolution for virtual expressions to ensure correct index selection during query planning.
Tests
- Added regression and unit tests covering index selection and virtual-expression resolution tie-breaking.
Chores
- Adjusted test sharding configuration and test build settings to refine test parallelization and dependencies.

coderabbitai · 2026-04-10T09:14:56Z

Walkthrough

Column virtual-expression index resolution now prefers exact column-ID matches and uses the expression-equality fallback only when it identifies a single, unambiguous candidate. The PR adds unit tests for this behavior and a planner regression EXPLAIN test, and adjusts two Bazel test rules (shard_count and test deps).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Virtual Expression Resolution Logic: `pkg/expression/column.go` | Modified `Column.resolveIndicesByVirtualExpr` to prefer `EqualColumn` exact-ID matches and to record a single fallback from `EqualByExprAndID`, assigning it only when unique. |
| Unit & Regression Tests: `pkg/expression/column_test.go`, `pkg/planner/core/issuetest/planner_issue_test.go` | Added a subtest asserting resolution prefers exact UniqueID matches; added regression test `ambiguous-expression-index-generated-column-substitution` that creates indexes and asserts `IndexLookUp` uses `index:space_workspace_id_status_lower_name_id_idx` via `EXPLAIN format='plan_tree'`. |
| Build/test config: `br/pkg/metautil/BUILD.bazel`, `pkg/importsdk/BUILD.bazel` | Test config changes: increased `metautil_test` `shard_count` from `13` to `15`; added `//pkg/parser/ast` to `importsdk_test` deps. |

Sequence Diagram(s)

(Skipped — changes are local logic and tests; do not meet criteria for a sequence diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

planner: skip pure-constant generated column substitution #67473: Related — touches generated-column virtual-expression handling and index substitution logic.
planner: migrate unit tests checking EXPLAIN output to format='plan_tree' #67479: Related — similar planner test updates using EXPLAIN format='plan_tree' and plan-tree assertions.

Suggested reviewers

qw4990

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing an issue where generated columns were mistakenly resolved after duplicate expression index substitution. It is concise and directly related to the core bug being fixed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively documents the problem, solution, testing approach, and includes a detailed release note. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/issuetest/planner_issue_test.go`:
- Around line 35-45: The test only inspects the first LogicalSelection found by
findFirstLogicalSelection, which misses regressions in other nodes (e.g.,
LogicalSort); update the test to traverse the entire optimized plan (not just
the first match) and assert the transformed expressions for both selection and
sort nodes—either replace findFirstLogicalSelection with a recursive walker that
collects all LogicalSelection and LogicalSort nodes or add explicit assertions
for logicalop.LogicalSort expressions (and similarly update the other
occurrences referenced around lines 219-227) so the test validates ORDER BY
LOWER(name), id is rewritten correctly across the whole plan.
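The kind of whole-plan traversal suggested here might look like the sketch below. The `planNode` type and the `collect` helper are hypothetical stand-ins invented for illustration; they are not TiDB's logical plan interfaces, and the real test would inspect `logicalop` operators instead of string kinds.

```go
package main

import "fmt"

// planNode is a hypothetical, simplified logical plan node.
type planNode struct {
	Kind     string // e.g. "Selection", "Sort", "DataSource"
	Exprs    []string
	Children []*planNode
}

// collect walks the whole tree in pre-order and returns every node whose
// Kind is listed in kinds, instead of stopping at the first match.
func collect(root *planNode, kinds map[string]bool) []*planNode {
	if root == nil {
		return nil
	}
	var out []*planNode
	if kinds[root.Kind] {
		out = append(out, root)
	}
	for _, child := range root.Children {
		out = append(out, collect(child, kinds)...)
	}
	return out
}

func main() {
	plan := &planNode{
		Kind:  "Sort",
		Exprs: []string{"lower(name)", "id"},
		Children: []*planNode{{
			Kind:     "Selection",
			Exprs:    []string{"status = 'CURRENT'"},
			Children: []*planNode{{Kind: "DataSource"}},
		}},
	}
	wanted := map[string]bool{"Selection": true, "Sort": true}
	for _, n := range collect(plan, wanted) {
		fmt.Println(n.Kind, n.Exprs)
	}
}
```

A test built this way can assert on the expressions of every collected node, so a regression in the sort node is not hidden behind the first selection node.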

📥 Commits

Reviewing files that changed from the base of the PR and between cb1e1e6 and f54323f.

📒 Files selected for processing (4)

pkg/planner/core/issuetest/BUILD.bazel
pkg/planner/core/issuetest/planner_issue_test.go
pkg/planner/core/rule_generate_column_substitute.go
tests/realtikvtest/addindextest4/BUILD.bazel

hawkingrei · 2026-04-10T09:34:50Z

+		duplicatedExpr = true
+	}
+	if duplicatedExpr {
+		*ambiguousExprs = append(*ambiguousExprs, col.VirtualExpr)


Following the API design of Go's built-in slices package, directly returning the slice here is the standard approach.

Good point. Updated in 2c29f27. The helper now returns the updated []expression.Expression directly instead of mutating a *[]expression.Expression, which is cleaner and closer to the usual Go slice style.
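The style difference discussed in this exchange can be illustrated with a small sketch. The helper name `appendIfDuplicated` and the use of `string` in place of `expression.Expression` are assumptions made for illustration; the actual helper in the PR is not shown in this thread.

```go
package main

import "fmt"

// appendIfDuplicated is a hypothetical helper. Like the built-in append, it
// returns the (possibly grown) slice rather than mutating one through a
// *[]string parameter, so the caller reassigns the result.
func appendIfDuplicated(ambiguous []string, expr string, duplicated bool) []string {
	if duplicated {
		ambiguous = append(ambiguous, expr)
	}
	return ambiguous
}

func main() {
	var ambiguous []string
	ambiguous = appendIfDuplicated(ambiguous, "lower(name)", true)
	ambiguous = appendIfDuplicated(ambiguous, "upper(name)", false)
	fmt.Println(ambiguous) // [lower(name)]
}
```

Returning the slice keeps the call site in the familiar `s = f(s, ...)` shape used by `append` and the `slices` package.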

0xPoe

rest LGTM

Thanks!

0xPoe

It seems we also disable the rewrite for this case:

  create table space (
    workspace_id binary(16) not null,
    id binary(16) not null,
    name varchar(200) not null,
    status enum('CURRENT','ARCHIVED') not null,
    primary key (workspace_id, id)
  );

  create index idx_lower_name on space(workspace_id, (lower(name)), id);
  analyze table space;

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;
  -- expect: uses idx_lower_name, no Sort

  create index idx_status_lower_name on space(workspace_id, status,
  (lower(name)), id);

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;

Is this expected?

codecov · 2026-04-10T13:58:19Z

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 77.1052%. Comparing base (8bde239) to head (9196209).

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #67692        +/-   ##
================================================
- Coverage   77.7686%   77.1052%   -0.6634%     
================================================
  Files          1987       1969        -18     
  Lines        550609     550790       +181     
================================================
- Hits         428201     424688      -3513     
- Misses       121488     126050      +4562     
+ Partials        920         52       -868

| Flag | Coverage Δ | |
|---|---|---|
| integration | `40.8073% <90.9090%> (+1.0101%)` | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ | |
|---|---|---|
| dumpling | `60.4888% <ø> (ø)` | |
| parser | `∅ <ø> (∅)` | |
| br | `50.0537% <ø> (-13.0340%)` | ⬇️ |

AilinKid · 2026-04-13T08:08:40Z

It seems we also disable the rewrite for this case:

  create table space (
    workspace_id binary(16) not null,
    id binary(16) not null,
    name varchar(200) not null,
    status enum('CURRENT','ARCHIVED') not null,
    primary key (workspace_id, id)
  );

  create index idx_lower_name on space(workspace_id, (lower(name)), id);
  analyze table space;

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;
  -- expect: uses idx_lower_name, no Sort

  create index idx_status_lower_name on space(workspace_id, status,
  (lower(name)), id);

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;

Is this expected?

Thanks, I checked this case locally, and this matches what master does as well. The default plan is already TableRangeScan + TopN even before adding idx_status_lower_name, so this is not introduced by the ambiguity fix itself.

More importantly, idx_lower_name is still a valid order-preserving path: with USE_INDEX(space, idx_lower_name), the planner chooses IndexLookUp with IndexRangeScan ... keep order:true both before and after idx_status_lower_name is added.

So what changes here is the cost-based choice, not rewrite availability. In this reproducer the table is empty and the plan is using pseudo stats; idx_lower_name is also a non-covering index for select id, name, so the optimizer prefers the cheaper TableRangeScan + TopN path by default.

AilinKid · 2026-04-13T08:24:37Z

/retest-required

0xPoe · 2026-04-13T14:02:04Z

So what changes here is the cost-based choice, not rewrite availability. In this reproducer the table is empty and the plan is using pseudo stats; idx_lower_name is also a non-covering index for select id, name, so the optimizer prefers the cheaper TableRangeScan + TopN path by default.

I tested it with master and this patch:

Environment

PR head: f54323f762
Clean master: a83fcdb232

Repro SQL

DROP DATABASE IF EXISTS test67692;
CREATE DATABASE test67692;
USE test67692;

CREATE TABLE space (
  workspace_id BINARY(16) NOT NULL,
  id BINARY(16) NOT NULL,
  name VARCHAR(200) NOT NULL,
  status ENUM('CURRENT','ARCHIVED') NOT NULL,
  PRIMARY KEY (workspace_id, id)
);

INSERT INTO space
WITH RECURSIVE seq(n) AS (
  SELECT CAST(1 AS SIGNED)
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 200
)
SELECT
  x'00000000000000000000000000000001' AS workspace_id,
  UNHEX(LPAD(HEX(n), 32, '0')) AS id,
  CONCAT('backend-', LPAD(CAST(n AS CHAR), 3, '0')) AS name,
  IF(n % 3 = 0, 'ARCHIVED', 'CURRENT') AS status
FROM seq;

CREATE INDEX idx_lower_name ON space(workspace_id, (LOWER(name)), id);
ANALYZE TABLE space;

CREATE INDEX idx_status_lower_name ON space(workspace_id, status, (LOWER(name)), id);
ANALYZE TABLE space;

EXPLAIN FORMAT='brief'
SELECT id, name
FROM space
WHERE workspace_id = x'00000000000000000000000000000001'
ORDER BY LOWER(name), id
LIMIT 3;

Clean master `a83fcdb232`

Projection 3.00 root  test.space.id, test.space.name
└─IndexLookUp 3.00 root  limit embedded(offset:0, count:3)
  ├─Limit(Build) 3.00 cop[tikv]  offset:0, count:3
  │ └─IndexRangeScan 3.00 cop[tikv] table:space, index:idx_lower_name(workspace_id, lower(`name`), id) range:["\x00...\x01","\x00...\x01"], keep order:true
  └─TableRowIDScan(Probe) 3.00 cop[tikv] table:space keep order:false

PR head `f54323f762`

Projection 3.00 root  test.space.id, test.space.name
└─TopN 3.00 root  Column#8, test.space.id, offset:0, count:3
  └─Projection 3.00 root  test.space.workspace_id, test.space.id, test.space.name, lower(test.space.name)->Column#8
    └─TableReader 3.00 root  data:TopN
      └─TopN 3.00 cop[tikv]  lower(test.space.name), test.space.id, offset:0, count:3
        └─TableRangeScan 200.00 cop[tikv] table:space range:["\x00...\x01","\x00...\x01"], keep order:false

AilinKid · 2026-04-14T08:30:50Z

Thanks @0xPoe, got it. I also noticed some unit test failures.

It looks like we already had a test asserting on this ambiguous generated-column index usage, so simply skipping this optimization would also regress that case.

coderabbitai

⚠️ Outside diff range comments (1)

pkg/expression/column.go (1)
738-752: ⚠️ Potential issue | 🟡 Minor

Clarify intent of the two-pass scan and confirm fallback mitigation.

EqualByExprAndID already returns true when UniqueID matches, so the new first loop is a priority override: when two schema columns share the same VirtualExpr, prefer the one whose UniqueID equals the target instead of the first positional match. A one-line comment would make this non-obvious intent clear to future readers.

Regarding the fallback risk: this function is called only as a fallback recovery path (in resolve_indices.go and physical_index_reader.go) when the normal ResolveIndices call fails with an error. Upstream callers already have comments acknowledging "duplicate virtual expression column matched," indicating awareness of the ambiguity. The function's behavior when the target column's UniqueID is absent and multiple schema columns share the same VirtualExpr is that it picks the first positional match. While this is imperfect, the fallback-only invocation pattern and upstream error recovery mitigate the risk in practice.
📝 Proposed clarifying comment
 func (col *Column) resolveIndicesByVirtualExpr(ctx EvalContext, schema *Schema) bool {
+	// First prefer an exact UniqueID match so that when multiple schema columns
+	// share the same VirtualExpr (e.g. two expression indexes on LOWER(name)),
+	// we bind to the intended hidden generated column rather than the first
+	// positional match found by EqualByExprAndID.
 	for i, c := range schema.Columns {
 		if c.EqualColumn(col) {
 			col.Index = i
 			return true
 		}
 	}
 	for i, c := range schema.Columns {
 		if c.EqualByExprAndID(ctx, col) {
 			col.Index = i
 			return true
 		}
 	}
 	return false
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/expression/column.go` around lines 738 - 752, Add a one-line comment
above resolveIndicesByVirtualExpr explaining the two-pass scan: first loop
prefers exact positional match via EqualColumn to override when UniqueID matches
the target, second loop uses EqualByExprAndID (which also returns true on
UniqueID match) to fall back to expression+ID matching; also note this function
is only used as a fallback recovery path (e.g., when ResolveIndices fails) and
that if UniqueID is absent and multiple schema columns share the same
VirtualExpr the function will pick the first positional match, which is an
acknowledged imperfect fallback.
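The two-pass scan discussed in this comment can be sketched in isolation. This is a minimal illustration only: the `column` type, the string-valued `VirtualExpr`, and `resolveTwoPass` are hypothetical stand-ins, not the real `expression.Column`, `EqualColumn`, or `EqualByExprAndID` from TiDB. It shows that an exact UniqueID match wins over an earlier expression-equal column, and that a target with no exact match still binds to the first positional candidate.

```go
package main

import "fmt"

// column is a simplified, hypothetical stand-in for expression.Column.
type column struct {
	UniqueID    int64
	VirtualExpr string // stand-in for the real virtual expression tree
	Index       int
}

// resolveTwoPass mirrors the shape of the proposed diff: exact identity
// first, then expression equality with first-match semantics.
func resolveTwoPass(target *column, schema []*column) bool {
	// Pass 1: prefer the schema column with the same UniqueID.
	for i, c := range schema {
		if c.UniqueID == target.UniqueID {
			target.Index = i
			return true
		}
	}
	// Pass 2: fall back to the first column with an equal virtual expression.
	for i, c := range schema {
		if c.VirtualExpr != "" && c.VirtualExpr == target.VirtualExpr {
			target.Index = i
			return true
		}
	}
	return false
}

func main() {
	// Two hidden columns backing two expression indexes on LOWER(name).
	schema := []*column{
		{UniqueID: 7, VirtualExpr: "lower(name)"},
		{UniqueID: 8, VirtualExpr: "lower(name)"},
	}

	exact := &column{UniqueID: 8, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveTwoPass(exact, schema), exact.Index) // prints "true 1"

	// No exact match: the fallback silently picks the first candidate.
	missing := &column{UniqueID: 9, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveTwoPass(missing, schema), missing.Index) // prints "true 0"
}
```

The second call is the ambiguous case that this comment describes as an imperfect fallback.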

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 2c29f27 and dd05fce.

📒 Files selected for processing (5)

br/pkg/metautil/BUILD.bazel
pkg/expression/column.go
pkg/expression/column_test.go
pkg/importsdk/BUILD.bazel
pkg/planner/core/issuetest/planner_issue_test.go

✅ Files skipped from review due to trivial changes (2)

pkg/importsdk/BUILD.bazel
br/pkg/metautil/BUILD.bazel

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/planner/core/issuetest/planner_issue_test.go

AilinKid · 2026-04-21T07:29:25Z

/retest-required

tiprow · 2026-04-21T07:29:50Z

@AilinKid: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

AilinKid · 2026-04-21T07:30:01Z

/ok-to-test

ti-chi-bot · 2026-04-21T08:54:30Z

[LGTM Timeline notifier]

Timeline:

2026-04-21 08:54:30 UTC: ☑️ agreed by qw4990.

0xPoe

Thanks!

Overall looks good to me.

Could you please update the PR body as well? I believe it still contains the old statement explaining the previous fixes and changes.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/expression/column.go`:
- Around line 739-754: The current loop in the schema resolution records only
the first EqualByExprAndID match into fallbackIdx and uses it even if multiple
columns share the same virtual expression; update the logic in the loop that
iterates schema.Columns (the block using EqualColumn and EqualByExprAndID) to
instead collect/count all EqualByExprAndID candidates (e.g., track candidate
index and a count of matches) and only set col.Index to the fallback when the
count equals 1; keep the existing immediate return on exact EqualColumn, but
change the post-loop check to use the unique-candidate condition (count==1)
before assigning col.Index to the recorded candidate index and returning true,
otherwise treat as unresolved.
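The unique-candidate rule requested here can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `col`, its string-valued `VirtualExpr`, and `resolveUniqueFallback` are hypothetical simplifications of `expression.Column` and `resolveIndicesByVirtualExpr`, and plain field comparisons stand in for `EqualColumn` and `EqualByExprAndID`.

```go
package main

import "fmt"

// col is a simplified, hypothetical stand-in for expression.Column.
type col struct {
	UniqueID    int64
	VirtualExpr string // stand-in for the real virtual expression tree
	Index       int
}

// resolveUniqueFallback returns immediately on an exact UniqueID match.
// Otherwise it accepts an expression-equal column only when exactly one
// such candidate exists; ambiguous targets are left unresolved.
func resolveUniqueFallback(target *col, schema []*col) bool {
	fallbackIdx, matches := -1, 0
	for i, c := range schema {
		if c.UniqueID == target.UniqueID {
			target.Index = i
			return true
		}
		if c.VirtualExpr != "" && c.VirtualExpr == target.VirtualExpr {
			fallbackIdx = i
			matches++
		}
	}
	if matches == 1 {
		target.Index = fallbackIdx
		return true
	}
	return false
}

func main() {
	// Two candidates share lower(name): the fallback is ambiguous.
	ambiguous := []*col{
		{UniqueID: 7, VirtualExpr: "lower(name)"},
		{UniqueID: 8, VirtualExpr: "lower(name)"},
	}
	t1 := &col{UniqueID: 9, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveUniqueFallback(t1, ambiguous), t1.Index) // prints "false -1"

	// Only one candidate has lower(name): the fallback is safe.
	single := []*col{
		{UniqueID: 7, VirtualExpr: "upper(name)"},
		{UniqueID: 8, VirtualExpr: "lower(name)"},
	}
	t2 := &col{UniqueID: 9, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveUniqueFallback(t2, single), t2.Index) // prints "true 1"
}
```

Leaving the ambiguous case unresolved trades a possible missed optimization for never binding to the wrong hidden column.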

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between dd05fce and cb9f663.

📒 Files selected for processing (1)

pkg/expression/column.go

AilinKid · 2026-04-22T06:36:47Z

/retest-required

AilinKid · 2026-04-23T01:37:48Z

/retest-required

AilinKid · 2026-04-23T07:27:04Z

/retest-required

AilinKid · 2026-04-23T09:11:24Z

/test unit-test

tiprow · 2026-04-23T09:11:48Z

@AilinKid: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow

/test tidb_parser_test

Use /test all to run all jobs.

Signed-off-by: AilinKid <314806019@qq.com>

ti-chi-bot · 2026-04-23T11:01:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qw4990
Once this PR has been reviewed and has the lgtm label, please assign zanmato1984 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

pkg/expression/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

AilinKid marked this pull request as ready for review April 10, 2026 09:14

ti-chi-bot Bot removed the do-not-merge/work-in-progress label Apr 10, 2026

AilinKid closed this Apr 10, 2026

AilinKid reopened this Apr 10, 2026

ti-chi-bot Bot removed the do-not-merge/needs-triage-completed label Apr 10, 2026

coderabbitai Bot reviewed Apr 10, 2026

Comment thread pkg/planner/core/issuetest/planner_issue_test.go Outdated

0xPoe reviewed Apr 10, 2026

Comment thread pkg/planner/core/rule_generate_column_substitute.go Outdated

hawkingrei reviewed Apr 10, 2026

0xPoe reviewed Apr 10, 2026

Comment thread pkg/planner/core/rule_generate_column_substitute.go Outdated

0xPoe reviewed Apr 10, 2026

AilinKid force-pushed the codex/fix-issue-67552 branch from 2c29f27 to dd05fce Compare April 20, 2026 09:35

ti-chi-bot Bot added size/M and removed size/L labels Apr 20, 2026

coderabbitai Bot reviewed Apr 20, 2026

AilinKid changed the title ~~planner: avoid ambiguous generated column substitution~~ planner: fix gen-col mistakenly resolved after duplicate expression index substitution Apr 21, 2026

ti-chi-bot Bot added the ok-to-test label Apr 21, 2026

qw4990 approved these changes Apr 21, 2026

ti-chi-bot Bot added the needs-1-more-lgtm label Apr 21, 2026

0xPoe reviewed Apr 21, 2026

Comment thread pkg/planner/core/issuetest/planner_issue_test.go Outdated

Comment thread pkg/expression/column.go Outdated

Comment thread pkg/expression/column.go

coderabbitai Bot reviewed Apr 22, 2026

Comment thread pkg/expression/column.go

AilinKid closed this Apr 23, 2026

AilinKid reopened this Apr 23, 2026

AilinKid added 4 commits April 23, 2026 19:01

expression: prefer exact ID when resolving virtual columns

c3b02ec

expression: clarify virtual column index resolution

dc70e98

expression: reject ambiguous virtual expression fallback

7b34521

.

9196209

Signed-off-by: AilinKid <314806019@qq.com>

AilinKid force-pushed the codex/fix-issue-67552 branch from f665353 to 9196209 Compare April 23, 2026 11:01
