
planner: fix gen-col mistakenly resolved after duplicate expression index substitution #67692

Open
AilinKid wants to merge 4 commits into pingcap:master from AilinKid:codex/fix-issue-67552

Conversation

@AilinKid
Contributor

@AilinKid AilinKid commented Apr 10, 2026

What problem does this PR solve?

Issue Number: ref #67552

Problem Summary:

When a table has multiple expression indexes backed by different hidden generated columns but sharing the same virtual expression, virtual-expression index resolution may bind an expression to the wrong hidden column. This can produce an invalid IndexLookUp plan and trigger errors such as Unexpected missing column 12 on real TiKV.

What changed and how does it work?

This PR updates virtual-expression column resolution to prefer the exact column identity before falling back to expression equality.

Column.resolveIndicesByVirtualExpr now checks EqualColumn first, so a column whose UniqueID exists in the selected schema resolves to that exact column even if another schema column with the same VirtualExpr appears earlier. If there is no exact column match, it falls back to expression equality only when that fallback identifies exactly one candidate; ambiguous fallback candidates remain unresolved instead of choosing the first match.
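The lookup order described above can be sketched roughly as follows. This is a simplified illustration, not the code in `pkg/expression/column.go`: the `Column` struct, the string-valued `VirtualExpr`, and the function `resolveByVirtualExpr` are stand-ins assumed for the example, and the real implementation compares expressions through TiDB's own equality helpers.

```go
package main

import "fmt"

// Column is a hypothetical, minimal stand-in for a planner column.
type Column struct {
	UniqueID    int64
	VirtualExpr string // canonical text of the generated-column expression; "" if none
	Index       int    // resolved offset in the schema; -1 while unresolved
}

// resolveByVirtualExpr tries an exact identity match first. Only if that fails
// does it fall back to expression equality, and it accepts the fallback only
// when exactly one schema column matches.
func resolveByVirtualExpr(target *Column, schema []*Column) bool {
	// Phase 1: exact identity wins, regardless of position in the schema.
	for i, c := range schema {
		if c.UniqueID == target.UniqueID {
			target.Index = i
			return true
		}
	}
	if target.VirtualExpr == "" {
		return false
	}
	// Phase 2: expression equality, accepted only when unambiguous.
	candidate := -1
	for i, c := range schema {
		if c.VirtualExpr != target.VirtualExpr {
			continue
		}
		if candidate >= 0 {
			return false // second match found: ambiguous, leave unresolved
		}
		candidate = i
	}
	if candidate < 0 {
		return false
	}
	target.Index = candidate
	return true
}

func main() {
	// Two hidden columns backed by the same expression, as in the issue.
	schema := []*Column{
		{UniqueID: 11, VirtualExpr: "lower(name)"},
		{UniqueID: 12, VirtualExpr: "lower(name)"},
	}

	exact := &Column{UniqueID: 12, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveByVirtualExpr(exact, schema), exact.Index) // true 1

	ambiguous := &Column{UniqueID: 99, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveByVirtualExpr(ambiguous, schema), ambiguous.Index) // false -1
}
```

A first-match strategy would have bound the first target to offset 0 (column 11) here, which is the kind of mismatch the PR description attributes the invalid IndexLookUp plan to.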

A regression unit test covers the tie-breaking behavior by constructing two columns with the same VirtualExpr and asserting that resolution chooses the exact UniqueID match instead of the earlier expression-equal column. The same test also verifies that a target without an exact match does not resolve through an ambiguous expression-equality fallback. The planner issue test keeps coverage for the reported query shape and verifies that the expression index path is still selected.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Manual test steps:

  • tiup playground nightly --db.binpath=/private/tmp/tidb-issue67552-clean/bin/tidb-server --tiflash=0

mysql> use test
Database changed
mysql> CREATE DATABASE IF NOT EXISTS test;
Query OK, 0 rows affected, 1 warning (0.001 sec)

mysql> USE test;
Database changed
mysql> DROP TABLE IF EXISTS space;
Query OK, 0 rows affected, 1 warning (0.000 sec)

mysql> 
mysql> CREATE TABLE `space` (
    ->   `workspace_id` BINARY(16) NOT NULL,
    ->   `tenant_id` VARCHAR(100) NOT NULL,
    ->   `id` BINARY(16) NOT NULL,
    ->   `name` VARCHAR(200) NOT NULL,
    ->   `description` VARCHAR(500),
    ->   `created_at` DATETIME(6) NOT NULL,
    ->   `updated_at` DATETIME(6),
    ->   `created_by` VARCHAR(128) NOT NULL,
    ->   `updated_by` VARCHAR(128),
    ->   `status` ENUM('CURRENT', 'ARCHIVED', 'TRASHED') NOT NULL,
    ->   `is_default` BOOLEAN NOT NULL DEFAULT FALSE,
    ->   PRIMARY KEY (`workspace_id`, `id`)
    -> );
Query OK, 0 rows affected (0.024 sec)

mysql> 
mysql> CREATE INDEX `space_default_idx` ON `space`(`is_default`);
Query OK, 0 rows affected (0.314 sec)

mysql> CREATE INDEX `space_workspace_id_lower_name_id_idx` ON `space`(`workspace_id`, (LOWER(`name`)), `id`);
Query OK, 0 rows affected (0.106 sec)

mysql> CREATE INDEX `space_workspace_id_status_id_idx` ON `space`(`workspace_id`, `status`, `id`);
Query OK, 0 rows affected (0.105 sec)

mysql> CREATE INDEX `space_workspace_id_status_lower_name_id_idx` ON `space`(`workspace_id`, `status`, (LOWER(`name`)), `id`);
Query OK, 0 rows affected (0.114 sec)

mysql> INSERT INTO `space` (`tenant_id`,`workspace_id`,`id`,`name`,`description`,`created_at`,`updated_at`,`created_by`,`updated_by`,`status`,`is_default`) VALUES
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'cf579b6fb24742c3a4b05acd826495a7','Backend Team 0','d','2026-04-01 03:16:56.079169',NULL,'u1',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'19741537d8b2425083cea7b328f390e2','Backend Team 1','d','2026-04-01 03:16:56.123272',NULL,'u2',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'e203f36cdbce446cb7d87f079eaa259b','Backend Team 2','d','2026-04-01 03:16:56.128159',NULL,'u3',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'1a0605387e4f4d64806e8aaac75eda64','Backend Team 3','d','2026-04-01 03:16:56.132970',NULL,'u4',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'15a621fcfd984769918c0feeb604dcc8','Backend Team 4','d','2026-04-01 03:16:56.136578',NULL,'u5',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'6b80da36287b440fb4d010b430f712a6','Backend Archive 0','d','2026-04-01 03:16:56.140175',NULL,'u6',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'04f61df02afe48b38ba08aef53b5a670','Backend Archive 1','d','2026-04-01 03:16:56.144300',NULL,'u7',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'6c908281d33e4487ae74b25c34d81d3e','Backend Archive 2','d','2026-04-01 03:16:56.148197',NULL,'u8',NULL,'ARCHIVED',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'fb95f9945b8b4a1198a51c29e93edd08','Frontend Team 0','d','2026-04-01 03:16:56.152013',NULL,'u9',NULL,'CURRENT',0),
    -> ('00000000-0000-0000-0000-000000000001',x'b1c7f6cfae0f4d4791b5cf04f6a3beeb',x'f63cff86469a4384b93e4960fb9cb242','Frontend Team 1','d','2026-04-01 03:16:56.155897',NULL,'u10',NULL,'CURRENT',0);
Query OK, 10 rows affected (0.002 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_lower_name_id_idx) 
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.003 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_status_lower_name_id_idx) 
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.002 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_lower_name_id_idx)  
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.002 sec)

mysql> SELECT workspace_id, tenant_id, id, name, description, created_at, updated_at, created_by, updated_by, status, is_default
    -> FROM space use index(space_workspace_id_status_lower_name_id_idx)  
    -> WHERE workspace_id = x'b1c7f6cfae0f4d4791b5cf04f6a3beeb'
    ->   AND ((LOWER(name) > 'backend team 1') OR (LOWER(name) = 'backend team 1' AND id > x'19741537d8b2425083cea7b328f390e2'))
    ->   AND LOWER(name) LIKE '%backend%'
    ->   AND status = 'CURRENT'
    -> ORDER BY LOWER(name), id
    -> LIMIT 3;
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| workspace_id                       | tenant_id                            | id                                 | name           | description | created_at                 | updated_at | created_by | updated_by | status  | is_default |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0xE203F36CDBCE446CB7D87F079EAA259B | Backend Team 2 | d           | 2026-04-01 03:16:56.128159 | NULL       | u3         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x1A0605387E4F4D64806E8AAAC75EDA64 | Backend Team 3 | d           | 2026-04-01 03:16:56.132970 | NULL       | u4         | NULL       | CURRENT |          0 |
| 0xB1C7F6CFAE0F4D4791B5CF04F6A3BEEB | 00000000-0000-0000-0000-000000000001 | 0x15A621FCFD984769918C0FEEB604DCC8 | Backend Team 4 | d           | 2026-04-01 03:16:56.136578 | NULL       | u5         | NULL       | CURRENT |          0 |
+------------------------------------+--------------------------------------+------------------------------------+----------------+-------------+----------------------------+------------+------------+------------+---------+------------+
3 rows in set (0.001 sec)



Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix an issue where virtual-expression resolution could choose the wrong hidden generated column when multiple expression indexes shared the same expression.

Summary by CodeRabbit

  • Bug Fixes

    • Improved column index resolution for virtual expressions to ensure correct index selection during query planning.
  • Tests

    • Added regression and unit tests covering index selection and virtual-expression resolution tie-breaking.
  • Chores

    • Adjusted test sharding configuration and test build settings to refine test parallelization and dependencies.

@ti-chi-bot

ti-chi-bot Bot commented Apr 10, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-triage-completed sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 10, 2026
@tiprow

tiprow Bot commented Apr 10, 2026

Hi @AilinKid. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@AilinKid AilinKid marked this pull request as ready for review April 10, 2026 09:14
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2026
@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Column virtual-expression index resolution now prefers exact column-ID matches and selects a unique fallback only when unambiguous; added tests for this behavior and a planner regression EXPLAIN test; two Bazel test rule tweaks (shard_count and test deps) updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Virtual Expression Resolution Logic: pkg/expression/column.go | Modified Column.resolveIndicesByVirtualExpr to prefer EqualColumn exact-ID matches and to record a single fallback from EqualByExprAndID, assigning it only when unique. |
| Unit & Regression Tests: pkg/expression/column_test.go, pkg/planner/core/issuetest/planner_issue_test.go | Added a subtest asserting resolution prefers exact UniqueID matches; added regression test ambiguous-expression-index-generated-column-substitution that creates indexes and asserts IndexLookUp uses index:space_workspace_id_status_lower_name_id_idx via EXPLAIN format='plan_tree'. |
| Build/test config: br/pkg/metautil/BUILD.bazel, pkg/importsdk/BUILD.bazel | Test config changes: increased metautil_test shard_count from 13 to 15; added //pkg/parser/ast to importsdk_test deps. |

Sequence Diagram(s)

(Skipped — changes are local logic and tests; do not meet criteria for a sequence diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • qw4990

Poem

🐰 I hop through columns, sniffing clues,
I favor IDs when choices bruise,
A single fallback, tidy and neat,
Tests nod approval — code feels complete,
Small hops, big fixes, carrot-infused news.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing an issue where generated columns were mistakenly resolved after duplicate expression index substitution. It is concise and directly related to the core bug being fixed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively documents the problem, solution, testing approach, and includes a detailed release note. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/issuetest/planner_issue_test.go`:
- Around line 35-45: The test only inspects the first LogicalSelection found by
findFirstLogicalSelection, which misses regressions in other nodes (e.g.,
LogicalSort); update the test to traverse the entire optimized plan (not just
the first match) and assert the transformed expressions for both selection and
sort nodes—either replace findFirstLogicalSelection with a recursive walker that
collects all LogicalSelection and LogicalSort nodes or add explicit assertions
for logicalop.LogicalSort expressions (and similarly update the other
occurrences referenced around lines 219-227) so the test validates ORDER BY
LOWER(name), id is rewritten correctly across the whole plan.
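A recursive walker of the kind suggested above might look like the following sketch. The `Plan` type and the `collect` function are hypothetical stand-ins for illustration only; they are not TiDB's `logicalop` types, and the real test would match on concrete plan node types rather than a string kind.

```go
package main

import "fmt"

// Plan is a hypothetical, minimal logical plan node.
type Plan struct {
	Kind     string   // e.g. "Selection", "Sort", "DataSource"
	Exprs    []string // expressions carried by this node, as text
	Children []*Plan
}

// collect walks the tree depth-first and returns every node whose Kind is
// listed in kinds, instead of stopping at the first match.
func collect(p *Plan, kinds map[string]bool) []*Plan {
	if p == nil {
		return nil
	}
	var out []*Plan
	if kinds[p.Kind] {
		out = append(out, p)
	}
	for _, child := range p.Children {
		out = append(out, collect(child, kinds)...)
	}
	return out
}

func main() {
	plan := &Plan{Kind: "Limit", Children: []*Plan{{
		Kind:  "Sort",
		Exprs: []string{"lower(name)", "id"},
		Children: []*Plan{{
			Kind:     "Selection",
			Exprs:    []string{"status = 'CURRENT'"},
			Children: []*Plan{{Kind: "DataSource"}},
		}},
	}}}

	// Inspect both selection and sort nodes across the whole plan.
	wanted := map[string]bool{"Selection": true, "Sort": true}
	for _, n := range collect(plan, wanted) {
		fmt.Println(n.Kind, n.Exprs)
	}
}
```

With every matching node collected, a test can assert on the rewritten expressions of each one rather than only the first selection it encounters.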

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7273ec29-376e-45d8-b590-c676c30c1189

📥 Commits

Reviewing files that changed from the base of the PR and between cb1e1e6 and f54323f.

📒 Files selected for processing (4)
  • pkg/planner/core/issuetest/BUILD.bazel
  • pkg/planner/core/issuetest/planner_issue_test.go
  • pkg/planner/core/rule_generate_column_substitute.go
  • tests/realtikvtest/addindextest4/BUILD.bazel

Comment thread pkg/planner/core/issuetest/planner_issue_test.go Outdated
Comment thread pkg/planner/core/rule_generate_column_substitute.go Outdated
duplicatedExpr = true
}
if duplicatedExpr {
*ambiguousExprs = append(*ambiguousExprs, col.VirtualExpr)
Member

Following the API design of Go's built-in slices package, directly returning the slice here is the standard approach.

Contributor Author

Good point. Updated in 2c29f27. The helper now returns the updated []expression.Expression directly instead of mutating a *[]expression.Expression, which is cleaner and closer to the usual Go slice style.
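As a rough illustration of the style being discussed, here is a helper that returns the updated slice rather than writing through a pointer. The name `appendAmbiguous` and the use of `string` in place of `expression.Expression` are assumptions made for this sketch; the actual helper in the PR may differ.

```go
package main

import "fmt"

// appendAmbiguous returns dst extended with expr when expr is flagged as
// duplicated. The caller reassigns the result, as with the built-in append.
func appendAmbiguous(dst []string, expr string, duplicated bool) []string {
	if duplicated {
		dst = append(dst, expr)
	}
	return dst
}

func main() {
	var ambiguous []string
	ambiguous = appendAmbiguous(ambiguous, "lower(name)", true)
	ambiguous = appendAmbiguous(ambiguous, "upper(name)", false)
	fmt.Println(ambiguous) // [lower(name)]
}
```

Returning the slice keeps the call site explicit about what changes and avoids passing a `*[]T` purely for mutation.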

Member

@0xPoe 0xPoe left a comment

rest LGTM

Thanks!

Comment thread pkg/planner/core/rule_generate_column_substitute.go Outdated
Member

@0xPoe 0xPoe left a comment

It seems we also disable the rewrite for this case:

  create table space (
    workspace_id binary(16) not null,
    id binary(16) not null,
    name varchar(200) not null,
    status enum('CURRENT','ARCHIVED') not null,
    primary key (workspace_id, id)
  );

  create index idx_lower_name on space(workspace_id, (lower(name)), id);
  analyze table space;

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;
  -- expect: uses idx_lower_name, no Sort

  create index idx_status_lower_name on space(workspace_id, status,
  (lower(name)), id);

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;

Is this expected?

@codecov

codecov Bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 77.1052%. Comparing base (8bde239) to head (9196209).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67692        +/-   ##
================================================
- Coverage   77.7686%   77.1052%   -0.6634%     
================================================
  Files          1987       1969        -18     
  Lines        550609     550790       +181     
================================================
- Hits         428201     424688      -3513     
- Misses       121488     126050      +4562     
+ Partials        920         52       -868     
Flag Coverage Δ
integration 40.8073% <90.9090%> (+1.0101%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 50.0537% <ø> (-13.0340%) ⬇️

@AilinKid
Contributor Author

It seems we also disable the rewrite for this case:

  create table space (
    workspace_id binary(16) not null,
    id binary(16) not null,
    name varchar(200) not null,
    status enum('CURRENT','ARCHIVED') not null,
    primary key (workspace_id, id)
  );

  create index idx_lower_name on space(workspace_id, (lower(name)), id);
  analyze table space;

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;
  -- expect: uses idx_lower_name, no Sort

  create index idx_status_lower_name on space(workspace_id, status,
  (lower(name)), id);

  explain format='plan_tree'
  select id, name
  from space
  where workspace_id = x'00000000000000000000000000000001'
  order by lower(name), id
  limit 3;

Is this expected?

Thanks, I checked this case locally, and this matches what master does as well. The default plan is already TableRangeScan + TopN even before adding idx_status_lower_name, so this is not introduced by the ambiguity fix itself.

More importantly, idx_lower_name is still a valid order-preserving path: with USE_INDEX(space, idx_lower_name), the planner chooses IndexLookUp with IndexRangeScan ... keep order:true both before and after idx_status_lower_name is added.

So what changes here is the cost-based choice, not rewrite availability. In this reproducer the table is empty and the plan is using pseudo stats; idx_lower_name is also a non-covering index for select id, name, so the optimizer prefers the cheaper TableRangeScan + TopN path by default.

@AilinKid
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 13, 2026

@AilinKid: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@0xPoe
Member

0xPoe commented Apr 13, 2026

So what changes here is the cost-based choice, not rewrite availability. In this reproducer the table is empty and the plan is using pseudo stats; idx_lower_name is also a non-covering index for select id, name, so the optimizer prefers the cheaper TableRangeScan + TopN path by default.

I tested it with master and this patch:

Environment

  • PR head: f54323f762
  • Clean master: a83fcdb232

Repro SQL

DROP DATABASE IF EXISTS test67692;
CREATE DATABASE test67692;
USE test67692;

CREATE TABLE space (
  workspace_id BINARY(16) NOT NULL,
  id BINARY(16) NOT NULL,
  name VARCHAR(200) NOT NULL,
  status ENUM('CURRENT','ARCHIVED') NOT NULL,
  PRIMARY KEY (workspace_id, id)
);

INSERT INTO space
WITH RECURSIVE seq(n) AS (
  SELECT CAST(1 AS SIGNED)
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 200
)
SELECT
  x'00000000000000000000000000000001' AS workspace_id,
  UNHEX(LPAD(HEX(n), 32, '0')) AS id,
  CONCAT('backend-', LPAD(CAST(n AS CHAR), 3, '0')) AS name,
  IF(n % 3 = 0, 'ARCHIVED', 'CURRENT') AS status
FROM seq;

CREATE INDEX idx_lower_name ON space(workspace_id, (LOWER(name)), id);
ANALYZE TABLE space;

CREATE INDEX idx_status_lower_name ON space(workspace_id, status, (LOWER(name)), id);
ANALYZE TABLE space;

EXPLAIN FORMAT='brief'
SELECT id, name
FROM space
WHERE workspace_id = x'00000000000000000000000000000001'
ORDER BY LOWER(name), id
LIMIT 3;

Clean master a83fcdb232

Projection 3.00 root  test.space.id, test.space.name
└─IndexLookUp 3.00 root  limit embedded(offset:0, count:3)
  ├─Limit(Build) 3.00 cop[tikv]  offset:0, count:3
  │ └─IndexRangeScan 3.00 cop[tikv] table:space, index:idx_lower_name(workspace_id, lower(`name`), id) range:["\x00...\x01","\x00...\x01"], keep order:true
  └─TableRowIDScan(Probe) 3.00 cop[tikv] table:space keep order:false

PR head f54323f762

Projection 3.00 root  test.space.id, test.space.name
└─TopN 3.00 root  Column#8, test.space.id, offset:0, count:3
  └─Projection 3.00 root  test.space.workspace_id, test.space.id, test.space.name, lower(test.space.name)->Column#8
    └─TableReader 3.00 root  data:TopN
      └─TopN 3.00 cop[tikv]  lower(test.space.name), test.space.id, offset:0, count:3
        └─TableRangeScan 200.00 cop[tikv] table:space range:["\x00...\x01","\x00...\x01"], keep order:false

@AilinKid
Contributor Author

Thanks @0xPoe, got it. I also noticed a unit test failure.
It looks like we already have an assertion covering this ambiguous generated-column index usage, so simply skipping the optimization would also make that case regress.

@AilinKid AilinKid force-pushed the codex/fix-issue-67552 branch from 2c29f27 to dd05fce Compare April 20, 2026 09:35
@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 20, 2026

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/expression/column.go (1)

738-752: ⚠️ Potential issue | 🟡 Minor

Clarify intent of the two-pass scan and confirm fallback mitigation.

EqualByExprAndID already returns true when UniqueID matches, so the new first loop is a priority override: when two schema columns share the same VirtualExpr, prefer the one whose UniqueID equals the target instead of the first positional match. A one-line comment would make this non-obvious intent clear to future readers.

Regarding the fallback risk: this function is called only as a fallback recovery path (in resolve_indices.go and physical_index_reader.go) when the normal ResolveIndices call fails with an error. Upstream callers already have comments acknowledging "duplicate virtual expression column matched," indicating awareness of the ambiguity. The function's behavior when the target column's UniqueID is absent and multiple schema columns share the same VirtualExpr is that it picks the first positional match. While this is imperfect, the fallback-only invocation pattern and upstream error recovery mitigate the risk in practice.

📝 Proposed clarifying comment
 func (col *Column) resolveIndicesByVirtualExpr(ctx EvalContext, schema *Schema) bool {
+	// First prefer an exact UniqueID match so that when multiple schema columns
+	// share the same VirtualExpr (e.g. two expression indexes on LOWER(name)),
+	// we bind to the intended hidden generated column rather than the first
+	// positional match found by EqualByExprAndID.
 	for i, c := range schema.Columns {
 		if c.EqualColumn(col) {
 			col.Index = i
 			return true
 		}
 	}
 	for i, c := range schema.Columns {
 		if c.EqualByExprAndID(ctx, col) {
 			col.Index = i
 			return true
 		}
 	}
 	return false
 }
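
The tie-breaking behavior described above can be illustrated with a minimal, self-contained sketch. The `column` type and `resolveTwoPass` function below are hypothetical simplifications, not TiDB's `expression.Column` or its real method: virtual expressions are compared as plain strings, and the `UniqueID` comparison stands in for `EqualColumn`.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a schema column.
type column struct {
	UniqueID    int64
	VirtualExpr string // textual form of the generated-column expression
	Index       int
}

// resolveTwoPass mirrors the two-pass shape of the snippet above:
// pass 1 prefers an exact UniqueID match; pass 2 falls back to the
// first column whose virtual expression is equal.
func resolveTwoPass(col *column, schema []*column) bool {
	for i, c := range schema {
		if c.UniqueID == col.UniqueID {
			col.Index = i
			return true
		}
	}
	for i, c := range schema {
		if c.VirtualExpr != "" && c.VirtualExpr == col.VirtualExpr {
			col.Index = i
			return true
		}
	}
	return false
}

func main() {
	// Two hidden columns backing two expression indexes on lower(name).
	schema := []*column{
		{UniqueID: 7, VirtualExpr: "lower(name)"},
		{UniqueID: 8, VirtualExpr: "lower(name)"},
	}

	exact := &column{UniqueID: 8, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveTwoPass(exact, schema), exact.Index) // true 1

	// No exact match: the fallback silently picks the first positional match.
	missing := &column{UniqueID: 9, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveTwoPass(missing, schema), missing.Index) // true 0
}
```

The second call shows the residual risk the review points out: when the target's `UniqueID` is absent from the schema, the first expression-equal column wins regardless of which hidden column was intended.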
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/expression/column.go` around lines 738 - 752, Add a one-line comment
above resolveIndicesByVirtualExpr explaining the two-pass scan: first loop
prefers exact positional match via EqualColumn to override when UniqueID matches
the target, second loop uses EqualByExprAndID (which also returns true on
UniqueID match) to fall back to expression+ID matching; also note this function
is only used as a fallback recovery path (e.g., when ResolveIndices fails) and
that if UniqueID is absent and multiple schema columns share the same
VirtualExpr the function will pick the first positional match, which is an
acknowledged imperfect fallback.

📥 Commits

Reviewing files that changed from the base of the PR and between 2c29f27 and dd05fce.

📒 Files selected for processing (5)
  • br/pkg/metautil/BUILD.bazel
  • pkg/expression/column.go
  • pkg/expression/column_test.go
  • pkg/importsdk/BUILD.bazel
  • pkg/planner/core/issuetest/planner_issue_test.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/importsdk/BUILD.bazel
  • br/pkg/metautil/BUILD.bazel
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/planner/core/issuetest/planner_issue_test.go

@AilinKid AilinKid changed the title planner: avoid ambiguous generated column substitution planner: fix gen-col mistakenly resolved after duplicate expression index substitution Apr 21, 2026
@AilinKid
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 21, 2026

@AilinKid: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

@AilinKid
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot Bot added the ok-to-test Indicates a PR is ready to be tested. label Apr 21, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 21, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-21 08:54:30 UTC: ☑️ agreed by qw4990.

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 21, 2026
Member

@0xPoe 0xPoe left a comment

Thanks!

Overall looks good to me.

Could you please update the PR body as well? I believe it still contains the old statement explaining the previous fixes and changes.

Comment thread pkg/planner/core/issuetest/planner_issue_test.go Outdated
Comment thread pkg/expression/column.go Outdated
Comment thread pkg/expression/column.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/expression/column.go`:
- Around line 739-754: The current loop in the schema resolution records only
the first EqualByExprAndID match into fallbackIdx and uses it even if multiple
columns share the same virtual expression; update the logic in the loop that
iterates schema.Columns (the block using EqualColumn and EqualByExprAndID) to
instead collect/count all EqualByExprAndID candidates (e.g., track candidate
index and a count of matches) and only set col.Index to the fallback when the
count equals 1; keep the existing immediate return on exact EqualColumn, but
change the post-loop check to use the unique-candidate condition (count==1)
before assigning col.Index to the recorded candidate index and returning true,
otherwise treat as unresolved.
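
A minimal sketch of the unique-candidate rule requested above, under stated assumptions: the `column` type and `resolveUnique` function are hypothetical simplifications rather than TiDB's actual code, expression equality is reduced to string comparison, and the `UniqueID` check stands in for `EqualColumn`.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a schema column.
type column struct {
	UniqueID    int64
	VirtualExpr string // textual form of the generated-column expression
	Index       int
}

// resolveUnique returns immediately on an exact UniqueID match. Otherwise it
// counts expression-equal candidates and resolves only when exactly one exists.
func resolveUnique(col *column, schema []*column) bool {
	candidate, matches := -1, 0
	for i, c := range schema {
		if c.UniqueID == col.UniqueID {
			col.Index = i
			return true
		}
		if c.VirtualExpr != "" && c.VirtualExpr == col.VirtualExpr {
			candidate = i
			matches++
		}
	}
	if matches == 1 {
		col.Index = candidate
		return true
	}
	// Zero or several candidates: leave the column unresolved.
	return false
}

func main() {
	ambiguous := []*column{
		{UniqueID: 7, VirtualExpr: "lower(name)"},
		{UniqueID: 8, VirtualExpr: "lower(name)"},
	}
	target := &column{UniqueID: 9, VirtualExpr: "lower(name)", Index: -1}
	fmt.Println(resolveUnique(target, ambiguous), target.Index) // false -1

	single := []*column{
		{UniqueID: 1},
		{UniqueID: 7, VirtualExpr: "lower(name)"},
	}
	fmt.Println(resolveUnique(target, single), target.Index) // true 1
}
```

Counting in the same loop keeps the scan single-pass while still letting an exact match that appears after an expression-equal column take priority, because the fallback is applied only once the loop has finished.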

📥 Commits

Reviewing files that changed from the base of the PR and between dd05fce and cb9f663.

📒 Files selected for processing (1)
  • pkg/expression/column.go

Comment thread pkg/expression/column.go
@AilinKid
Contributor Author

/retest-required

@AilinKid
Contributor Author

/retest-required

@AilinKid AilinKid closed this Apr 23, 2026
@AilinKid AilinKid reopened this Apr 23, 2026
@AilinKid
Contributor Author

/retest-required

@AilinKid
Contributor Author

/test unit-test

@tiprow

tiprow Bot commented Apr 23, 2026

@AilinKid: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow
/test tidb_parser_test

Use /test all to run all jobs.

@AilinKid AilinKid force-pushed the codex/fix-issue-67552 branch from f665353 to 9196209 Compare April 23, 2026 11:01
@ti-chi-bot

ti-chi-bot Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qw4990
Once this PR has been reviewed and has the lgtm label, please assign zanmato1984 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • ok-to-test: Indicates a PR is ready to be tested.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/planner: SIG: Planner
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
