dumpling: support partition flag and generate dump sql based on partition condition#67618

Merged
ti-chi-bot[bot] merged 9 commits into pingcap:master from
shiyuhang0:dumpling_support_partation_flag
Apr 21, 2026

Conversation

@shiyuhang0
Member

@shiyuhang0 shiyuhang0 commented Apr 8, 2026

What problem does this PR solve?

Issue Number: ref #67619

Also a part of this project: #67765

Problem Summary:

The export performance issue: when the user specifies the --where flag to dump only one partition from a large table, Dumpling performs full-table sampling (TABLESAMPLE REGIONS) to generate data chunks. This creates numerous redundant scan tasks for non-target partitions, which consume TiKV resources without returning any data.

What changed and how does it work?

Adds a --partitions flag so users can explicitly specify the partitions they want to export. When partitions are provided, the PARTITION condition in TABLESAMPLE REGIONS() can be pushed down to TiKV, so region sampling is performed only on the selected partitions instead of the entire table. As a result, the generated chunk SQL statements are scoped to the target partitions from the beginning, which avoids unnecessary full-table sampling and reduces the chance that execution falls back to inefficient partition-wide scans caused by where filters.

Key changes:

  • Add a new --partitions flag and reject it when used together with --sql.
  • Generate sequential and concurrent dump tasks per selected partition, and build partition-aware TABLESAMPLE REGIONS() queries. The partition condition in TABLESAMPLE REGIONS() can be pushed down to TiKV, so the query returns only the regions where the partition is located.
  • Add unit tests for config validation and partition-aware query generation.
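
To make the partition-aware sampling query concrete, here is a minimal sketch of how such a query string might be assembled. The helper names `quoteIdent` and `buildSampleQuery` and the primary-key column `id` are assumptions for illustration; they are not Dumpling's actual functions, and the exact SQL that Dumpling emits may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in backticks and doubles embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildSampleQuery is a hypothetical helper (not Dumpling's real function).
// It builds a TABLESAMPLE REGIONS() query over the given key columns and,
// when partition is non-empty, scopes the sampling to that partition.
func buildSampleQuery(db, table string, pkFields []string, partition string) string {
	quoted := make([]string, len(pkFields))
	for i, f := range pkFields {
		quoted[i] = quoteIdent(f)
	}
	pk := strings.Join(quoted, ",")
	from := quoteIdent(db) + "." + quoteIdent(table)
	if partition != "" {
		from += " PARTITION(" + quoteIdent(partition) + ")"
	}
	return fmt.Sprintf("SELECT %s FROM %s TABLESAMPLE REGIONS() ORDER BY %s", pk, from, pk)
}

func main() {
	// Without a partition: sampling covers every region of the table.
	fmt.Println(buildSampleQuery("gharchive_dev", "github_events", []string{"id"}, ""))
	// With a partition: sampling is limited to the regions of that partition.
	fmt.Println(buildSampleQuery("gharchive_dev", "github_events", []string{"id"}, "fork_apply_event"))
}
```

The only difference between the two queries is the PARTITION(...) clause, which is what lets the sampling skip regions that belong to other partitions.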

Performance

This feature has already been verified in TiDB Cloud.

Based on the test results for a customer with thousands of partitions (each ranging from 50GB to 200GB), the optimization yields the following performance breakthroughs.

before: dump a partition with the where flag
after: dump a partition with the partitions flag

Key Improvements (24 concurrency)

  • 10x Speedup: 40M/s -> 500M/s
  • 250x RU Cost Efficiency: 300M RU -> 1.2M RU
  • 1/16 TiKV CPU usage: TiKV CPU pressure (cop-worker) dropped from 250 cores to 15 cores.

Manual Test

Use the ossinsight dataset in a TiDB Cloud cluster

select count(*) from gharchive_dev.github_events partition(fork_apply_event) AS OF TIMESTAMP '2026-04-08 20:30:00';
+----------+
| count(*) |
+----------+
|    35751 |
+----------+

dump

./bin/dumpling -o='./p1' --filter='gharchive_dev.github_events' -r 10000 -u '3EDFHZJX5iSzvfr.xxx' -h 'gateway01.us-west-2.prod.aws.tidbcloud.com' -P 4000  -p'xxx' --partitions 'fork_apply_event' --filetype csv

result (35751 rows plus one CSV header line)

 wc -l p1/gharchive_dev.github_events.000000001.csv
 35752 p1/gharchive_dev.github_events.000000001.csv

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Add the `--partitions` option to Dumpling so TiDB exports can split and dump only selected table partitions.

Summary by CodeRabbit

  • New Features

    • Added a --partitions CLI flag to export specific table partitions and run per-partition export tasks.
    • Partition-aware sampling is used when the server supports it to parallelize partition exports.
  • Bug Fixes

    • Prevents using --sql together with --partitions.
    • Rejects unsupported row-based dump strategies when partitions are requested.
  • Tests

    • Added tests for partition-aware sampling query generation and mutual-exclusion validation.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 8, 2026
@pantheon-ai

pantheon-ai Bot commented Apr 8, 2026

@shiyuhang0 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added component/dumpling This is related to Dumpling of TiDB. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 8, 2026
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a --partitions CLI flag and Config.Partitions []string; normalizes and validates partition values; enforces mutual exclusion with --sql; verifies TiDB-only and minimum version for partitioned dumps; implements per-partition sequential and concurrent dump paths including TiDB TABLESAMPLE sampling per partition.
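
As an illustration of the mutual-exclusion rule, the following sketch checks the two options on a trimmed-down config. The `exportConfig` type and `validateSQLAndPartitions` function are hypothetical stand-ins, not Dumpling's real `Config` or `validateSpecifiedSQL`; only the error text is taken from this PR's discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// exportConfig is a hypothetical, trimmed-down stand-in for Dumpling's Config.
type exportConfig struct {
	SQL        string   // value of --sql
	Partitions []string // values of --partitions
}

// validateSQLAndPartitions rejects configs that set both options, because a
// user-supplied SQL statement cannot be rewritten per partition.
func validateSQLAndPartitions(c exportConfig) error {
	if c.SQL != "" && len(c.Partitions) > 0 {
		return errors.New("can't specify both --sql and --partitions at the same time")
	}
	return nil
}

func main() {
	ok := exportConfig{Partitions: []string{"p0", "p1"}}
	bad := exportConfig{SQL: "SELECT 1", Partitions: []string{"p0"}}
	fmt.Println(validateSQLAndPartitions(ok))
	fmt.Println(validateSQLAndPartitions(bad))
}
```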

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Validation (dumpling/export/config.go) | Add --partitions flag, new Config.Partitions []string, parse and normalize partitions, and update validateSpecifiedSQL to reject --sql when partitions are set. |
| Dump Logic (dumpling/export/dump.go) | Add upfront partition compatibility and existence checks; sequential dumper schedules per-partition dumpWholeTableDirectly; concurrent dumper routes to a new TiDB TABLESAMPLE-based path when partitions set and TiDB ≥ v5.0, add concurrentDumpTiDBPartitionTablesWithTableSample, and extend TABLESAMPLE query generation to support PARTITION(...). |
| Tests (dumpling/export/prepare_test.go, dumpling/export/sql_test.go) | Extend TestConfigValidation to assert --sql vs --partitions mutual exclusion; add TestBuildTiDBTableSampleQuery to verify TABLESAMPLE query generation with and without PARTITION(...). |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as Config Parser
    participant Validator as Validation Layer
    participant Controller as Dump Controller
    participant DumperSeq as Sequential Dumper
    participant DumperConc as Concurrent Dumper
    participant TiDB as TiDB

    User->>CLI: provide flags (--partitions, --sql, ...)
    CLI->>Validator: parse & normalize Config
    Validator->>Validator: validate mutual exclusion, TiDB-only, version, partition existence
    alt validation fails
        Validator-->>User: error
    else
        Validator->>Controller: config OK
        alt sequential mode
            Controller->>DumperSeq: schedule per-partition dumps
            loop per partition
                DumperSeq->>TiDB: dumpWholeTableDirectly(partition)
                TiDB-->>DumperSeq: rows
            end
            DumperSeq-->>User: complete
        else concurrent mode (TiDB + TABLESAMPLE)
            Controller->>DumperConc: prepare per-partition TABLESAMPLE queries
            loop per partition
                DumperConc->>TiDB: selectTiDBTableSample(partition)
                TiDB-->>DumperConc: sampled handles
            end
            DumperConc->>DumperConc: dispatch concurrent tasks per partition
            DumperConc-->>User: complete
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

size/XXL, ok-to-test, lgtm, approved

Suggested reviewers

  • joechenrh
  • GMHDBJD
  • D3Hunter

Poem

🐰 I nibble flags and hop through rows, partitions in a line,
I sample tiny regions, stack the chunks up fine,
Per-partition hops, concurrent leaps, I drum with tiny feet,
Dumping slices, one by one—carrots, tables, what a treat! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding partition flag support and generating partition-aware dump SQL, directly matching the PR's core objectives. |
| Description check | ✅ Passed | The description comprehensively covers the problem statement, implementation details, performance metrics, manual testing, checklist completion, and release notes, meeting the template requirements. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dumpling/export/dump.go`:
- Around line 503-505: The current pre-check using conf.ServerInfo.ServerVersion
and tableSampleVersion rejects all --partitions exports on <5.0.0; remove this
global early return and instead restrict the version requirement only where
partition-scoped chunking is used inside dumpTableData: keep the PARTITION(...)
direct export path reachable for older servers and, in the chunked-splitting
branches, gate behavior by checking
conf.ServerInfo.ServerVersion.Compare(*tableSampleVersion) and skip or alter the
partition-scoped chunking logic accordingly (i.e., bypass calling
buildConcatTask for servers < v5.0.0 and use the direct PARTITION export flow);
update the code paths that call buildConcatTask so they only do so when the
version check passes.
- Around line 422-424: The partition flag validation (d.checkPartitionsFlag
using tctx, metaConn, allTables) must run before any placement-policy tasks are
enqueued; move the call to d.checkPartitionsFlag so it executes prior to the
placement-policy task block that currently enqueues placement-policy tasks,
ensuring a config error aborts before any metadata tasks are queued.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 38d5be8e-9b15-401f-87f5-e0afe1cf8fe9

📥 Commits

Reviewing files that changed from the base of the PR and between fac0bdf and 595aeda.

📒 Files selected for processing (4)
  • dumpling/export/config.go
  • dumpling/export/dump.go
  • dumpling/export/prepare_test.go
  • dumpling/export/sql_test.go

Comment thread dumpling/export/dump.go
Comment thread dumpling/export/dump.go
@shiyuhang0 shiyuhang0 changed the title *: support dumpling partition-scoped export splitting dumpling: support dumpling partition-scoped export splitting Apr 8, 2026
@codecov

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 20.71429% with 111 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.4224%. Comparing base (fac0bdf) to head (a40fd84).
⚠️ Report is 56 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67618        +/-   ##
================================================
- Coverage   77.5796%   77.4224%   -0.1573%     
================================================
  Files          1980       1966        -14     
  Lines        547722     556930      +9208     
================================================
+ Hits         424921     431189      +6268     
- Misses       121991     125706      +3715     
+ Partials        810         35       -775     
| Flag | Coverage Δ |
|---|---|
| integration | 40.8713% <ø> (+6.5316%) ⬆️ |
| unit | 76.6708% <20.7142%> (+0.3373%) ⬆️ |

| Components | Coverage Δ |
|---|---|
| dumpling | 60.4888% <20.7142%> (-1.0178%) ⬇️ |
| parser | ∅ <ø> (∅) |
| br | 49.4201% <ø> (-11.0710%) ⬇️ |

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
dumpling/export/dump.go (2)

504-509: ⚠️ Potential issue | 🟠 Major

Version check may be overly restrictive for simple partition exports.

The check at lines 507-509 rejects all --partitions usage for TiDB < 5.0.0. However, the simple PARTITION(...) path in sequentialDumpTable (lines 771-779) doesn't require TABLESAMPLE and could work on older TiDB versions.

If the TiDB 5.0.0 requirement is specifically for partition-scoped TABLESAMPLE splitting, consider gating only that path and allowing the direct partition export for older servers.
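
The reviewer's suggestion can be sketched as a path-selection function that applies the version requirement only to the TABLESAMPLE branch. This is an illustration of the suggestion, not the merged code: the `semver` type, `tableSampleMin`, and `choosePartitionPath` are hypothetical names, and the merged PR may have kept the global check.

```go
package main

import "fmt"

// semver is a hypothetical minimal version triple used only for this sketch.
type semver struct{ major, minor, patch int }

// atLeast reports whether v is greater than or equal to other.
func (v semver) atLeast(other semver) bool {
	if v.major != other.major {
		return v.major > other.major
	}
	if v.minor != other.minor {
		return v.minor > other.minor
	}
	return v.patch >= other.patch
}

// tableSampleMin is the minimum TiDB version assumed to support TABLESAMPLE.
var tableSampleMin = semver{5, 0, 0}

// choosePartitionPath gates only the chunked path on the server version and
// leaves the plain PARTITION(...) export reachable on older servers.
func choosePartitionPath(server semver, wantChunks bool) string {
	if wantChunks && server.atLeast(tableSampleMin) {
		return "tablesample"
	}
	return "direct"
}

func main() {
	fmt.Println(choosePartitionPath(semver{4, 0, 16}, true))  // old server: direct export
	fmt.Println(choosePartitionPath(semver{5, 0, 0}, true))   // new server: chunked sampling
	fmt.Println(choosePartitionPath(semver{7, 5, 1}, false))  // chunking not requested
}
```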

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 504 - 509, The current top-level check
rejects all --partitions usage for TiDB < v5.0.0; instead, modify the logic so
that conf.ServerInfo.ServerType still enforces TiDB for any partitions usage but
only require conf.ServerInfo.ServerVersion >= tableSampleVersion when code will
use TABLESAMPLE-based splitting; locate the version compare that references
conf.ServerInfo.ServerVersion and tableSampleVersion and move or duplicate it
into the path in sequentialDumpTable (the TABLESAMPLE branch around the
partition-scoped TABLESAMPLE logic) so the simple PARTITION(...) export path
remains allowed on older TiDB versions.

426-428: ⚠️ Potential issue | 🟡 Minor

Validate --partitions before queueing any metadata tasks.

This validation runs after placement-policy tasks (lines 401-424) can already be enqueued. A configuration error here could leave partial output behind. Consider moving the validation above the placement-policy block.
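
A minimal sketch of the fail-fast ordering described here, assuming a hypothetical planner `planMetadataTasks` that receives the validation step as a callback. The task names and the sentinel error are invented for illustration and do not come from Dumpling.

```go
package main

import (
	"errors"
	"fmt"
)

// errPartitionMissing is a hypothetical validation failure.
var errPartitionMissing = errors.New("partition p9 not found")

// planMetadataTasks runs validation first and builds the placement-policy
// task list only if validation succeeds, so a bad config queues nothing.
func planMetadataTasks(validate func() error, policies []string) ([]string, error) {
	if err := validate(); err != nil {
		return nil, fmt.Errorf("validate --partitions: %w", err)
	}
	tasks := make([]string, 0, len(policies))
	for _, p := range policies {
		tasks = append(tasks, "placement-policy:"+p)
	}
	return tasks, nil
}

func main() {
	failing := func() error { return errPartitionMissing }
	passing := func() error { return nil }

	tasks, err := planMetadataTasks(failing, []string{"hot", "cold"})
	fmt.Println(len(tasks), err)

	tasks, err = planMetadataTasks(passing, []string{"hot", "cold"})
	fmt.Println(tasks, err)
}
```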

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/dump.go` around lines 426 - 428, The checkPartitionsFlag
validation (call: d.checkPartitionsFlag(tctx, metaConn, allTables)) must run
before any placement-policy tasks are enqueued; move this call above the
placement-policy task creation/enqueue block so a bad --partitions config fails
fast and prevents partial metadata tasks from being queued. Ensure you keep the
same parameters (tctx, metaConn, allTables) and update control flow so that
placement-policy task creation (the placement-policy enqueue logic) only
executes after checkPartitionsFlag returns nil.
🧹 Nitpick comments (1)
dumpling/export/config.go (1)

86-86: Minor formatting inconsistency.

This line uses spaces for indentation/alignment while the surrounding constant definitions use tabs. This may cause linter warnings.

-    flagPartitions               = "partitions"
+	flagPartitions               = "partitions"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/config.go` at line 86, The constant declaration for
flagPartitions uses spaces for indentation which is inconsistent with the other
constants; update the declaration of flagPartitions so its indentation matches
the surrounding constants (use a tab for alignment like the other entries) to
satisfy formatting/linter rules and keep the constants block consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a104b743-fb1e-48f2-9c59-2493fe70568c

📥 Commits

Reviewing files that changed from the base of the PR and between 595aeda and 098dfe2.

📒 Files selected for processing (3)
  • dumpling/export/config.go
  • dumpling/export/dump.go
  • dumpling/export/sql_test.go
✅ Files skipped from review due to trivial changes (1)
  • dumpling/export/sql_test.go


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
dumpling/export/config.go (1)

811-813: Consider making the new conflict error more actionable for users.

Line 812 is correct functionally, but adding a short remediation hint (similar to the --sql + --where message) would improve CLI UX consistency.

Suggested tweak
-		return errors.New("can't specify both --sql and --partitions at the same time")
+		return errors.New("can't specify both --sql and --partitions at the same time. Please remove one of them")

As per coding guidelines: "Keep error handling actionable and contextual; avoid silently swallowing errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/config.go` around lines 811 - 813, The conflict error when
both conf.SQL and conf.Partitions are set should be changed to include a short
remediation hint for the user; update the error returned from the check that
currently reads "can't specify both --sql and --partitions at the same time"
(the block referencing conf.SQL and conf.Partitions) to a message that suggests
the correct action (e.g., choose one flag or explain how to express filtering
via --sql/WHERE vs using --partitions) so the CLI gives actionable guidance
consistent with the --sql + --where message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 274fb33e-6beb-4302-9433-d3c5182c2774

📥 Commits

Reviewing files that changed from the base of the PR and between 098dfe2 and d357deb.

📒 Files selected for processing (1)
  • dumpling/export/config.go

@shiyuhang0 shiyuhang0 changed the title dumpling: support dumpling partition-scoped export splitting dumpling: support partition flag and generate dump sql based on partition condition Apr 8, 2026
@ingress-bot

🔍 Starting code review for this PR...

@ingress-bot ingress-bot left a comment

This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.

Summary

  • Total findings: 10
  • Inline comments: 10
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

⚠️ [Major] (6)

  1. --partitions help text does not describe the actual contract enforced by the new code path (dumpling/export/config.go:384, dumpling/export/dump.go:523, dumpling/export/config.go:811)
  2. Duplicate --partitions entries schedule duplicate partition dumps (dumpling/export/config.go:548, dumpling/export/dump.go:772, dumpling/export/dump.go:976)
  3. Global partitions option is modeled as process-wide state but validated as an all-tables invariant (dumpling/export/config.go:190, dumpling/export/dump.go:523)
  4. Partitioned TABLESAMPLE dump path has no determinism regression test (dumpling/export/dump.go:791, dumpling/export/dump.go:964, dumpling/export/sql_test.go:525)
  5. Partition TABLESAMPLE path aborts dump instead of degrading on transient failures (dumpling/export/dump.go:792, dumpling/export/dump.go:964)
  6. Partition sampling repeats primary-key metadata lookup for every partition (dumpling/export/dump.go:979, dumpling/export/dump.go:1063, dumpling/export/sql.go:639)

🟡 [Minor] (4)

  1. --partitions help text omits upgrade/compatibility constraints (dumpling/export/config.go:384, dumpling/export/dump.go:505, dumpling/export/dump.go:508)
  2. Partition-related errors and logs drift from the canonical option vocabulary (dumpling/export/dump.go:505, dumpling/export/dump.go:805)
  3. Partition validation adds one metadata query per base table before dumping starts (dumpling/export/dump.go:426, dumpling/export/dump.go:515, dumpling/export/sql.go:1465)
  4. Partition chunk scheduling logic is duplicated across TABLESAMPLE and TABLE REGIONS paths (dumpling/export/dump.go:964, dumpling/export/dump.go:997)

Comment thread dumpling/export/config.go Outdated
Comment thread dumpling/export/config.go
Comment thread dumpling/export/dump.go
Comment thread dumpling/export/dump.go
Comment thread dumpling/export/dump.go
Comment thread dumpling/export/dump.go Outdated
Comment thread dumpling/export/config.go Outdated
Comment thread dumpling/export/dump.go Outdated
	}
	conf := d.conf
	if conf.ServerInfo.ServerType != version.ServerTypeTiDB {
		return errors.New("partition: dump with partitions flag is only available for TiDB")

🟡 [Minor] Partition-related errors and logs drift from the canonical option vocabulary

Impact
The new user-facing messages use mixed forms like "partition:" and "partitions flag" instead of the exact option token, which makes troubleshooting less direct. Operators matching runtime errors to CLI help must infer that these messages refer to --partitions.

Scope

  • dumpling/export/dump.go:505 (*Dumper).checkPartitionsFlag
  • dumpling/export/dump.go:805 (*Dumper).concurrentDumpTable

Evidence
New messages include "partition: dump with partitions flag is only available for TiDB", "partition: dump with partitions flag requires TiDB version >= v5.0.0", and "partition: dump with partitions flag does not support using rows". None of these strings names the option as --partitions, so the contract vocabulary in runtime errors does not match the CLI surface.

Change request
Standardize these error/log strings to use the canonical option name --partitions and keep singular/plural wording consistent. Include a concise reason phrase where needed so fallback-path failures are self-explanatory at the call site.

Comment thread dumpling/export/dump.go
Comment thread dumpling/export/dump.go

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
dumpling/export/config.go (1)

851-866: Consider whether lowercasing partition names is always safe.

normalizePartitions unconditionally lowercases each entry. In TiDB, partition names are identifiers that are typically compared case-insensitively, so this is usually fine; however, the normalized values are later used both for validation against table metadata and for generating PARTITION(...) clauses on the dump side. If downstream comparison/emission ever switches to case-sensitive matching (e.g. identifier quoting, lower_case_table_names=0-style semantics, or a future refactor that preserves casing from metadata), this normalization could mismatch. A brief comment on the function documenting the case-insensitive assumption, or deduping via a lowercased key while preserving the user's original casing in result, would make the invariant explicit.

Otherwise the helper is clean: trims, drops empties, dedups, and preserves input order.

♻️ Optional: preserve original casing while deduping case-insensitively
 func normalizePartitions(partitions []string) []string {
+	// Partition names are treated case-insensitively for dedup/validation;
+	// keep the first-seen original casing so error messages reflect user input.
 	seen := make(map[string]struct{}, len(partitions))
 	result := make([]string, 0, len(partitions))
 	for _, p := range partitions {
-		p = strings.ToLower(strings.TrimSpace(p))
-		if p == "" {
+		trimmed := strings.TrimSpace(p)
+		if trimmed == "" {
 			continue
 		}
-		if _, ok := seen[p]; ok {
+		key := strings.ToLower(trimmed)
+		if _, ok := seen[key]; ok {
 			continue
 		}
-		seen[p] = struct{}{}
-		result = append(result, p)
+		seen[key] = struct{}{}
+		result = append(result, trimmed)
 	}
 	return result
 }
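
For reference, the suggested variant from the diff above can be run standalone as follows. This is the reviewer's proposed version (dedupe on a lowercased key, keep the first-seen casing), not necessarily the code that was merged; the sample input in `main` is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePartitions trims each entry, drops empty ones, and removes
// case-insensitive duplicates while keeping the first-seen original casing
// and the input order.
func normalizePartitions(partitions []string) []string {
	seen := make(map[string]struct{}, len(partitions))
	result := make([]string, 0, len(partitions))
	for _, p := range partitions {
		trimmed := strings.TrimSpace(p)
		if trimmed == "" {
			continue
		}
		key := strings.ToLower(trimmed)
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		result = append(result, trimmed)
	}
	return result
}

func main() {
	// " P0 " and "p0" collapse into one entry; the empty string is dropped.
	fmt.Println(normalizePartitions([]string{" P0 ", "p0", "", "p1"})) // prints [P0 p1]
}
```

Deduplicating here also addresses the separate finding that repeated --partitions entries would otherwise schedule the same partition dump more than once.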
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dumpling/export/config.go` around lines 851 - 866, normalizePartitions
currently lowercases each partition name unconditionally which can break
correctness if downstream code ever treats identifiers case-sensitively; modify
normalizePartitions to dedupe using a lowercased key but preserve the user's
original (trimmed) casing in the result: compute key :=
strings.ToLower(strings.TrimSpace(p)) for deduping in the seen map while
appending the original trimmed p (not the lowercased one) to result, skip
empties as before, and update the function comment to document the
case-insensitive dedupe invariant (or, alternatively, explicitly state why
unconditional lowercasing is acceptable).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: edb30813-4afc-47b9-9412-4074255f6482

📥 Commits

Reviewing files that changed from the base of the PR and between 2fd088c and a46f147.

📒 Files selected for processing (2)
  • dumpling/export/config.go
  • dumpling/export/dump.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • dumpling/export/dump.go

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 20, 2026
@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 21, 2026
@tiprow

tiprow Bot commented Apr 21, 2026

@shiyuhang0: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| fast_test_tiprow | a40fd84 | link | true | /test fast_test_tiprow |

@ti-chi-bot

ti-chi-bot Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, GMHDBJD

@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 21, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 21, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-20 11:01:03 UTC: ☑️ agreed by D3Hunter.
  • 2026-04-21 10:25:19 UTC: ☑️ agreed by GMHDBJD.

@ti-chi-bot ti-chi-bot Bot merged commit e0c3674 into pingcap:master Apr 21, 2026
37 of 38 checks passed
Labels

approved component/dumpling This is related to Dumpling of TiDB. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants