*: shrink types.Datum from 72 to 56 bytes#67977
*: shrink types.Datum from 72 to 56 bytes#67977mjonss wants to merge 5 commits intopingcap:masterfrom
Conversation
Turn on b.ReportAllocs() across Datum-related benchmarks so benchstat can diff B/op and allocs/op against upcoming struct-size changes. Add three new targeted benches in pkg/types: - BenchmarkDatumCopy: tightest signal on Datum struct size (heterogeneous slice copied per iteration). - BenchmarkDatumSetMysqlTime: locks in the current 1 alloc / 8 B per call caused by boxing Time through d.x; will drop to 0 allocs once Time packs into d.i. - BenchmarkCompareDatumCollation: exercises the named-collator compare path, which the existing BenchmarkCompareDatum skips by using the binary collator. Behavior-only change in benchmarks; no production code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 16-byte `collation string` field on types.Datum with a 2-byte
`collationID uint16` carrying the MySQL collation id, and shrink
`decimal uint16` (frac; MySQL DECIMAL frac <= 30, fsp <= 6) to `uint8`.
The freed padding absorbs the new collationID so the first struct word
stays 8 bytes. Net: Datum is now 56 bytes instead of 72 (-22%), every
`[]Datum` allocation is correspondingly smaller, and Datum.Equals now
compares collations with a uint16 compare instead of a string compare.
API compat:
- Collation() string and SetCollation(string) are preserved as
facades that resolve via the charset package. External callers keep
working unchanged.
- New CollationID() uint16 / SetCollationID(uint16) for hot paths.
- MarshalJSON/UnmarshalJSON keep the on-wire "collation" field as a
name string, so older JSON blobs round-trip.
- collate.GetCollatorByID(0) now falls through silently (matching
GetCollator("")), since id 0 is the uint16 counterpart of the
legacy empty-string "no collation" sentinel.
Hot-path callers migrated from GetCollator(d.Collation()) to
GetCollatorByID(int(d.CollationID())) in:
codec.encodeString, ranger/detacher, planner/cardinality,
table/tables/mutation_checker, executor/foreign_key,
plus the internal Datum compare/key paths.
Datum.MemUsage drops the len(collation) term — the collation is now
inline (counted by EmptyDatumSize); TestRangeMemUsage is updated to
match. GetDatumMemSize in aggfuncs/func_group_concat follows suit.
Measured on linux/amd64, Intel i7-13700:
- unsafe.Sizeof(types.Datum) = 56 B (was 72 B)
- BenchmarkDecodeWithSize (codec): 8.0 KiB -> 6.0 KiB B/op (-25%),
2435 -> 1862 ns/op (-24%)
- BenchmarkDecodeWithOutSize (codec): 20.88 KiB -> 15.83 KiB (-24%)
- BenchmarkCompareDatum (types): 34.06 -> 32.12 ns/op (-5.7%)
- BenchmarkEncodeValue (tablecodec): 135 -> 122 ns/op (-10%)
- No allocation regressions across the bench set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughDatum in-memory layout was compacted (collation name → collationID; decimal narrowed), callers switched to collator-by-ID, memory accounting and benchmarks updated, and numerous tests adjusted to the new Datum representation and mem-usage expectations. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @mjonss. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/types/datum_test.go`:
- Around line 619-634: The benchmark comment is inaccurate:
BenchmarkCompareDatumCollation currently passes a collator into Datum.Compare
(the local variable coll), so Compare → compareString uses the provided comparer
and never calls d1.Collation() or reads d1.collationID; update the benchmark or
its comment. Either reword the docstring on BenchmarkCompareDatumCollation to
say it measures the cost of using an external collator via the comparer argument
(coll) or change the benchmark to exercise the Datum.Collation() path (e.g.,
call Compare without supplying the coll argument so compareString obtains the
collator from d1.Collation()/d1.collationID). Make the change around the
BenchmarkCompareDatumCollation function and mention Datum.Compare,
compareString, Datum.Collation()/collationID, and the local coll variable in
your fix.
In `@pkg/types/datum.go`:
- Around line 167-170: The SetFrac method silently narrows int to uint8; add a
guard using intest.Assert to ensure the provided frac is within the allowable
range before casting and assigning in Datum.SetFrac. Specifically, in
Datum.SetFrac(frac int) add an assertion such as intest.Assert(frac >= 0 && frac
<= 255) (or the tighter documented invariant if you prefer, e.g., frac <= 30 for
DECIMAL) and then perform d.decimal = uint8(frac); this will catch
UnspecifiedLength (-1) or out-of-range values passed from callers like
convertToMysqlDecimal/target.GetDecimal() or ProduceDecWithSpecifiedTp instead
of silently truncating.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4671e5c7-975b-4ed2-826b-d8ddf3a2ce50
📒 Files selected for processing (14)
pkg/executor/aggfuncs/func_group_concat.gopkg/executor/foreign_key.gopkg/planner/cardinality/row_count_column.gopkg/table/tables/mutation_checker.gopkg/tablecodec/bench_test.gopkg/tablecodec/tablecodec_test.gopkg/types/datum.gopkg/types/datum_test.gopkg/types/field_type.gopkg/util/codec/bench_test.gopkg/util/codec/codec.gopkg/util/collate/collate.gopkg/util/ranger/detacher.gopkg/util/ranger/types_test.go
| // BenchmarkCompareDatumCollation compares collated string datums. Unlike | ||
| // BenchmarkCompareDatum, the datums carry a non-empty collation and the | ||
| // compare path dispatches through the named collator instead of the binary | ||
| // shortcut, so it exercises Datum.Collation() on the hot path. | ||
| func BenchmarkCompareDatumCollation(b *testing.B) { | ||
| d1 := NewCollationStringDatum("abcdefghij", charset.CollationUTF8MB4) | ||
| d2 := NewCollationStringDatum("abcdefghik", charset.CollationUTF8MB4) | ||
| coll := collate.GetCollator(charset.CollationUTF8MB4) | ||
| b.ReportAllocs() | ||
| b.ResetTimer() | ||
| for i := 0; i < b.N; i++ { | ||
| if _, err := d1.Compare(DefaultStmtNoWarningContext, &d2, coll); err != nil { | ||
| b.Fatal(err) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Benchmark comment is misleading — the compare path doesn't dispatch through Datum.Collation().
Datum.Compare → compareString uses the comparer argument you pass in (coll here); it never reads d1.Collation() / d1.collationID. So the bench measures collated comparator cost, not any Collation() accessor on the hot path. Consider rewording the doc, or change what the bench is intended to measure.
📝 Suggested comment tweak
-// BenchmarkCompareDatumCollation compares collated string datums. Unlike
-// BenchmarkCompareDatum, the datums carry a non-empty collation and the
-// compare path dispatches through the named collator instead of the binary
-// shortcut, so it exercises Datum.Collation() on the hot path.
+// BenchmarkCompareDatumCollation compares collated string datums through
+// a named (non-binary) collator. Unlike BenchmarkCompareDatum, this avoids
+// the binary-collator shortcut so the cost of the collated comparison
+// dominates the measurement.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // BenchmarkCompareDatumCollation compares collated string datums. Unlike | |
| // BenchmarkCompareDatum, the datums carry a non-empty collation and the | |
| // compare path dispatches through the named collator instead of the binary | |
| // shortcut, so it exercises Datum.Collation() on the hot path. | |
| func BenchmarkCompareDatumCollation(b *testing.B) { | |
| d1 := NewCollationStringDatum("abcdefghij", charset.CollationUTF8MB4) | |
| d2 := NewCollationStringDatum("abcdefghik", charset.CollationUTF8MB4) | |
| coll := collate.GetCollator(charset.CollationUTF8MB4) | |
| b.ReportAllocs() | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| if _, err := d1.Compare(DefaultStmtNoWarningContext, &d2, coll); err != nil { | |
| b.Fatal(err) | |
| } | |
| } | |
| } | |
| // BenchmarkCompareDatumCollation compares collated string datums through | |
| // a named (non-binary) collator. Unlike BenchmarkCompareDatum, this avoids | |
| // the binary-collator shortcut so the cost of the collated comparison | |
| // dominates the measurement. | |
| func BenchmarkCompareDatumCollation(b *testing.B) { | |
| d1 := NewCollationStringDatum("abcdefghij", charset.CollationUTF8MB4) | |
| d2 := NewCollationStringDatum("abcdefghik", charset.CollationUTF8MB4) | |
| coll := collate.GetCollator(charset.CollationUTF8MB4) | |
| b.ReportAllocs() | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| if _, err := d1.Compare(DefaultStmtNoWarningContext, &d2, coll); err != nil { | |
| b.Fatal(err) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum_test.go` around lines 619 - 634, The benchmark comment is
inaccurate: BenchmarkCompareDatumCollation currently passes a collator into
Datum.Compare (the local variable coll), so Compare → compareString uses the
provided comparer and never calls d1.Collation() or reads d1.collationID; update
the benchmark or its comment. Either reword the docstring on
BenchmarkCompareDatumCollation to say it measures the cost of using an external
collator via the comparer argument (coll) or change the benchmark to exercise
the Datum.Collation() path (e.g., call Compare without supplying the coll
argument so compareString obtains the collator from
d1.Collation()/d1.collationID). Make the change around the
BenchmarkCompareDatumCollation function and mention Datum.Compare,
compareString, Datum.Collation()/collationID, and the local coll variable in
your fix.
| // SetFrac sets the frac of the datum. | ||
| func (d *Datum) SetFrac(frac int) { | ||
| d.decimal = uint16(frac) | ||
| d.decimal = uint8(frac) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all SetFrac call sites and show surrounding context to audit negative/UnspecifiedLength inputs.
rg -nP --type=go -C2 '\.SetFrac\s*\('Repository: pingcap/tidb
Length of output: 3893
🏁 Script executed:
#!/bin/bash
# 1. Show the context around line 1615 in convertToMysqlDecimal
echo "=== convertToMysqlDecimal context (lines 1600–1630) ==="
sed -n '1600,1630p' pkg/types/datum.go
# 2. Check what GetDecimal() returns and UnspecifiedLength value
echo -e "\n=== UnspecifiedLength constant definition ==="
rg -n 'const\s+UnspecifiedLength' pkg/types/
# 3. Check GetDecimal() method signature
echo -e "\n=== GetDecimal() method definition ==="
rg -n 'func.*GetDecimal\(' pkg/types/field_type.go | head -5
# 4. Check actual values returned by GetDecimal in realistic scenarios
echo -e "\n=== Sample GetDecimal() usages to understand typical values ==="
rg -B2 -A1 '\.GetDecimal\(' pkg/types/datum.go | head -30Repository: pingcap/tidb
Length of output: 1890
🏁 Script executed:
#!/bin/bash
# 5. Check if Frac() is actually used after SetFrac in line 1615 caller chain
echo "=== Who calls convertToMysqlDecimal? ==="
rg -n 'convertToMysqlDecimal' pkg/types/ | head -10
# 6. Show function signature of convertToMysqlDecimal to understand target parameter
echo -e "\n=== convertToMysqlDecimal function signature ==="
sed -n '1600,1620p' pkg/types/datum.go | grep -A5 'func.*convertToMysqlDecimal'Repository: pingcap/tidb
Length of output: 535
🏁 Script executed:
#!/bin/bash
# 7. Verify: does row.go line 184 actually pass UnspecifiedLength?
echo "=== chunk/row.go lines 175–195 (context around SetFrac calls) ==="
sed -n '175,195p' pkg/util/chunk/row.goRepository: pingcap/tidb
Length of output: 879
🏁 Script executed:
#!/bin/bash
# 8. Check if intest.Assert is already used elsewhere in datum.go for type contracts
echo "=== intest.Assert patterns in datum.go ==="
rg -n 'intest.Assert' pkg/types/datum.goRepository: pingcap/tidb
Length of output: 131
🏁 Script executed:
#!/bin/bash
# Check who calls ConvertDatum (line 1091 references convertToMysqlDecimal)
echo "=== ConvertDatum function and its callers ==="
sed -n '1050,1100p' pkg/types/datum.go | head -50
# Find all callers of ConvertDatum
echo -e "\n=== ConvertDatum callers ==="
rg -n 'ConvertDatum\(' pkg/ --type=go | head -15Repository: pingcap/tidb
Length of output: 2021
🏁 Script executed:
#!/bin/bash
# Check if there are any tests that verify SetFrac with UnspecifiedLength or -1
echo "=== Tests calling SetFrac or testing frac narrowing ==="
rg -n 'SetFrac.*-1|UnspecifiedLength.*SetFrac' pkg/types/ --type=go -B2 -A2
# Also check: are there any test cases for convertToMysqlDecimal with UnspecifiedLength?
echo -e "\n=== convertToMysqlDecimal tests ==="
rg -n 'convertToMysqlDecimal' pkg/types/ --type=go -B1 -A3Repository: pingcap/tidb
Length of output: 722
🏁 Script executed:
#!/bin/bash
# Check the actual GetDecimal() implementation to confirm it can return -1
echo "=== FieldType.GetDecimal() implementation ==="
rg -A5 'func.*\*FieldType\).*GetDecimal' pkg/types/field_type.goRepository: pingcap/tidb
Length of output: 101
🏁 Script executed:
#!/bin/bash
# Search for GetDecimal method more broadly
echo "=== Searching for GetDecimal method ==="
rg -n 'func.*GetDecimal' pkg/types/ --type=go
# Also check the actual field definition
echo -e "\n=== FieldType struct definition ==="
sed -n '1,100p' pkg/types/field_type.go | grep -A20 'type FieldType struct'Repository: pingcap/tidb
Length of output: 213
🏁 Script executed:
#!/bin/bash
# Now check: can convertToMysqlDecimal receive a FieldType with Decimal == -1?
# Look at callers of ConvertTo to understand what FieldTypes are passed
echo "=== ConvertTo callers in tests/real code ==="
rg -n '\.ConvertTo\(' pkg/ --type=go -B1 -A1 | head -30Repository: pingcap/tidb
Length of output: 1614
🏁 Script executed:
#!/bin/bash
# Key: check if any cast/decimal type with GetDecimal() == -1 is passed to ConvertTo
# Look at tests that call ConvertTo with TypeNewDecimal
echo "=== Tests for ConvertTo with TypeNewDecimal ==="
rg -n 'TypeNewDecimal' pkg/types/datum_test.go -B2 -A2 | head -40Repository: pingcap/tidb
Length of output: 1382
🏁 Script executed:
#!/bin/bash
# Check FieldTypeBuilder to see what the default decimal value is
echo "=== FieldTypeBuilder implementation ==="
sed -n '1,100p' pkg/types/field_type_builder.go
# Also check NewFieldType helper
echo -e "\n=== NewFieldType function ==="
rg -A10 '^func NewFieldType' pkg/types/field_type.goRepository: pingcap/tidb
Length of output: 3556
🏁 Script executed:
#!/bin/bash
# Most importantly: check if any caller of ConvertTo passes a FieldType
# where GetDecimal() was not explicitly set (would default to -1)
echo "=== Checking ConvertTo calls with default FieldType ==="
rg -B5 'ConvertTo.*NewFieldType' pkg/ --type=go | head -40Repository: pingcap/tidb
Length of output: 2759
🏁 Script executed:
#!/bin/bash
# Also check the actual behavior: does convertToMysqlDecimal actually fail or has issues
# when SetFrac gets -1?
echo "=== What happens after SetFrac in convertToMysqlDecimal ==="
sed -n '1612,1680p' pkg/types/datum.goRepository: pingcap/tidb
Length of output: 2147
🏁 Script executed:
#!/bin/bash
# Check minFlenAndDecimalForType to see default decimals for TypeNewDecimal
echo "=== minFlenAndDecimalForType function ==="
rg -A50 'func minFlenAndDecimalForType' pkg/types/field_type.go | head -70Repository: pingcap/tidb
Length of output: 2150
🏁 Script executed:
#!/bin/bash
# Check if ProduceDecWithSpecifiedTp actually handles the case where decimal is -1
echo "=== ProduceDecWithSpecifiedTp full implementation ==="
sed -n '1653,1730p' pkg/types/datum.goRepository: pingcap/tidb
Length of output: 2608
🏁 Script executed:
#!/bin/bash
# Most importantly: verify if Frac() is actually read and used after SetFrac
# Check what Frac() does
echo "=== Datum.Frac() method ==="
rg -A3 'func \(d \*Datum\) Frac\(' pkg/types/datum.goRepository: pingcap/tidb
Length of output: 138
Silent narrowing in SetFrac — add a guard on the documented invariant.
uint8(frac) silently truncates values outside [0, 255]. The struct doc (line 78) documents the contract that MySQL fsp ≤ 6 and DECIMAL frac ≤ 30, but convertToMysqlDecimal (line 1615) passes target.GetDecimal() unconditionally, which can be UnspecifiedLength (-1) for newly constructed FieldTypes:
- Old behavior:
uint16(-1) = 65535, soFrac()returned 65535 - New behavior:
uint8(-1) = 255, soFrac()now returns 255
For KindMysqlDecimal the authoritative frac lives in *MyDecimal, and ProduceDecWithSpecifiedTp skips rounding when frac is unspecified, so the practical impact is small. However, like the pattern shown in nearby chunk/row.go (lines 181–184), the narrowing should be guarded to prevent silent value changes. Add an intest.Assert to catch any future violations:
Suggested guard
// SetFrac sets the frac of the datum.
func (d *Datum) SetFrac(frac int) {
+ intest.Assert(frac >= 0 && frac <= math.MaxUint8, "Datum.SetFrac: frac out of range for uint8")
d.decimal = uint8(frac)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // SetFrac sets the frac of the datum. | |
| func (d *Datum) SetFrac(frac int) { | |
| d.decimal = uint16(frac) | |
| d.decimal = uint8(frac) | |
| } | |
| // SetFrac sets the frac of the datum. | |
| func (d *Datum) SetFrac(frac int) { | |
| intest.Assert(frac >= 0 && frac <= math.MaxUint8, "Datum.SetFrac: frac out of range for uint8") | |
| d.decimal = uint8(frac) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum.go` around lines 167 - 170, The SetFrac method silently
narrows int to uint8; add a guard using intest.Assert to ensure the provided
frac is within the allowable range before casting and assigning in
Datum.SetFrac. Specifically, in Datum.SetFrac(frac int) add an assertion such as
intest.Assert(frac >= 0 && frac <= 255) (or the tighter documented invariant if
you prefer, e.g., frac <= 30 for DECIMAL) and then perform d.decimal =
uint8(frac); this will catch UnspecifiedLength (-1) or out-of-range values
passed from callers like convertToMysqlDecimal/target.GetDecimal() or
ProduceDecWithSpecifiedTp instead of silently truncating.
The inl_join range-fallback thresholds (2900/2300/700) were calibrated to Datum's old 72-byte size. After shrinking Datum to 56 bytes, ranges fit comfortably under the old limits and no longer exercise the progressive fallback. Lower the thresholds to 2300/1500/500 so the same three fallback levels (drop trailing col, drop two cols, INL_JOIN inapplicable) are triggered again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shrinking Datum.decimal from uint16 to uint8 in 2a3c81e dropped the int8 sign-extending cast in GetMysqlDuration, so Fsp=-1 (UnspecifiedFsp) came back as 255 and crashed Duration.formatFrac with a slice-bounds panic (TestDateArithFuncs). Restore the int8 round-trip so the -1 sentinel survives the uint storage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After shrinking Datum.collation from a byte string to a uint16 id, Equals compares the canonical MySQL collation id instead of the raw name. That makes "UTF8MB4_BIN" and "utf8mb4_bin" compare equal where they previously compared unequal. It aligns with MySQL's case-insensitive collation names but was not previously test-pinned, so a future refactor could silently flip it back. Add TestDatumEqualsCollation covering same-canonical/different-casing, distinct-canonical, and the unknown-name-collapses-to-0 edge case. Also call out the lossy name -> id storage in SetCollation's doc so future readers understand the trade-off without hunting the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/types/datum.go`:
- Around line 2274-2280: The MarshalJSON in function MarshalJSON is currently
converting d.decimal/d.length to uint8/uint16 directly which loses the legacy
sentinel for UnspecifiedFsp/UnspecifiedLength; update the jsonDatum Decimal and
Length population so that when d.decimal == UnspecifiedFsp or d.length ==
UnspecifiedLength you set the marshaled uint16 value to 65535 (the old -1
sentinel) otherwise cast the actual value, i.e., preserve the legacy sentinel
mapping for the Decimal and Length fields in jsonDatum before returning the
JSON.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6bdb37f6-3adc-49f7-acf4-3d98cbdc99c8
📒 Files selected for processing (3)
pkg/types/datum.gotests/integrationtest/r/planner/core/casetest/index/index.resulttests/integrationtest/t/planner/core/casetest/index/index.test
| func (d *Datum) MarshalJSON() ([]byte, error) { | ||
| jd := &jsonDatum{ | ||
| K: d.k, | ||
| Decimal: d.decimal, | ||
| Decimal: uint16(d.decimal), | ||
| Length: d.length, | ||
| I: d.i, | ||
| Collation: d.collation, | ||
| Collation: d.Collation(), |
There was a problem hiding this comment.
Preserve the legacy JSON sentinel when marshaling narrowed decimal.
For UnspecifiedFsp/UnspecifiedLength, the old uint16 backing encoded -1 as 65535; this now marshals as 255. That breaks the stated JSON wire-compatibility goal for older readers, which would decode 255 as a positive FSP/frac instead of the sentinel.
Compatibility-preserving marshal adjustment
func (d *Datum) MarshalJSON() ([]byte, error) {
+ decimal := uint16(d.decimal)
+ if d.decimal == ^uint8(0) {
+ decimal = ^uint16(0)
+ }
jd := &jsonDatum{
K: d.k,
- Decimal: uint16(d.decimal),
+ Decimal: decimal,
Length: d.length,
I: d.i,
Collation: d.Collation(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum.go` around lines 2274 - 2280, The MarshalJSON in function
MarshalJSON is currently converting d.decimal/d.length to uint8/uint16 directly
which loses the legacy sentinel for UnspecifiedFsp/UnspecifiedLength; update the
jsonDatum Decimal and Length population so that when d.decimal == UnspecifiedFsp
or d.length == UnspecifiedLength you set the marshaled uint16 value to 65535
(the old -1 sentinel) otherwise cast the actual value, i.e., preserve the legacy
sentinel mapping for the Decimal and Length fields in jsonDatum before returning
the JSON.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67977 +/- ##
================================================
- Coverage 77.7952% 77.1441% -0.6511%
================================================
Files 1984 1966 -18
Lines 549728 549753 +25
================================================
- Hits 427662 424102 -3560
- Misses 121146 125647 +4501
+ Partials 920 4 -916
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/types/datum.go (1)
782-784: Staledatum.collationreference inComparedocstring.The warning on line 783 references
datum.collation, but that field no longer exists — it was replaced bycollationIDin this PR. Meanwhile,ToHashKey(line 2208/2214) andSortDatums(line 2463) now deliberately derive the collator fromd.collationIDviaGetCollatorByID, so the comment is both stale (wrong field name) and somewhat inconsistent with the new pattern established by this PR. Consider updating it to clarify the actual invariant the caller must respect.♻️ Suggested doc tweak
// Compare compares datum to another datum. -// Notes: don't rely on datum.collation to get the collator, it's tend to buggy. +// Notes: callers must pass an explicit `comparer` for string-typed datums; +// do not assume the datum's stored collationID alone selects the right +// collator for the comparison context. func (d *Datum) Compare(ctx Context, ad *Datum, comparer collate.Collator) (int, error) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/types/datum.go` around lines 782 - 784, Update the stale docstring on Datum.Compare: replace the reference to the removed datum.collation with the current collationID-based pattern and clarify the caller contract — callers must not rely on a non-existent per-datum collator field and should instead ensure the correct collator is provided (or derive it via GetCollatorByID using d.collationID, as done in ToHashKey and SortDatums); edit the comment on Compare to mention collatationID and the established invariant that collators must be obtained via GetCollatorByID when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/types/datum_test.go`:
- Around line 558-585: In TestDatumEqualsCollation, ensure the case-variant
equality check isn't vacuous by asserting the resolved collation ID is non-zero
before comparing; specifically, after creating a and b (from mk("UTF8MB4_BIN")
and mk("utf8mb4_bin")), add an assertion that a.CollationID() (and/or
b.CollationID()) != 0 (use the test helper like require.NotZero or
require.NotEqual with uint16(0)) before calling require.Equal and require.True
on CollationID() and Equals(&b) so this block verifies a known (non-zero)
canonical collation.
---
Nitpick comments:
In `@pkg/types/datum.go`:
- Around line 782-784: Update the stale docstring on Datum.Compare: replace the
reference to the removed datum.collation with the current collationID-based
pattern and clarify the caller contract — callers must not rely on a
non-existent per-datum collator field and should instead ensure the correct
collator is provided (or derive it via GetCollatorByID using d.collationID, as
done in ToHashKey and SortDatums); edit the comment on Compare to mention
collatationID and the established invariant that collators must be obtained via
GetCollatorByID when needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f1189cf6-162c-42a5-ab47-7e2364f32b1a
📒 Files selected for processing (2)
pkg/types/datum.gopkg/types/datum_test.go
| func TestDatumEqualsCollation(t *testing.T) { | ||
| mk := func(name string) Datum { | ||
| var d Datum | ||
| d.SetString("abc", name) | ||
| return d | ||
| } | ||
|
|
||
| // Same canonical collation, different name casing → equal. | ||
| a := mk("UTF8MB4_BIN") | ||
| b := mk("utf8mb4_bin") | ||
| require.Equal(t, a.CollationID(), b.CollationID()) | ||
| require.True(t, a.Equals(&b)) | ||
|
|
||
| // Different canonical collations → not equal. | ||
| c := mk(charset.CollationUTF8MB4) | ||
| d := mk(charset.CollationASCII) | ||
| require.NotEqual(t, c.CollationID(), d.CollationID()) | ||
| require.False(t, c.Equals(&d)) | ||
|
|
||
| // Unknown names both collapse to id 0. The resulting equality is a | ||
| // documented consequence of lossy name → id storage; pin it so a | ||
| // future change doesn't accidentally flip it back to string compare. | ||
| u1 := mk("nonexistent_one") | ||
| u2 := mk("nonexistent_two") | ||
| require.Equal(t, uint16(0), u1.CollationID()) | ||
| require.Equal(t, uint16(0), u2.CollationID()) | ||
| require.True(t, u1.Equals(&u2)) | ||
| } |
There was a problem hiding this comment.
Strengthen TestDatumEqualsCollation so the case-variant claim cannot pass vacuously.
The first block (lines 566-569) asserts a.CollationID() == b.CollationID() and a.Equals(&b), but does not assert the resolved ID is non-zero. If charset.GetCollationByName were ever changed such that "UTF8MB4_BIN"/"utf8mb4_bin" failed to resolve, collationNameToID would return 0 for both and this test would still pass — while silently merging into the unknown-name case lower down. Since the stated purpose is to pin canonical-id equality for known case-variant names, pin a non-zero ID here.
🛡️ Proposed tightening
// Same canonical collation, different name casing → equal.
a := mk("UTF8MB4_BIN")
b := mk("utf8mb4_bin")
+ require.NotZero(t, a.CollationID(), "expected UTF8MB4_BIN to resolve to a known collation id")
require.Equal(t, a.CollationID(), b.CollationID())
require.True(t, a.Equals(&b))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestDatumEqualsCollation(t *testing.T) { | |
| mk := func(name string) Datum { | |
| var d Datum | |
| d.SetString("abc", name) | |
| return d | |
| } | |
| // Same canonical collation, different name casing → equal. | |
| a := mk("UTF8MB4_BIN") | |
| b := mk("utf8mb4_bin") | |
| require.Equal(t, a.CollationID(), b.CollationID()) | |
| require.True(t, a.Equals(&b)) | |
| // Different canonical collations → not equal. | |
| c := mk(charset.CollationUTF8MB4) | |
| d := mk(charset.CollationASCII) | |
| require.NotEqual(t, c.CollationID(), d.CollationID()) | |
| require.False(t, c.Equals(&d)) | |
| // Unknown names both collapse to id 0. The resulting equality is a | |
| // documented consequence of lossy name → id storage; pin it so a | |
| // future change doesn't accidentally flip it back to string compare. | |
| u1 := mk("nonexistent_one") | |
| u2 := mk("nonexistent_two") | |
| require.Equal(t, uint16(0), u1.CollationID()) | |
| require.Equal(t, uint16(0), u2.CollationID()) | |
| require.True(t, u1.Equals(&u2)) | |
| } | |
| func TestDatumEqualsCollation(t *testing.T) { | |
| mk := func(name string) Datum { | |
| var d Datum | |
| d.SetString("abc", name) | |
| return d | |
| } | |
| // Same canonical collation, different name casing → equal. | |
| a := mk("UTF8MB4_BIN") | |
| b := mk("utf8mb4_bin") | |
| require.NotZero(t, a.CollationID(), "expected UTF8MB4_BIN to resolve to a known collation id") | |
| require.Equal(t, a.CollationID(), b.CollationID()) | |
| require.True(t, a.Equals(&b)) | |
| // Different canonical collations → not equal. | |
| c := mk(charset.CollationUTF8MB4) | |
| d := mk(charset.CollationASCII) | |
| require.NotEqual(t, c.CollationID(), d.CollationID()) | |
| require.False(t, c.Equals(&d)) | |
| // Unknown names both collapse to id 0. The resulting equality is a | |
| // documented consequence of lossy name → id storage; pin it so a | |
| // future change doesn't accidentally flip it back to string compare. | |
| u1 := mk("nonexistent_one") | |
| u2 := mk("nonexistent_two") | |
| require.Equal(t, uint16(0), u1.CollationID()) | |
| require.Equal(t, uint16(0), u2.CollationID()) | |
| require.True(t, u1.Equals(&u2)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/types/datum_test.go` around lines 558 - 585, In TestDatumEqualsCollation,
ensure the case-variant equality check isn't vacuous by asserting the resolved
collation ID is non-zero before comparing; specifically, after creating a and b
(from mk("UTF8MB4_BIN") and mk("utf8mb4_bin")), add an assertion that
a.CollationID() (and/or b.CollationID()) != 0 (use the test helper like
require.NotZero or require.NotEqual with uint16(0)) before calling require.Equal
and require.True on CollationID() and Equals(&b) so this block verifies a known
(non-zero) canonical collation.
|
/retest |
|
@mjonss: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@mjonss: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #67976
Problem Summary:
types.Datumis a 72-byte value type that appears on every row- and value-levelpath in TiDB: row decoding, expression evaluation, planner constant folding,
index lookups, join keys, and stats sampling. A significant share of those bytes
is waste:
collation string(16 B) stores a collation name per Datum, but the set ofcollations is a small fixed set of MySQL ids.
decimal uint16(2 B) carriesfrac, whose maximum value is 30.Shrinking the struct reduces the per-row working set (especially noticeable for
[]Datumof N columns, where savings scale linearly) and also turns thecollation comparison in
Datum.Equalsinto an integer compare.What changed and how does it work?
1. Benchmark baseline (first commit)
b.ReportAllocs()on existing Datum-adjacent benches inpkg/types,pkg/tablecodec,pkg/util/codecsobenchstatcan diffB/opandallocs/op.pkg/types:BenchmarkDatumCopy— tightCopyloop over a heterogeneous[]Datum;most direct signal on struct size.
BenchmarkDatumSetMysqlTime— locks in the current 1 alloc / 8 B per callcaused by boxing
Timethroughd.x any.BenchmarkCompareDatumCollation— exercises the named-collator comparepath that
BenchmarkCompareDatumbypasses via the binary collator.2. Shrink
types.Datumfrom 72 to 56 bytes (second commit)collation stringfield with a 2-bytecollationID uint16carrying the MySQL collation id (0 = unset, matching thelegacy empty-string sentinel).
decimalfromuint16touint8.collationIDso the first 8-byte word isfully packed with
k + decimal + collationID + length.Layout transition:
API compatibility:
Collation() stringandSetCollation(string)are preserved as facades thatresolve via the
charsetpackage. External callers keep working unchanged.CollationID() uint16/SetCollationID(uint16)for hot paths.MarshalJSON/UnmarshalJSONkeep the on-wire"collation"field as a namestring, so older JSON blobs round-trip.
collate.GetCollatorByID(0)now falls through silently to matchGetCollator("").Hot-path callers were migrated from
collate.GetCollator(d.Collation())tocollate.GetCollatorByID(int(d.CollationID()))so they do a single map lookupinstead of a name lookup followed by a collator lookup:
util/codec/codec.go,util/ranger/detacher.go,planner/cardinality/row_count_column.go,table/tables/mutation_checker.go(2 sites),
executor/foreign_key.go, plus all internal Datum compare/key pathsin
pkg/types/datum.go.Datum.MemUsagedrops thelen(collation)term; the collation is now inlineand fully accounted for in
EmptyDatumSize.TestRangeMemUsageandGetDatumMemSizeinaggfuncs/func_group_concatfollow suit.Measurements
benchstat, linux/amd64, Intel i7-13700,-count=5:unsafe.Sizeof(types.Datum)BenchmarkDecodeWithSize(codec) B/opBenchmarkDecodeWithSize(codec) ns/opBenchmarkDecodeWithOutSize(codec) B/opBenchmarkCompareDatum(types) ns/opBenchmarkEncodeValue(tablecodec) ns/opNo allocation regressions observed across the full bench set.
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Performance & Optimization
Tests