aggexec: reduce allocations in serialization paths and remove dead code #23873
mergify[bot] merged 10 commits into matrixorigin:main
- encoding.go: replace heap-allocated slices in Read*/Write* helpers with stack-allocated fixed-size arrays; use encoding/binary directly
- aggState: use NewOffHeapVecWithType for state vectors to avoid GC pressure
- aggState.readState: replace ReadSizeBytes+UnmarshalBinary with LimitReader+UnmarshalFromReader, eliminating the per-mob intermediate []byte allocation
- Add MarshalerUnmarshaler.UnmarshalFromReader; implement it on bmp via roaring.ReadFrom
- Add Vectors[T].UnmarshalFromReader using LimitReader+vector.UnmarshalWithReader, eliminating the per-vector intermediate []byte allocation; use it in median.go
- Remove marshal()/unmarshal() from the AggFuncExec interface and all implementations (dead code; the production path is SaveIntermediateResult/UnmarshalFromReader)
- Remove helpers used only by the deleted methods: getEncoded(), getGroupContextEncodings(), aggExec panic stubs
Pull request overview
This pull request is a refactoring focused on performance optimization and code consolidation. It removes deprecated interface methods from the aggregation executor framework, replaces Go-heap vector/batch creation with off-heap alternatives to reduce GC pressure, optimizes encoding operations with stack-allocated buffers, and refactors spill operations to reuse buffers and improve memory locality. It also renames AllGroupHash() to AppendAllGroupHash(), which appends into a caller-provided buffer instead of allocating a new slice on every call.
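The AllGroupHash() to AppendAllGroupHash() change follows Go's append-to-caller-buffer convention (as in strconv.AppendInt). A minimal sketch, assuming a toy groupMap type; the type, its field and both method bodies are hypothetical, not the hashtable implementation:

```go
package main

import "fmt"

// groupMap is a toy stand-in for a group-by hash table that records the hash
// of each group in insertion order. Hypothetical, not the matrixone type.
type groupMap struct {
	hashes []uint64
}

// AllGroupHash allocates a fresh slice on every call (the old style).
func (m *groupMap) AllGroupHash() []uint64 {
	out := make([]uint64, len(m.hashes))
	copy(out, m.hashes)
	return out
}

// AppendAllGroupHash appends the group hashes to dst and returns the extended
// slice, so a caller can pass buf[:0] to reuse the same backing array.
func (m *groupMap) AppendAllGroupHash(dst []uint64) []uint64 {
	return append(dst, m.hashes...)
}

func main() {
	maps := []*groupMap{
		{hashes: []uint64{11, 22, 33}},
		{hashes: []uint64{44, 55}},
	}
	var buf []uint64 // reused across calls, e.g. one per spill
	for _, m := range maps {
		buf = m.AppendAllGroupHash(buf[:0])
		fmt.Println(buf)
	}
}
```

Once buf has grown to the largest group count, later calls allocate nothing; the caller owns the buffer's lifetime and must not keep references to earlier results after reusing it.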
Changes:
- Replace AllGroupHash() with AppendAllGroupHash(), which accepts a pre-allocated buffer parameter for performance
- Remove deprecated marshal() and unmarshal() methods from the AggFuncExec interface and all implementations
- Update vector/batch creation calls from NewVec()/NewWithSize() to NewOffHeapVecWithType()/NewOffHeapWithSize()
- Optimize encoding/binary I/O operations with fixed-size stack-allocated buffers
- Refactor spill operations to reuse buffers and improve memory efficiency through the new spill buffer fields
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/vm/message/joinMapMsg.go | Removed deprecated AllGroupHash() method |
| pkg/container/hashtable/int64_hash_map.go | Updated to AppendAllGroupHash() with buffer parameter |
| pkg/container/hashtable/string_hash_map.go | Updated to AppendAllGroupHash() with buffer parameter |
| pkg/common/hashmap/types.go | Updated interface definition for AppendAllGroupHash() |
| pkg/common/hashmap/inthashmap.go | Updated wrapper to use new AppendAllGroupHash() |
| pkg/common/hashmap/strhashmap.go | Updated wrapper to use new AppendAllGroupHash() |
| pkg/sql/colexec/group/types2.go | Added reusable spill buffer fields and cleanup logic |
| pkg/sql/colexec/group/helper.go | Refactored spillDataToDisk() to reuse buffers and use AppendAllGroupHash() |
| pkg/sql/colexec/multi_update/s3writer_delegate.go | Updated vector/batch creation to off-heap variants |
| pkg/sql/colexec/evalProjection.go | Updated batch creation to off-heap |
| pkg/sql/colexec/evalExpression.go | Updated vector/batch creation to off-heap and added SetOffHeap() call |
| pkg/sql/colexec/dedupjoin/join.go | Updated vector/batch creation to off-heap |
| pkg/sql/colexec/aggexec/window.go | Removed deprecated marshal()/unmarshal() methods; updated vector creation |
| pkg/sql/colexec/aggexec/types.go | Removed marshal() and unmarshal() from AggFuncExec interface |
| pkg/sql/colexec/aggexec/aggState.go | Updated vector creation to off-heap; added UnmarshalFromReader() to interface; refactored serialization |
| pkg/sql/colexec/aggexec/tool.go | Added UnmarshalFromReader() method; updated vector creation |
| pkg/container/types/encoding.go | Optimized Read*/Write* functions with fixed-size stack arrays and binary.LittleEndian |
| pkg/sql/colexec/aggexec/*.go | Removed deprecated marshal()/unmarshal() implementations across multiple files; updated tests |
Merge Queue Status
This pull request spent 14 seconds in the queue, with no time running CI.
What type of PR is this?
Which issue(s) this PR fixes:
issue #23846
What this PR does / why we need it:
For TPC-H 100G with agg_spill_mem = 64M, total allocated memory drops from 42G to 17G, and execution time also improves slightly.