Skip to content

aggexec: reduce allocations in serialization paths and remove dead code#23873

Merged
mergify[bot] merged 10 commits intomatrixorigin:mainfrom
aunjgr:agg_spill
Mar 18, 2026
Merged

aggexec: reduce allocations in serialization paths and remove dead code#23873
mergify[bot] merged 10 commits intomatrixorigin:mainfrom
aunjgr:agg_spill

Conversation

@aunjgr
Copy link
Contributor

@aunjgr aunjgr commented Mar 18, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23846

What this PR does / why we need it:

  • encoding.go: replace heap-allocated slices in Read*/Write* helpers with stack-allocated fixed-size arrays; use encoding/binary directly
  • aggState: use NewOffHeapVecWithType for state vectors to avoid GC pressure
  • aggState.readState: replace ReadSizeBytes+UnmarshalBinary with LimitReader+ UnmarshalFromReader, eliminating per-mob intermediate []byte allocation
  • Add MarshalerUnmarshaler.UnmarshalFromReader; implement on bmp via roaring.ReadFrom
  • Add Vectors[T].UnmarshalFromReader using LimitReader+vector.UnmarshalWithReader, eliminating per-vector intermediate []byte allocation; use in median.go
  • Remove marshal()/unmarshal() from AggFuncExec interface and all implementations (dead code, production path is SaveIntermediateResult/UnmarshalFromReader)
  • Remove helpers only used by deleted methods: getEncoded(), getGroupContextEncodings(), aggExec panic stubs

for TPC-H 100G with agg_spill_mem = 64M, the total allocated memory is reduced from 42G to 17G, and time cost is also slightly improved.

- encoding.go: replace heap-allocated slices in Read*/Write* helpers with
  stack-allocated fixed-size arrays; use encoding/binary directly
- aggState: use NewOffHeapVecWithType for state vectors to avoid GC pressure
- aggState.readState: replace ReadSizeBytes+UnmarshalBinary with LimitReader+
  UnmarshalFromReader, eliminating per-mob intermediate []byte allocation
- Add MarshalerUnmarshaler.UnmarshalFromReader; implement on bmp via roaring.ReadFrom
- Add Vectors[T].UnmarshalFromReader using LimitReader+vector.UnmarshalWithReader,
  eliminating per-vector intermediate []byte allocation; use in median.go
- Remove marshal()/unmarshal() from AggFuncExec interface and all implementations
  (dead code, production path is SaveIntermediateResult/UnmarshalFromReader)
- Remove helpers only used by deleted methods: getEncoded(),
  getGroupContextEncodings(), aggExec panic stubs
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request is a refactoring focused on performance optimization and code consolidation. It removes deprecated interface methods from the aggregation executor framework, replaces in-heap vector/batch creation with off-heap alternatives for better memory efficiency, optimizes encoding operations with stack-allocated buffers, and refactors spill operations to reuse buffers and improve memory locality. Additionally, it renames the AllGroupHash() method to AppendAllGroupHash() to use a caller-provided buffer for better efficiency.

Changes:

  • Replace AllGroupHash() with AppendAllGroupHash() accepting pre-allocated buffer parameter for performance
  • Remove deprecated marshal() and unmarshal() methods from AggFuncExec interface and all implementations
  • Update vector/batch creation calls from NewVec()/NewWithSize() to NewOffHeapVecWithType()/NewOffHeapWithSize()
  • Optimize encoding/binary I/O operations with fixed-size stack-allocated buffers
  • Refactor spill operations to reuse buffers and improve memory efficiency through the new spill buffer fields

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/vm/message/joinMapMsg.go Removed deprecated AllGroupHash() method
pkg/container/hashtable/int64_hash_map.go Updated to AppendAllGroupHash() with buffer parameter
pkg/container/hashtable/string_hash_map.go Updated to AppendAllGroupHash() with buffer parameter
pkg/common/hashmap/types.go Updated interface definition for AppendAllGroupHash()
pkg/common/hashmap/inthashmap.go Updated wrapper to use new AppendAllGroupHash()
pkg/common/hashmap/strhashmap.go Updated wrapper to use new AppendAllGroupHash()
pkg/sql/colexec/group/types2.go Added reusable spill buffer fields and cleanup logic
pkg/sql/colexec/group/helper.go Refactored spillDataToDisk() to reuse buffers and use AppendAllGroupHash()
pkg/sql/colexec/multi_update/s3writer_delegate.go Updated vector/batch creation to off-heap variants
pkg/sql/colexec/evalProjection.go Updated batch creation to off-heap
pkg/sql/colexec/evalExpression.go Updated vector/batch creation to off-heap and added SetOffHeap() call
pkg/sql/colexec/dedupjoin/join.go Updated vector/batch creation to off-heap
pkg/sql/colexec/aggexec/window.go Removed deprecated marshal()/unmarshal() methods; updated vector creation
pkg/sql/colexec/aggexec/types.go Removed marshal() and unmarshal() from AggFuncExec interface
pkg/sql/colexec/aggexec/aggState.go Updated vector creation to off-heap; added UnmarshalFromReader() to interface; refactored serialization
pkg/sql/colexec/aggexec/tool.go Added UnmarshalFromReader() method; updated vector creation
pkg/container/types/encoding.go Optimized Read*/Write* functions with fixed-size stack arrays and binary.LittleEndian
pkg/sql/colexec/aggexec/*.go Removed deprecated marshal()/unmarshal() implementations across multiple files; updated tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mergify
Copy link
Contributor

mergify bot commented Mar 18, 2026

Merge Queue Status

  • Entered queue2026-03-18 11:21 UTC · Rule: main
  • Checks passed · in-place
  • Merged2026-03-18 11:21 UTC · at 7334c082bfc246be23a8d04027a7f1f4aa17692e

This pull request spent 14 seconds in the queue, with no time running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement kind/refactor Code refactor size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants