
MB-71397: Add a maximal-effort strategy to avoid OOM errors for GPU indexes.#61

Draft
CascadingRadium wants to merge 6 commits into fixSize from fixOOM



@CascadingRadium CascadingRadium commented May 8, 2026

  • Serialize GPU clones per device via a new gpuSerializer (a per-device mutex), preventing concurrent CloneToGPU calls from oversubscribing the same GPU.
  • Enforce a memory check at clone time using a fresh getFreeGPUMemory reading taken while holding the per-device lock. This fixes the time-of-check to time-of-use (TOCTOU) race in which stale load-balancer readings let concurrent callers all pass the threshold.
  • Estimate the GPU memory the clone operation will need, so that at least defaultGPUMinFreeMemory remains free after the clone for temporary allocations during the index lifetime.
  • Requires:
  • Debug logs comparing the pre-clone estimate of required GPU memory with the memory actually used on the GPU after the clone:
---------------------------------
clone gpu=0   pre=  5646 MB  est=    20 MB  post=  5506 MB  actual=   140 MB  diff=+120 MB
clone gpu=0   pre=  5506 MB  est=    20 MB  post=  5498 MB  actual=     8 MB  diff=-12 MB
clone gpu=0   pre=  5498 MB  est=    41 MB  post=  5486 MB  actual=    12 MB  diff=-29 MB
clone gpu=0   pre=  5486 MB  est=   184 MB  post=  5214 MB  actual=   272 MB  diff=+88 MB
clone gpu=0   pre=  5214 MB  est=    20 MB  post=  5204 MB  actual=    10 MB  diff=-10 MB
clone gpu=0   pre=  5204 MB  est=    41 MB  post=  5194 MB  actual=    10 MB  diff=-31 MB
clone gpu=0   pre=  5194 MB  est=    82 MB  post=  5052 MB  actual=   142 MB  diff=+60 MB
clone gpu=0   pre=  5052 MB  est=    20 MB  post=  5044 MB  actual=     8 MB  diff=-12 MB
clone gpu=0   pre=  5044 MB  est=   204 MB  post=  4772 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  4772 MB  est=   184 MB  post=  4500 MB  actual=   272 MB  diff=+88 MB
clone gpu=0   pre=  4500 MB  est=   204 MB  post=  4226 MB  actual=   274 MB  diff=+70 MB
clone gpu=0   pre=  4226 MB  est=   204 MB  post=  4082 MB  actual=   144 MB  diff=-60 MB
clone gpu=0   pre=  4082 MB  est=   204 MB  post=  3808 MB  actual=   274 MB  diff=+70 MB
clone gpu=0   pre=  3808 MB  est=   204 MB  post=  3536 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  3536 MB  est=   204 MB  post=  3260 MB  actual=   276 MB  diff=+72 MB
clone gpu=0   pre=  3260 MB  est=   204 MB  post=  2988 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  2988 MB  est=  1994 MB  post=   900 MB  actual=  2088 MB  diff=+94 MB
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)


Copilot AI left a comment


Pull request overview

This PR introduces per-device serialization for GPU cloning and adds a clone-time GPU free-memory admission check to reduce GPU oversubscription and mitigate TOCTOU issues when multiple goroutines clone indexes concurrently.

Changes:

  • Add a per-GPU gpuScheduler (per-device mutex) and use it to serialize CloneToGPU operations on the same device.
  • Centralize GPU free-memory querying via getFreeGPUMemory() and reuse it in the load balancer and clone path.
  • Update GPU selection/init logic to only start the load balancer when multiple GPUs are available, while always enabling the scheduler when any GPU exists.


Comment thread gpu.go
Comment thread gpu.go
Comment thread gpu.go
Comment thread gpu.go
Comment thread gpu.go Outdated
CascadingRadium and others added 3 commits May 11, 2026 18:04
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
@CascadingRadium CascadingRadium changed the title from "Add a GPU Scheduler" to "Add a GPU Serializer" May 11, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread gpu.go
Comment thread gpu.go
Comment thread gpu.go
@CascadingRadium CascadingRadium changed the title from "Add a GPU Serializer" to "MB-71397: Add heuristic based checks for maximal-effort strategy to avoid OOMs" May 14, 2026
@CascadingRadium CascadingRadium changed the title from "MB-71397: Add heuristic based checks for maximal-effort strategy to avoid OOMs" to "MB-71397: Add a maximal-effort strategy to avoid OOM errors for GPU indexes." May 14, 2026
@CascadingRadium CascadingRadium marked this pull request as draft May 15, 2026 14:15