MB-71397: Add a maximal-effort strategy to avoid OOM errors for GPU indexes. by CascadingRadium · Pull Request #61 · blevesearch/go-faiss

CascadingRadium · 2026-05-08T15:20:42Z

Serialize GPU clones per device via a new gpuSerializer (per-device mutex), preventing concurrent CloneToGPU calls from oversubscribing the same GPU.
Enforce the memory check at clone time with a fresh getFreeGPUMemory reading taken under the serializer lock, fixing the time-of-check to time-of-use (TOCTOU) race in which stale load-balancer readings let concurrent callers all pass the threshold.
Estimate the GPU memory the clone operation will need, so that at least defaultGPUMinFreeMemory remains free after the clone for temporary allocations during the index lifetime.
Requires:
Debug logs comparing the estimated GPU memory required before the clone (est) with the actual GPU memory used after it (actual = pre - post, diff = actual - est):

---------------------------------
clone gpu=0   pre=  5646 MB  est=    20 MB  post=  5506 MB  actual=   140 MB  diff=+120 MB
clone gpu=0   pre=  5506 MB  est=    20 MB  post=  5498 MB  actual=     8 MB  diff=-12 MB
clone gpu=0   pre=  5498 MB  est=    41 MB  post=  5486 MB  actual=    12 MB  diff=-29 MB
clone gpu=0   pre=  5486 MB  est=   184 MB  post=  5214 MB  actual=   272 MB  diff=+88 MB
clone gpu=0   pre=  5214 MB  est=    20 MB  post=  5204 MB  actual=    10 MB  diff=-10 MB
clone gpu=0   pre=  5204 MB  est=    41 MB  post=  5194 MB  actual=    10 MB  diff=-31 MB
clone gpu=0   pre=  5194 MB  est=    82 MB  post=  5052 MB  actual=   142 MB  diff=+60 MB
clone gpu=0   pre=  5052 MB  est=    20 MB  post=  5044 MB  actual=     8 MB  diff=-12 MB
clone gpu=0   pre=  5044 MB  est=   204 MB  post=  4772 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  4772 MB  est=   184 MB  post=  4500 MB  actual=   272 MB  diff=+88 MB
clone gpu=0   pre=  4500 MB  est=   204 MB  post=  4226 MB  actual=   274 MB  diff=+70 MB
clone gpu=0   pre=  4226 MB  est=   204 MB  post=  4082 MB  actual=   144 MB  diff=-60 MB
clone gpu=0   pre=  4082 MB  est=   204 MB  post=  3808 MB  actual=   274 MB  diff=+70 MB
clone gpu=0   pre=  3808 MB  est=   204 MB  post=  3536 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  3536 MB  est=   204 MB  post=  3260 MB  actual=   276 MB  diff=+72 MB
clone gpu=0   pre=  3260 MB  est=   204 MB  post=  2988 MB  actual=   272 MB  diff=+68 MB
clone gpu=0   pre=  2988 MB  est=  1994 MB  post=   900 MB  actual=  2088 MB  diff=+94 MB
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)
clone gpu=0   pre=   900 MB  est=  1994 MB  REJECTED (free < est + 512 MB buffer)
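
The admission rule visible in the log above (reject when free < est + 512 MB) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `newGPUSerializer`, `admitAndClone`, the `freeMB` callback and the `minFreeMB` constant are hypothetical names, and the 512 MB value is taken only from the rejection message in the log. The point it shows is that the free-memory reading happens while the per-device lock is held, so two callers cannot both pass the check against the same stale number.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// minFreeMB stands in for defaultGPUMinFreeMemory (assumption: 512 MB,
// taken from the "est + 512 MB buffer" message in the debug log).
const minFreeMB = 512

var errInsufficientGPUMemory = errors.New("insufficient free GPU memory for clone")

// gpuSerializer holds one mutex per device so that clones targeting the
// same GPU never overlap. Hypothetical shape, for illustration only.
type gpuSerializer struct {
	locks []sync.Mutex
}

func newGPUSerializer(numDevices int) *gpuSerializer {
	return &gpuSerializer{locks: make([]sync.Mutex, numDevices)}
}

// admitAndClone takes the device lock, reads free memory while holding it,
// and runs clone only if the estimate plus the reserve fits.
func (s *gpuSerializer) admitAndClone(
	device int,
	estMB uint64,
	freeMB func(device int) (uint64, error), // fresh reading, e.g. a driver query
	clone func() error,
) error {
	if device < 0 || device >= len(s.locks) {
		return fmt.Errorf("invalid GPU device %d", device)
	}
	s.locks[device].Lock()
	defer s.locks[device].Unlock()

	free, err := freeMB(device)
	if err != nil {
		return err
	}
	if free < estMB+minFreeMB {
		return fmt.Errorf("%w: gpu=%d free=%d MB est=%d MB",
			errInsufficientGPUMemory, device, free, estMB)
	}
	return clone()
}

func main() {
	s := newGPUSerializer(1)
	noop := func() error { return nil }
	// Replays the last two distinct situations from the log: pre=2988 and pre=900.
	for _, free := range []uint64{2988, 900} {
		reading := free
		err := s.admitAndClone(0, 1994,
			func(int) (uint64, error) { return reading, nil }, noop)
		fmt.Printf("pre=%d MB est=1994 MB admitted=%t err=%v\n", free, err == nil, err)
	}
}
```

Holding the lock across the clone itself, not just across the check, is what keeps the reading valid: the next caller on the same device sees free memory only after the previous clone has finished allocating.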

Copilot

Pull request overview

This PR introduces per-device serialization for GPU cloning and adds a clone-time GPU free-memory admission check to reduce GPU oversubscription and mitigate TOCTOU issues when multiple goroutines clone indexes concurrently.

Changes:

Add a per-GPU gpuScheduler (per-device mutex) and use it to serialize CloneToGPU operations on the same device.
Centralize GPU free-memory querying via getFreeGPUMemory() and reuse it in the load balancer and clone path.
Update GPU selection/init logic to only start the load balancer when multiple GPUs are available, while always enabling the scheduler when any GPU exists.
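
The init rule in the last item can be pictured with a small sketch. It is an assumption-laden illustration, not the code under review: `gpuRuntime` and `initGPURuntime` are hypothetical names, and the real initialization presumably starts goroutines and queries devices rather than setting flags.

```go
package main

import "fmt"

// gpuRuntime records which (hypothetical) components were enabled at init.
type gpuRuntime struct {
	serializerEnabled   bool // per-device clone serialization
	loadBalancerEnabled bool // device selection across GPUs
}

// initGPURuntime enables the serializer whenever at least one GPU exists,
// and the load balancer only when there is more than one GPU to choose from.
func initGPURuntime(numGPUs int) gpuRuntime {
	return gpuRuntime{
		serializerEnabled:   numGPUs >= 1,
		loadBalancerEnabled: numGPUs >= 2,
	}
}

func main() {
	for _, n := range []int{0, 1, 4} {
		fmt.Printf("gpus=%d %+v\n", n, initGPURuntime(n))
	}
}
```

With a single GPU there is nothing to balance, but concurrent clones can still oversubscribe that one device, which is why the two conditions differ.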

Co-authored-by: Copilot <copilot@github.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

CascadingRadium requested review from a team and Copilot May 8, 2026 15:26

Copilot started reviewing on behalf of CascadingRadium May 8, 2026 15:26

Copilot AI reviewed May 8, 2026

Comment thread gpu.go

Comment thread gpu.go

Comment thread gpu.go

Comment thread gpu.go

capemox requested changes May 11, 2026

Comment thread gpu.go Outdated

CascadingRadium and others added 3 commits May 11, 2026 18:04

Add gpu locks

bd99732

temp

b8856fd

Co-authored-by: Copilot <copilot@github.com>

remove debug logs

7d33583

Co-authored-by: Copilot <copilot@github.com>

CascadingRadium force-pushed the fixOOM branch from 175e391 to 7d33583 Compare May 11, 2026 12:34

scheduler -> serializer

8c037af

Co-authored-by: Copilot <copilot@github.com>

CascadingRadium changed the title ~~Add a GPU Scheduler~~ Add a GPU Serializer May 11, 2026

CascadingRadium requested review from a team, Likith101, Samsonnyyeet, Thejas-bhat, capemox, Copilot, maneuvertomars and steveyen May 11, 2026 12:48

Copilot started reviewing on behalf of CascadingRadium May 11, 2026 12:49

Copilot AI reviewed May 11, 2026

Comment thread gpu.go

Comment thread gpu.go

Comment thread gpu.go

fix indexing OOM

1d82be2

CascadingRadium changed the title ~~Add a GPU Serializer~~ MB-71397: Add heuristic based checks for maximal-effort strategy to avoid OOMs May 14, 2026

CascadingRadium changed the title ~~MB-71397: Add heuristic based checks for maximal-effort strategy to avoid OOMs~~ MB-71397: Add a maximal-effort strategy to avoid OOM errors for GPU indexes. May 14, 2026

fix gpu_stub

cee2742

CascadingRadium marked this pull request as draft May 15, 2026 14:15

CascadingRadium wants to merge 6 commits into fixSize from fixOOM

3 participants