Conversation
…ate (#395) * fix: resolve data race on poolCommon.once field using atomic.Pointer The data race occurred between Reboot() writing p.once = &sync.Once{} and worker goroutines reading p.once.Do(...) concurrently. Fix: change the once field from *sync.Once to atomic.Pointer[sync.Once] so all reads (Load) and writes (Store) are atomic. Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/b2f67c6c-ed78-4758-9185-643b3e09d143 Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com> * fix: make Reboot() wait for all workers to exit before resetting pool The root cause of the data race is that Reboot() resets pool state (once, allDone) while workers are still shutting down and accessing those same fields. Fix: Reboot() now waits on the allDone channel (which is closed when the last worker exits) before resetting pool state. This ensures no worker goroutine is running when the fields are reassigned, eliminating the data race without needing atomic wrappers. Reverts the atomic.Pointer[sync.Once] approach in favor of this fundamental fix. Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/7b1d6e25-546a-40ef-b31d-33e824186b31 Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com> * fix: also wait for purge/ticktock goroutines to exit in Reboot() Ensures the old purge and ticktock goroutines have fully exited (set their done flags) before Reboot() resets those flags and starts new goroutines. This prevents a logic race where the old goroutine could set purgeDone/ticktockDone=1 after Reboot() resets them to 0. Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/480234c8-a50f-4dc3-9352-0b337ebfc5d3 Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
* fix: use atomic.Pointer for once field to prevent data race in Reboot
The data race occurred between Reboot() writing p.once = &sync.Once{}
and worker goroutines reading p.once to call once.Do(close(allDone)).
By using atomic.Pointer[sync.Once], all reads (Load) and writes (Store)
to the once field are properly synchronized.
Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/4a1d1b13-0a7f-4167-ae12-429e7719ea32
Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
* fix: use sync.WaitGroup to ensure all worker goroutines fully exit before Reboot
Revert the superficial atomic.Pointer fix. The real issue is that
Reboot() proceeded after allDone (running count hit 0) while worker
goroutines were still executing deferred cleanup code.
Add workerWg sync.WaitGroup to poolCommon. Each worker goroutine calls
wg.Add(1) before spawning and defer wg.Done() as the outermost defer
(runs last). Reboot() calls workerWg.Wait() after <-allDone to block
until every goroutine has fully exited before resetting pool state.
Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/758f404a-fab6-4a4e-8f42-ea4fdfd2ab9a
Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
* fix: call workerWg.Add(1) before addRunning(1) for defensive ordering
Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/758f404a-fab6-4a4e-8f42-ea4fdfd2ab9a
Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
* fix: move once.Do(close(allDone)) to Release() instead of Reboot()
This is the fundamental fix: Release() is the proper place to close
allDone when no workers are running, since it's the operation that
transitions the pool to CLOSED state. If workers are still running,
the last one to exit closes allDone in its defer as before.
This eliminates the race between Reboot() and running workers on
once.Do, and removes the unnecessary workerWg complexity.
Agent-Logs-Url: https://github.com/panjf2000/ants/sessions/0d78b737-d4ef-4dbd-b021-3e532757c1f9
Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: panjf2000 <7496278+panjf2000@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #398 +/- ##
==========================================
- Coverage 96.81% 95.11% -1.70%
==========================================
Files 14 14
Lines 784 798 +14
==========================================
Hits 759 759
- Misses 19 26 +7
- Partials 6 13 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This patch release tightens shutdown/reboot behavior in the core pool implementation, improves resource cleanup during MultiPool construction failures, and applies a few small correctness/documentation tweaks.
Changes:
- Adjusts pool shutdown/reboot coordination (closing
allDoneearlier;Reboot()now waits for worker/purge/ticktock completion). - Prevents resource leaks in
NewMultiPool*constructors by releasing already-created pools on partial construction failure; replaces sentinel int constant withmath.MaxInt32. - Minor refactors/docs updates (loop-queue binary search local vars; README sponsorship badge;
WithNonblockingcomment).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
ants.go |
Changes pool lifecycle coordination in Release()/Reboot()/ReleaseContext() to avoid hangs and reduce races. |
multipool.go |
Releases already-created pools on constructor error; uses math.MaxInt32 in LeastTasks selection. |
multipool_func.go |
Same constructor cleanup + math.MaxInt32 update for func-based multipool. |
multipool_func_generic.go |
Same constructor cleanup + math.MaxInt32 update for generic func-based multipool. |
worker_loop_queue.go |
Refactors binarySearch locals for clarity/scope reduction. |
options.go |
Fixes WithNonblocking comment to match actual error behavior (ErrPoolOverload). |
README.md |
Updates DigitalOcean sponsorship badge/link format. |
README_ZH.md |
Updates DigitalOcean sponsorship badge/link format (Chinese README). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| p.goPurge() | ||
| atomic.StoreInt32(&p.ticktockDone, 0) | ||
| p.goTicktock() | ||
| p.allDone = make(chan struct{}) | ||
| p.once = &sync.Once{} |
| // Reboot reboots a closed pool, it does nothing if the pool is not closed. | ||
| // If you intend to reboot a closed pool, use ReleaseTimeout() instead of | ||
| // Release() to ensure that all workers are stopped and resource are released | ||
| // before rebooting, otherwise you may run into data race. |
| runtime.Gosched() | ||
| } | ||
| } | ||
| for atomic.LoadInt32(&p.ticktockDone) != 1 { | ||
| runtime.Gosched() |
No description provided.