Commit f6e2e6b

VladSaiocUber authored and mknyszek committed
design: add design/74609-goroutine-leak-detection-gc.md
For golang/go#74609
Changes at CL 688335

Change-Id: I605c0d4aa88cd44f42300ebe476496744d93f9ce
GitHub-Last-Rev: 49564e8
GitHub-Pull-Request: #58
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/689555
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
1 parent 2c02d6b commit f6e2e6b

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# Proposal: Goroutine leak detection via garbage collection

Author(s): Georgian-Vlad Saioc ([email protected]), Milind Chabbi ([email protected])

Last updated: 14 Aug 2025

Discussion at [issue #74609](https://go.dev/issue/74609).

## Abstract

This proposal outlines a dynamic technique for detecting goroutine
leaks within Go programs. It leverages the existing marking phase
of the Go garbage collector (GC) to find goroutines blocked on
concurrency primitives that are not reachable in memory from goroutines
that may still be runnable.

## Background

Due to its concurrency features (lightweight goroutines,
message passing), Go is particularly susceptible to concurrency bugs
known as _goroutine leaks_ (also known as _partial deadlocks_ in
the literature [1](https://dl.acm.org/doi/10.1145/3676641.3715990)).
Unlike global deadlocks (wherein all goroutines are blocked) that halt
an entire application, goroutine leaks occur whenever a goroutine is
blocked indefinitely, e.g., by reading from a channel that no other
goroutine has access to, while other running goroutines keep the
program operational.
This issue can lead to (_a_) severe memory leaks, and (_b_) performance
penalties, by over-burdening the GC with the task of marking useless memory.
Goroutine leaks are notoriously difficult to debug; in some cases
their mere presence is difficult to discern, even with otherwise
thorough diagnostic information, e.g., memory and goroutine profiles.
This makes tooling capable of detecting their presence valuable
to the Go ecosystem.

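As a minimal illustration of the kind of leak described above, the sketch below starts goroutines that block forever on a channel nothing else can reach. The names `leakOne` and `leakedDelta` are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakOne starts a goroutine that blocks forever: the channel is unbuffered
// and no other goroutine holds a reference to it once leakOne returns.
func leakOne() {
	ch := make(chan int)
	go func() {
		ch <- 42 // never completes, since no receiver can ever exist
	}()
}

// leakedDelta reports how many goroutines n calls to leakOne added.
func leakedDelta(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leakOne()
	}
	return runtime.NumGoroutine() - before
}

func main() {
	// The program keeps running normally, yet the blocked goroutines and
	// everything reachable from their stacks are never reclaimed.
	fmt.Println("goroutines added:", leakedDelta(3))
}
```

A regular goroutine profile would list these goroutines as blocked on a channel send, but it would not say whether anything could ever unblock them.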
## Proposal

The change modifies several key points in the phases of the GC cycle,
as follows:
1. Mark root preparation: initially treat only _runnable_ goroutines
as mark roots (the regular GC treats _all_ goroutines as roots).
2. Proceed to mark memory from this set of roots.
3. Once all reachable memory has been marked, check whether any
unmarked goroutines are blocked on operations over concurrency
primitives that have been marked as a result of step 2.
4. Any such goroutines are considered _eventually runnable_, and
must be treated as mark roots. Resume marking from step 2 with
the new roots.
5. Once a fixed point over reachable memory is computed, report any
goroutines that are not treated as roots as leaks; resume from
step 2 one last time with leaked goroutines as mark roots to ensure
that all reachable memory is marked, as in the regular GC.
6. Sweeping proceeds as normal.

For an in-depth description of the theoretical underpinnings, see
[this paper](https://dl.acm.org/doi/10.1145/3676641.3715990).

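The fixed-point computation in steps 1 to 5 can be sketched on a toy object graph. This is a simplified model and not the runtime's implementation: the types `object`, `heap`, and `goroutine` and the function `findLeaks` are assumptions made for illustration, and other GC roots such as globals are left out.

```go
package main

import "fmt"

// object is a toy stand-in for a heap object or concurrency primitive.
type object string

// heap maps each object to the objects it references.
type heap map[object][]object

// goroutine is a simplified model: stack lists the objects referenced from
// the stack, blockedOn the primitives a blocked goroutine is waiting on.
type goroutine struct {
	name      string
	runnable  bool
	stack     []object
	blockedOn []object
}

// mark marks everything reachable from roots (step 2).
func mark(h heap, marked map[object]bool, roots ...object) {
	work := append([]object(nil), roots...)
	for len(work) > 0 {
		o := work[len(work)-1]
		work = work[:len(work)-1]
		if marked[o] {
			continue
		}
		marked[o] = true
		work = append(work, h[o]...)
	}
}

// scan treats g as a mark root.
func scan(h heap, marked map[object]bool, g goroutine) {
	mark(h, marked, g.stack...)
	mark(h, marked, g.blockedOn...)
}

// findLeaks returns the names of goroutines that never become roots.
func findLeaks(h heap, gs []goroutine) []string {
	marked := map[object]bool{}
	rooted := make([]bool, len(gs))
	for i, g := range gs { // steps 1-2: only runnable goroutines are roots
		if g.runnable {
			rooted[i] = true
			scan(h, marked, g)
		}
	}
	for changed := true; changed; { // steps 3-4: iterate to a fixed point
		changed = false
		for i, g := range gs {
			if rooted[i] {
				continue
			}
			for _, p := range g.blockedOn {
				if marked[p] { // blocked on a reachable primitive
					rooted[i], changed = true, true
					scan(h, marked, g)
					break
				}
			}
		}
	}
	var leaked []string
	for i, g := range gs { // step 5: report, then mark from leaked goroutines
		if !rooted[i] {
			leaked = append(leaked, g.name)
			scan(h, marked, g)
		}
	}
	return leaked
}

// demo builds a small scenario in which only "orphan" is unreachable.
func demo() []string {
	h := heap{"job": {"chB"}}
	gs := []goroutine{
		{name: "helper", blockedOn: []object{"chB"}},
		{name: "main", runnable: true, stack: []object{"chA"}},
		{name: "worker", stack: []object{"job"}, blockedOn: []object{"chA"}},
		{name: "orphan", stack: []object{"buf"}, blockedOn: []object{"chC"}},
	}
	return findLeaks(h, gs)
}

func main() {
	fmt.Println(demo()) // prints [orphan]
}
```

In this scenario `helper` only becomes eventually runnable on the second pass, after `worker` has been rooted and `chB` marked, which is why a single pass over the blocked goroutines is not enough.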
## Rationale

The proposal expands the developer toolset for identifying
goroutine leaks, especially in long-running systems with complex
non-deterministic behavior.
The advantage of this approach over other goroutine leak detection
techniques is that it can be leveraged, with a minimal performance
cost, in regular Go systems, e.g., production services.
It is also theoretically sound, i.e., there are no false positives.
Its primary limitation is that its effectiveness decreases the more
heap resources are over-exposed in memory, i.e., pair-wise reachable:
a goroutine blocked on a primitive that is still reachable from a
goroutine that may run is not reported, even if nothing ever unblocks it.

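The limitation can be illustrated with a channel that stays reachable. In the sketch below (the names `results` and `startWorker` are hypothetical), the channel is stored in a package-level variable, so under the marking scheme described in this proposal it would be marked and the goroutine blocked on it treated as eventually runnable, whether or not any code ever receives from it.

```go
package main

import "fmt"

// results is reachable from a global, so the GC always marks it.
var results = make(chan int)

// startWorker starts a goroutine that blocks sending on results until
// some other goroutine receives from the channel.
func startWorker() {
	go func() {
		results <- 1
	}()
}

func main() {
	startWorker()
	// If the program never executed this receive, the worker would stay
	// blocked forever, yet it would not be reported: any goroutine that can
	// reach results could still unblock it, as this line does.
	fmt.Println("received:", <-results)
}
```

This conservatism is what keeps the approach free of false positives: a goroutine is reported only when no goroutine that may run can reach the primitive it is blocked on.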
## Compatibility

The feature is backwards-compatible with any Go program.
Changes are strictly internal, and any extensions are only accessible
on an opt-in basis via additional APIs, in this case by adding a
new profile type.

## Implementation

A working prototype is available at [go.dev/cl/688335](https://go.dev/cl/688335).

In this section we discuss various aspects of the implementation.

### Opting in via profiling

Goroutine leak detection behavior is
triggered on-demand via profiling.
An additional profile type, `"goroutineleak"`, is now available.
Extracting it performs the following steps:

1. Queue a leak detecting GC cycle and wait for it to complete.
2. Extract a goroutine profile.
3. Filter for goroutines with a leaked status, if `debug < 2`;
alternatively, get a full stack dump of all goroutines, if `debug >= 2`.
4. Output the results.

Otherwise, the GC preserves regular behavior, with a few exceptions
described in the remainder of this section.

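A minimal sketch of extracting the profile, assuming it is exposed through the existing `runtime/pprof` lookup mechanism under the name given above. The helper `dumpGoroutineLeaks` is hypothetical. `pprof.Lookup` returns `nil` for unknown profile names, so the sketch degrades gracefully on toolchains built without the feature.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// dumpGoroutineLeaks writes the "goroutineleak" profile to stdout if the
// running toolchain provides it, and reports whether the profile was found.
func dumpGoroutineLeaks(debug int) (bool, error) {
	p := pprof.Lookup("goroutineleak")
	if p == nil {
		return false, nil // profile type not registered in this build
	}
	return true, p.WriteTo(os.Stdout, debug)
}

func main() {
	// A goroutine blocked on a channel that nothing else can reach.
	go func() {
		ch := make(chan struct{})
		<-ch
	}()

	found, err := dumpGoroutineLeaks(1)
	if err != nil {
		fmt.Fprintln(os.Stderr, "writing profile:", err)
		os.Exit(1)
	}
	if !found {
		fmt.Println("goroutineleak profile not available in this toolchain")
	}
}
```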
### Temporary experimental flag
To avoid most performance penalties,
the feature is currently enabled only via the
experimental flag `goleakprofiler`.

### Hiding pointers from the GC
It is essential for the approach that certain pointers are only
conditionally traced by the GC.
In the current implementation, this is achieved via
**maybe-traceable pointers**, expressed as type `maybeTraceablePtr`
in the runtime.

A maybe-traceable pointer value is a pair of an
`unsafe.Pointer` and a `uintptr` value, stored at fields `.vp` and `.vu`,
respectively, within the `maybeTraceablePtr` type.
A maybe-traceable pointer has one of three states:

1) **Unset:** both `.vp` and `.vu` are zero values.
This is analogous to `nil`.
2) **Traceable:** both `.vp` and `.vu` are set, where both point to the
same address.
3) **Untraceable:** `.vu` is set to the address that is referenced, but
`.vp` is set
to `nil`, such that the GC does not automatically trace it when
scanning the object embedding the maybe-traceable pointer.

Maybe-traceable pointers come with a set of methods for
setting and unsetting them, which guarantee certain invariants at
runtime, e.g., that if both `.vp` and `.vu` are set, they point to the
same address.

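The three states and the invariant can be sketched outside the runtime as follows. Only the type name and the fields `.vp` and `.vu` come from the text above; the method names are hypothetical and the runtime's actual methods may differ. Converting a `uintptr` back to an `unsafe.Pointer`, as `setTraceable` does, violates the documented `unsafe.Pointer` rules in ordinary Go code and is flagged by `go vet`; it is tolerable here only because the target is a global whose address never moves and which is never collected.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maybeTraceablePtr mirrors the layout described in the text.
type maybeTraceablePtr struct {
	vp unsafe.Pointer // scanned by the GC; nil while untraceable
	vu uintptr        // same address as an integer; never scanned by the GC
}

// set makes the pointer traceable and referencing v.
func (p *maybeTraceablePtr) set(v unsafe.Pointer) { p.vp, p.vu = v, uintptr(v) }

// unset returns the pointer to the unset state.
func (p *maybeTraceablePtr) unset() { p.vp, p.vu = nil, 0 }

// setUntraceable hides the reference from the GC but keeps the address.
func (p *maybeTraceablePtr) setUntraceable() { p.vp = nil }

// setTraceable restores the GC-visible pointer from the saved address.
// Not valid in general outside the runtime; see the note above.
func (p *maybeTraceablePtr) setTraceable() { p.vp = unsafe.Pointer(p.vu) }

// state classifies the pointer and detects a broken invariant.
func (p *maybeTraceablePtr) state() string {
	switch {
	case p.vp == nil && p.vu == 0:
		return "unset"
	case uintptr(p.vp) == p.vu:
		return "traceable"
	case p.vp == nil:
		return "untraceable"
	default:
		return "invalid" // both set but pointing to different addresses
	}
}

// target is a global so that its address is stable for the whole demo.
var target int

// demoStates walks the pointer through every state transition.
func demoStates() []string {
	var p maybeTraceablePtr
	states := []string{p.state()}
	p.set(unsafe.Pointer(&target))
	states = append(states, p.state())
	p.setUntraceable()
	states = append(states, p.state())
	p.setTraceable()
	states = append(states, p.state())
	p.unset()
	return append(states, p.state())
}

func main() {
	fmt.Println(demoStates()) // prints [unset traceable untraceable traceable unset]
}
```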
The use of maybe-traceable pointers is only required for `*sudog`
objects, specifically for the `.elem` and `.hchan` fields.
This prevents the GC from inadvertently marking channels that have
not yet been deemed reachable in memory via eventually runnable
goroutines.
This may occur because `*sudog` objects are globally reachable: via
the list of goroutine objects (`*g`) at `allgs`, and via the treap
forest of semaphore-related `*sudog`s at `semtable`.

All uses of these fields have been updated to use the methods provided
by the `maybeTraceablePtr` type.
When a goroutine leak detection GC cycle starts, it sets all
maybe-traceable pointers in `*sudog` objects as untraceable.
Once the cycle concludes, it resets all the pointers to being traceable.

### Soft dependency on [go.dev/issue/27993](https://go.dev/issue/27993)
In the current implementation of the GC, there is a check for whether
the marking phase must be restarted due to
[go.dev/issue/27993](https://go.dev/issue/27993).
We extend that checkpoint with additional logic: (1) to find
additional eventually-runnable goroutines, or (2) to mark goroutines as
leaked, both of which provide another reason to restart
the marking phase.
Even if #27993 is resolved, the checkpoint must be preserved
for goroutine leak detection.
