Plan: Selector Painting Fix for Issue #4102 #4943
Replies: 2 comments
-
Update: Test Verification StrategyThe full-program test from step 5 checks that ORCA reference counts balance correctly when a behavior is called through a trait with different parameter capabilities. However, PR #4256 ("Fix segfault caused by unsafe garbage collection optimization") is currently masking the bug: To verify the fix works independently of #4256's safety net, we use this workflow:
The Also removed step 6 ("Verify with the issue's minimal repro") from the original plan — the above strategy provides a more rigorous and reproducible verification than manual memory observation. |
Beta Was this translation helpful? Give feedback.
-
|
Implemented in PR #4944. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Implementation plan for fixing #4102 — memory leak when a
valparameter implements anisoparameter trait.Problem Summary
When a behavior is called through a trait, and the concrete actor's parameter has a
different trace-significant capability (e.g., trait has
iso, concrete hasval), theORCA GC reference counting is broken. The sender traces as MUTABLE (iso) but the
receiver's dispatch case traces as IMMUTABLE (val). This mismatch causes objects to
never reach RC=0, leaking memory.
Root cause: The mangled name for methods (used by the paint/selector-coloring
algorithm to assign vtable indices) doesn't include parameter capabilities.
String isoand
String valshare the samereach_type_t(forString) and thus the same mangle(
"o"). This means:r_mangled→early return in
add_rmethod_to_subtype)→ same paint color)
MUTABLE)
Approach
jemc's selector painting proposal: give distinct message IDs (vtable indices) to methods
whose parameters differ in trace-significant ways. This means the trait's
isoversionand the concrete's
valversion get separate vtable indices and separate dispatch cases,each tracing correctly.
Trace-Significant Capability Buckets
Only three distinctions matter (matching the three
PONY_TRACE_*values):PONY_TRACE_MUTABLEPONY_TRACE_IMMUTABLEPONY_TRACE_OPAQUEFor behavior parameters, only sendable caps are allowed: iso, val, tag. So effectively:
iso→m, val→i, tag→t.
Actors are always traced as opaque regardless of cap. Primitives don't need tracing.
Why Only the Inline Send Path Is Broken
When
can_inline_message_send()returns false, the call goes through the vtable to theforwarding sender function. The forwarding sender delegates entirely to the concrete
sender (calling it directly), which traces with the concrete method's params and sends a
message using the concrete vtable index. The dispatch case for the concrete method then
traces with the concrete params. Both sides match — no ORCA mismatch.
The bug only manifests on the inline send path: the call site directly allocates the
message and traces using the trait's method params (e.g., iso → MUTABLE), then sends with
the trait's vtable index. But the dispatch case at that index was generated from the
concrete method's params (val → IMMUTABLE). The fix ensures the forwarding method gets its
own vtable index and its own dispatch case that traces consistently with the sender.
Changes
1.
make_mangled_name()— Include trace-significant capabilityFile:
src/libponyc/reach/reach.cCurrently generates:
cap_name_<param1_type_mangle><param2_type_mangle>...<result_mangle>Change to include a trace-kind character for each parameter:
cap_name_<trace_kind1><param1_type_mangle><trace_kind2><param2_type_mangle>...<result_mangle>Where
trace_kindis computed from the parameter's AST:Then in
make_mangled_name:Impact:
mangled_nameis purely internal (used only forr_mangledhash lookup andthe paint algorithm). It does NOT appear in LLVM IR or symbol names.
full_name(whichdoes appear in symbols) is derived from
mangled_nameviamake_full_name(), so symbolnames will change for methods whose params have non-default trace kinds. This is harmless
and actually helpful for debugging.
When caps match (the common case), the forwarding method's mangled name still collides
with the concrete method's, so no extra vtable entries are created. Extra entries only
appear when there's an actual trace-significant mismatch.
2.
add_rmethod_to_subtype()— No code changes neededFile:
src/libponyc/reach/reach.cWith the new mangled names, the
reach_mangled_getwill naturally NOT findthe concrete method when the caps differ (different mangled names). The forwarding method
will be created with the trait's params (iso caps) and its own vtable index.
When caps match, the mangled names still collide, and the early return still
fires. No behavioral change for the common case.
3.
genfun_forward()— Add dispatch cases for forwarding behaviorsFile:
src/libponyc/codegen/genfun.cCurrently
genfun_forwardgenerates only a forwarding sender function. For forwardingbehaviors (TK_BE) and actor constructors (TK_NEW on actors), we also need to add a
dispatch case.
After the existing sender generation code, add:
Key parameter choices for
add_dispatch_case:params = m->params: the forwarding method's params (trait's caps) — so tracingmatches what the sender did
index = m->vtable_index: the forwarding method's vtable index — matches the messageID the sender used
handler = c_m2->func_handler: the concrete method's handler — the actual behaviorbody to execute
fun_type = c_m2->func_type: the concrete handler's LLVM function type — forparameter extraction in dispatch
msg_type = c_m->msg_type: the forwarding method's message type — matches themessage structure the sender allocated
The LLVM types for
String isoandString valare identical (both pointers), so thefun_type and msg_type are interchangeable between the two methods. Using the forwarding
method's msg_type is conceptually correct (it matches the sender's allocation).
4.
genfun_method_bodies()— Two-pass iteration for orderingFile:
src/libponyc/codegen/genfun.cProblem:
genfun_forwardneeds the concrete method'sfunc_handlerto exist.Technically, LLVM function declarations are created during
genfun_method_sigs()(whichruns before
genfun_method_bodies()), so the declaration exists regardless of processingorder. However,
genfun_method_bodiesiteratesr_mangledin hashmap order, which isarbitrary. As a defensive measure, we split into two passes to guarantee concrete handlers
are fully generated before forwarding methods reference them.
Fix: Split the inner loop into two passes — non-forwarding methods first, forwarding
methods second:
This ensures all concrete handlers are generated before any forwarding dispatch cases
that reference them.
5. Test: Full-program trace verification
Directory:
test/full-program-tests/codegen-trace-iso-to-val-through-trait/Create a test following the existing trace test pattern
(
codegen-trace-object-different-capas reference). The test:B(with a field, so it's traceable)trait tag Receiverwithbe receive(b: B iso)actor Main is Receiverwithbe receive(b: B val)create, calls the behavior through a trait-typed variable to exercise theforwarding path:
let r: Receiver = this; r.receive(recover B end). Calling directlyon
thiswould use the concrete method and bypass the vtable dispatch entirely.objectmap_has_object_rcexpected-exit-code.txt:
1Uses the shared FFI library at
test/full-program-tests/codegen-trace/additional.cc.The test source needs
use "lib:codegen-trace-additional"to link against it.6. Verify with the issue's minimal repro
Compile and run the program from the issue body (the Timer-based one) and verify memory
stabilizes instead of growing unboundedly. This is a manual verification step, not an
automated test.
7. Run existing test suite
Edge Cases
Unions
jemc's example:
Both union types get
mangle = "o"(unions always have mangle "o"). With the simplenominal
trace_kind_char, both would get'x'(default for non-nominal types),producing the same mangled name. The forwarding method would NOT be created, and the bug
would persist for this case.
Fix for unions:
trace_kind_charmust recursively walk union members. For a union,compute a sorted string of member trace kinds:
This is more complex but necessary for correctness. The encoding must be deterministic
(sorted by some canonical order) so that equivalent unions produce the same string.
Recommendation: Implement the simple nominal case first, then add union/tuple
handling. The simple case covers the reported bug and the most common pattern. Union/tuple
parameter mismatches are valid but much rarer.
Tuples
Similar to unions. Tuple mangling already includes member type mangles
(
<cardinality><mangle1><mangle2>...), but not member capabilities. Thetrace_kind_charfunction would need to handleTK_TUPLETYPEby recursively encodingeach element's trace kind.
Intersections
Same pattern as unions. Walk all members and encode their trace kinds.
Type-parameterized parameters (
TK_TYPEPARAMREF)set_method_types()can produceTK_TYPEPARAMREFnodes inparams[i].ast. Thetrace_kind_charfunction returns'x'for these (the default case). This is safe: boththe trait and concrete method would get
'x'for the same type-parameterized parameter,so mangled names still match and no spurious forwarding methods are created. However, it
means this fix does NOT cover trace-kind mismatches on type-parameterized parameters. In
practice, methods are monomorphized during reach analysis, so
TK_TYPEPARAMREFnodesshould not survive to this point for fully-specialized methods. This is an acknowledged
limitation — if a real case surfaces,
trace_kind_charcan be extended to handle it.Actor constructors through interfaces
The same issue can apply to
newon actors called through interfaces. The fix ingenfun_forwardhandles this by checking for bothTK_BEandTK_NEWon actor types.Non-behavior methods
Functions (
TK_FUN) don't have dispatch cases or message tracing. The mangled namechange technically affects them too (they'd include trace kind characters), but since no
forwarding dispatch cases are added for functions, this has no behavioral impact — just
slightly different internal names.
To minimize the change, we could limit the trace kind encoding to behaviors only:
However, including it for all methods is simpler and harmless. The only cost is slightly
larger vtables when a non-behavior function has forwarding methods with different caps —
which is extremely rare and has no correctness impact.
Files Modified
src/libponyc/reach/reach.ctrace_kind_char(), modifymake_mangled_name()src/libponyc/codegen/genfun.cgenfun_forward()to add dispatch cases; modifygenfun_method_bodies()for two-pass iterationtest/full-program-tests/codegen-trace-iso-to-val-through-trait/main.ponytest/full-program-tests/codegen-trace-iso-to-val-through-trait/expected-exit-code.txt1Beta Was this translation helpful? Give feedback.
All reactions