feat(types): scope-based receiver-type call resolution for all 13 languages (SCIP base case) by andylbrummer · Pull Request #10 · standardbeagle/lci-cpp

andylbrummer · 2026-06-17T19:37:59Z

What

Call-graph method calls now resolve by the receiver's type, not just the method-name string. Same-named methods across classes (run/ServeHTTP/Close/String…) attribute to the correct symbol instead of collapsing onto the first match. SCIP base case — no type checker, no generics instantiation, no flow analysis.

Covers all 13 languages. Stacked on #9 (base branch fix-mcp-getcontext-name); retarget to main once #9 merges. The 5 type-resolution commits:

- Go, Python, JS/TS (phases 1–3): languages that already had a call graph.
- C/C++ (af9f738): added Class scope for named struct/class/union, this/typed-local env, qualified emission.
- The remaining 7 (635b746): Java, C#, Rust, PHP, Kotlin, Ruby, Zig.

The key finding

7 of 13 languages (Java, C#, Rust, PHP, Kotlin, Ruby, Zig) emitted zero call references — they had no call graph at all, so there was nothing to type-resolve. This PR builds the call graph and the receiver-type env for each, in the single extraction pass. Kotlin was worse: its fieldless tree-sitter grammar broke symbol extraction entirely (zero symbols) — fixed with a fieldless-name fallback.

How it works (write-path only; reads stay lock-free RCU)

- Local type env (local_var_types_, per function): this/self/$this → enclosing class; typed params; `T x` / `new T()` / `T::new()` / `T{}` / `T.new` locals.
- Qualified emission: a method call `recv.m()` whose receiver type is known emits a ref named `Type.m` (tagging the method-name node, not the un-resolvable `recv.m` selector).
- Resolver (resolve_reference_target): splits the dotted name and picks the candidate whose owning type matches via symbol_matches_receiver_type. Go parses the receiver from the signature; class-based languages match the owning type in scope_chain. An unknown/dynamic receiver falls back to the bare name. Never a fabricated edge.

Per-language logic isolated in process_<lang>_reference. Class-scope prerequisites added where missing so the resolver can match an owning type: C/C++ specifiers, Rust impl_item/struct_item, Zig const A = struct{}.
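The resolver step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Symbol` struct and both function signatures are hypothetical simplifications, and it only covers the class-based path (owning type found in `scope_chain`), not Go's receiver-from-signature parsing. Falling back to the first same-named candidate when a qualified ref has no owning-type match is also an assumption.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified symbol record; the real index stores far more.
struct Symbol {
    int id;
    std::string name;                      // bare method/function name
    std::vector<std::string> scope_chain;  // enclosing scopes, outermost first
};

// True when `type` is one of the scopes that own `sym`.
bool symbol_matches_receiver_type(const Symbol& sym, std::string_view type) {
    for (const auto& scope : sym.scope_chain) {
        if (scope == type) return true;
    }
    return false;
}

// Resolve "Type.m" to the symbol named m owned by Type.
// A bare "m" (or, by assumption, a qualified ref with no owning-type match)
// falls back to the first same-named candidate. No candidate -> nullopt.
std::optional<int> resolve_reference_target(std::string_view ref,
                                            const std::vector<Symbol>& symbols) {
    std::string_view type;
    std::string_view name = ref;
    if (auto dot = ref.rfind('.'); dot != std::string_view::npos) {
        type = ref.substr(0, dot);
        name = ref.substr(dot + 1);
    }
    std::optional<int> fallback;
    for (const auto& sym : symbols) {
        if (sym.name != name) continue;
        if (!type.empty() && symbol_matches_receiver_type(sym, type)) return sym.id;
        if (!fallback) fallback = sym.id;  // name-only candidate
    }
    return fallback;
}
```

With two classes that each define `run`, a ref named `B.run` picks the symbol whose scope chain contains `B`, while a bare `run` still lands on the first same-named candidate, which is the pre-PR behavior.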

Honest base-case limits (documented in the design doc)

- Ruby: a bare no-receiver, no-paren call (`help_a`) parses as identifier, not call, so it is not emitted as an edge. Receiver calls (`a.run`, `self.help_a`) and `T.new`-typed locals resolve.
- Kotlin/Zig: `val a = A()` / `const a = A{}` constructor calls emit as a bare Call on the type name (shows construction; resolves to the type).

Verification

- Controlled corpus per language: go() resolves a.run()/b.run() to the distinct run of each class (previously both collapsed onto the first same-named symbol).
- Unit: ScopeTypeResolution.* (7 langs, extraction-level qualified-ref assertions) + ReferenceTrackerTest.ResolvesByReceiverTypeScope (resolver-level). Full suite 1700/1700.
- Integration goldens: 128/128, with no drift from the broad scope/symbol changes.

🤖 Generated with Claude Code

Resolve method-call targets by the receiver's locally-known TYPE instead of name-string only, so x.M() on a common method name resolves to the exact method (the SCIP base case), not an arbitrary same-named symbol.

Mechanism (write-path only; reads stay lock-free RCU):
- Per-function local type env {name -> type}, built syntactically in the Go extractor from the receiver, typed params, `var x T`, and `x := T{}/&T{}`. Cleared per function; closures inherit the enclosing env.
- A method call `recv.M()` whose receiver type is known is emitted as a receiver-type-qualified ref `Type.M`.
- resolve_reference_target: a dotted `Type.M` resolves to the method named M whose receiver/owning type is Type (Go receiver parsed from signature; class langs matched via scope_chain, ready for later phases). Unknown/dynamic receiver falls back to the existing name-based path (degrades to a candidate, which is honest and the same as gopls/SCIP for interface dispatch).

Verified:
- Controlled 2-type corpus: A.Do->a.helpA resolves to A.helpA, B.Do->b.helpB to B.helpB (no same-name collision); callers correct.
- chi: param-typed receivers resolve (r *http.Request -> r.Context() -> Request.Context); field-access chains (mx.pool.Get) degrade to bare name.
- Full unit 1692/1692; integration 128/128; no golden regen (synthetic golden corpus has no typed method calls).

Design + per-language rollout: docs/plans/2026-06-17-scope-type-resolution.md. Phases 2-4 (Java/C#/TS, Python/Rust, JS/C++/Kotlin/PHP/Ruby/Zig) follow this template per language.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
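A rough sketch of the per-function env and the qualification step described in this commit. The class name, its methods, and the frame-stack design are all hypothetical; the PR only states that the env maps name to type, is cleared per function, and is inherited by closures.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-function env (the PR names its map local_var_types_).
class LocalTypeEnv {
    using Frame = std::unordered_map<std::string, std::string>;  // variable -> bare type
    std::vector<Frame> frames_;

public:
    // A new function starts with an empty env ("cleared per function").
    void enter_function() { frames_.emplace_back(); }

    // A closure starts from a copy of the enclosing env, so captured names stay typed.
    void enter_closure() {
        Frame inherited = frames_.empty() ? Frame{} : frames_.back();
        frames_.push_back(std::move(inherited));
    }

    void leave() {
        if (!frames_.empty()) frames_.pop_back();
    }

    // Record a syntactically known type, e.g. from a typed param or `var x T`.
    void bind(const std::string& var, const std::string& type) {
        if (!frames_.empty()) frames_.back()[var] = type;
    }

    // "Type.M" when the receiver's type is known, otherwise the bare method name.
    std::string qualify(const std::string& receiver, const std::string& method) const {
        if (!frames_.empty()) {
            auto it = frames_.back().find(receiver);
            if (it != frames_.back().end()) return it->second + "." + method;
        }
        return method;
    }
};
```

Binding `r` to `Request` and then qualifying `r.Context()` yields `Request.Context`, matching the chi example above; an unbound receiver such as `mx` yields the bare method name.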

Extend the SCIP-base-case type resolution to Python, and fix the same method-call caller gap Go had: process_python_reference tagged a method call's name as the un-resolvable "obj.M" Call + a Usage on "M", so Python methods had no callers. Now tag the attribute (method name) as the Call, qualified to "Type.M" when the receiver type is known.

Local type env (UNAMBIGUOUS sources only; `x = Foo()` is skipped because constructor vs factory call is syntactically identical in Python):
- self / cls -> enclosing class (via enclosing_class_name() over the scope stack; resolver matches the class through scope_chain).
- annotated params `def m(self, x: T)` and annotated assignments `x: T`.

py_bare_type strips quotes (string annotations), subscripts (List[Foo]->List), and module qualifiers.

Verified:
- Controlled 2-class corpus: A.do/self.helpA() -> A.helpA, B.do -> B.helpB (no same-name collision).
- Full unit 1692/1692; MCP goldens 14/14 clean (earlier batch failures were the pre-existing MCP-readiness flake under load, with 73ms pre-index responses, and reproduce on the Go phase too; get_context passes at 5082ms unloaded).

Reused for all class-based languages: enclosing_class_name() + scope_chain receiver matching. Next: TS/JS, Java, C#, Rust, C++, Kotlin, PHP, Ruby, Zig.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
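The annotation-stripping helper named above might look something like this. Only the name `py_bare_type` and the three stripping rules come from the commit message; the signature, the order of the steps, and the whitespace handling are assumptions.

```cpp
#include <string>
#include <string_view>

// Sketch: reduce a Python annotation's text to a bare type name.
// Assumed order: trim quotes/whitespace, drop the subscript, drop module qualifiers.
std::string py_bare_type(std::string_view annotation) {
    auto is_trim = [](char c) { return c == ' ' || c == '\t' || c == '"' || c == '\''; };
    while (!annotation.empty() && is_trim(annotation.front())) annotation.remove_prefix(1);
    while (!annotation.empty() && is_trim(annotation.back())) annotation.remove_suffix(1);

    // List[Foo] -> List
    if (auto bracket = annotation.find('['); bracket != std::string_view::npos) {
        annotation = annotation.substr(0, bracket);
    }
    // pkg.mod.Foo -> Foo
    if (auto dot = annotation.rfind('.'); dot != std::string_view::npos) {
        annotation = annotation.substr(dot + 1);
    }
    return std::string(annotation);
}
```

Dropping the subscript before the qualifier matters: `typing.List[pkg.Foo]` should reduce to `List`, not to a fragment of the subscript.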

Extend SCIP-base-case type resolution to JavaScript/TypeScript and fix the same method-call caller gap: a call obj.M() tagged the un-resolvable "obj.M" member_expression as the Call (method name M only got a Usage), so JS/TS methods had no callers. Now tag the PROPERTY (M) as the Call, qualified to "Type.M" when the receiver type is known.

Local type env:
- this -> enclosing class (enclosing_class_name()).
- TS-annotated params `(x: T)` and variable annotations `const x: T`.
- `new T()` constructor inference (`const x = new T()` -> x: T), which is unambiguous in JS/TS unlike Python.

js_bare_type strips ": ", generics (Foo<Bar>->Foo), array suffix, qualifier.

Verified:
- Controlled TS 2-class corpus: A.do/this.helpA()->A.helpA, B.do->B.helpB; run(): const a:A=new A(); a.do()->A.do and const b=new B(); b.do()->B.do (annotation + new() inference both resolve).
- Full unit 1692/1692; trpc TS real-project 4/4. (MCP batch goldens flake on the pre-existing MCP-readiness race under load, with 73ms pre-index responses; they pass at ~5s when given time; the multi-lang corpus has no JS/TS file, so this change cannot affect them.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e case)

Phase 4 of scope-type resolution. C/C++ method calls now resolve by the receiver's TYPE, not just the method name, so same-named methods across different classes (run/ServeHTTP/Close/…) attribute to the correct symbol.

- process_scope_node: a named struct_specifier/class_specifier/union_specifier *with a body* opens a Class scope named after the aggregate. This gives member methods an owning-class entry in their scope_chain, which the resolver matches against a scope-typed `T.m` ref. Bodyless forms (forward decls, `struct A a;` uses) are excluded so they don't nest the surrounding scope.
- process_reference_node (cpp branch): builds a per-function local var->type env (this -> enclosing class; `T x;` / `T x = ...` decls) and emits field-call refs as receiver-type-qualified `Type.m` when the receiver type is known. Unknown receivers fall back to the bare name (today's behavior).
- Relocated go/js/py bare-type helpers to the top anon namespace so the cpp branch can use go_bare_type (was defined after the use site).

Verified on a controlled corpus: go() -> {A.run -> A.helpA, B.run -> B.helpB} resolves both edges distinctly (previously both collapsed onto A.run). Added ReferenceTrackerTest.ResolvesByReceiverTypeScope. Full unit suite 1693/1693 green; no regressions from the new C/C++ class scopes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e remaining 7 languages

Java, C#, Rust, PHP, Kotlin, Ruby, and Zig previously emitted ZERO call references: they had no call graph at all, so there was nothing to type-resolve. This adds, in the single extraction pass, both the call-reference extraction and the SCIP-base-case receiver-type env for each:

- process_<lang>_reference: emit method/function calls as ReferenceType::Call, tagging the method-name node (not the un-resolvable receiver.method selector), and qualify to "Type.method" when the receiver's type is locally known (this/self/$this -> enclosing class; typed params; `T x`/`new T()`/`T::new()`/`T{}`/`T.new` locals). Shared qualify_and_push() helper.
- Class-scope prerequisites so the resolver can match an owning type: Rust impl_item/struct_item, Zig `const A = struct{…}`. (C/C++ landed earlier.)
- Kotlin symbol extraction was entirely broken: the fieldless tree-sitter-kotlin grammar has no `name` field, so extract_function/extract_class/process_scope_node produced zero symbols. Added first_named_child_typed() fallback (simple_identifier / type_identifier); Kotlin now indexes and resolves.

Verified per language on controlled corpora: go() resolves a.run()/b.run() to the distinct run() of each class (previously collapsed onto the first same-named symbol). Added ScopeTypeResolution.* (7 langs). Full unit suite 1700/1700.

Known base-case limits documented in the design doc: Ruby bare no-paren calls (parse as identifier, not call) aren't edges; Kotlin/Zig constructor calls show as a bare Call on the type. Unknown receivers degrade to the bare name, never a fabricated edge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
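The Kotlin fallback can be illustrated with a toy syntax tree. Everything here is a hypothetical model: `SyntaxNode` stands in for the tree-sitter node, `symbol_name` for the extractor's name lookup, and the signature of `first_named_child_typed` is a guess; only the helper's name and the two node types it looks for come from the commit message.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Toy stand-in for a parsed node; `field` is empty in a fieldless grammar.
struct SyntaxNode {
    std::string type;   // grammar node type, e.g. "simple_identifier"
    std::string field;  // field name within the parent, e.g. "name"
    std::string text;   // source text of the node
    std::vector<SyntaxNode> children;
};

// First child whose node type is one of `types`, or nullptr.
const SyntaxNode* first_named_child_typed(const SyntaxNode& node,
                                          std::initializer_list<std::string_view> types) {
    for (const SyntaxNode& child : node.children) {
        for (std::string_view t : types) {
            if (child.type == t) return &child;
        }
    }
    return nullptr;
}

// Prefer the grammar's `name` field; fall back to the first identifier-typed child.
std::string symbol_name(const SyntaxNode& node) {
    for (const SyntaxNode& child : node.children) {
        if (child.field == "name") return child.text;
    }
    if (const SyntaxNode* id =
            first_named_child_typed(node, {"simple_identifier", "type_identifier"})) {
        return id->text;
    }
    return "";
}
```

Grammars that do expose a `name` field keep using it; only nodes without one take the typed-child path, so the fallback should not change results for the other languages.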

…e env

`A* a = new A();` is init_declarator > pointer_declarator > identifier, so the prior identifier-only unwrap never recorded `a:A` and `a->run()` stayed unqualified, even though this is the common C++ receiver shape. Peel init/pointer/reference/array declarators down to the identifier; the `*`/`&` live on the declarator, not the type token, so the recorded type stays "A".

Added ScopeTypeResolution.CppQualifiesPointerAndValueReceivers. 1701/1701.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
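The declarator peeling might be sketched like this, again over a toy tree rather than the real tree-sitter API. `Node`, `is_declarator_wrapper`, and `declared_name` are hypothetical names; the four wrapper kinds are the ones listed in the commit message.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a parsed declarator node.
struct Node {
    std::string kind;  // grammar node type
    std::string text;  // source text (only meaningful for leaves here)
    std::vector<Node> children;
};

bool is_declarator_wrapper(const std::string& kind) {
    return kind == "init_declarator" || kind == "pointer_declarator" ||
           kind == "reference_declarator" || kind == "array_declarator";
}

// Peel wrapper declarators until the identifier is reached.
// Returns "" when the node is not a (wrapped) identifier.
std::string declared_name(const Node& declarator) {
    const Node* cur = &declarator;
    while (cur != nullptr) {
        if (cur->kind == "identifier") return cur->text;
        if (!is_declarator_wrapper(cur->kind)) return "";
        const Node* next = nullptr;
        // Assumption: the inner declarator precedes any initializer or array size.
        for (const Node& child : cur->children) {
            if (child.kind == "identifier" || is_declarator_wrapper(child.kind)) {
                next = &child;
                break;
            }
        }
        cur = next;
    }
    return "";
}
```

For `A* a = new A();` this walks init_declarator, then pointer_declarator, then returns `a`; the type token `A` is read separately, which is why the recorded type carries no `*`.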

andylbrummer and others added 6 commits June 17, 2026 07:24

andylbrummer mentioned this pull request Jun 18, 2026

perf(index): max_parse_file_size cap + stage profiler + real-project language corpora #11

Merged

andylbrummer merged commit 2101bde into fix-mcp-getcontext-name Jun 19, 2026

andylbrummer mentioned this pull request Jun 19, 2026

feat: type resolution (all 13 langs) + parse cap + real-project corpora [train] #12

Merged

andylbrummer merged 6 commits into fix-mcp-getcontext-name from scip-type-resolution
