Skip to content

feat(types): scope-based receiver-type call resolution for all 13 languages (SCIP base case)#10

Merged
andylbrummer merged 6 commits into
fix-mcp-getcontext-namefrom
scip-type-resolution
Jun 19, 2026
Merged

feat(types): scope-based receiver-type call resolution for all 13 languages (SCIP base case)#10
andylbrummer merged 6 commits into
fix-mcp-getcontext-namefrom
scip-type-resolution

Conversation

@andylbrummer

Copy link
Copy Markdown
Member

What

Call-graph method calls now resolve by the receiver's type, not just the method-name string. Same-named methods across classes (run/ServeHTTP/Close/String…) attribute to the correct symbol instead of collapsing onto the first match. SCIP base case — no type checker, no generics instantiation, no flow analysis.

Covers all 13 languages. Stacked on #9 (base branch fix-mcp-getcontext-name); retarget to main once #9 merges. The 5 type-resolution commits:

  • Go, Python, JS/TS (phases 1–3) — languages that already had a call graph.
  • C/C++ (af9f738) — added Class scope for named struct/class/union, this/typed-local env, qualified emission.
  • The remaining 7 (635b746) — Java, C#, Rust, PHP, Kotlin, Ruby, Zig.

The key finding

7 of 13 languages (Java, C#, Rust, PHP, Kotlin, Ruby, Zig) emitted zero call references — they had no call graph at all, so there was nothing to type-resolve. This PR builds the call graph and the receiver-type env for each, in the single extraction pass. Kotlin was worse: its fieldless tree-sitter grammar broke symbol extraction entirely (zero symbols) — fixed with a fieldless-name fallback.

How it works (write-path only; reads stay lock-free RCU)

  1. Local type env (local_var_types_, per function): this/self/$this → enclosing class; typed params; T x / new T() / T::new() / T{} / T.new locals.
  2. Qualified emission: a method call recv.m() whose receiver type is known emits a ref named Type.m (tagging the method-name node, not the un-resolvable recv.m selector).
  3. Resolver (resolve_reference_target): splits the dotted name, picks the candidate whose owning type matches via symbol_matches_receiver_type — Go parses the receiver from the signature; class-based languages match the owning type in scope_chain. Unknown/dynamic receiver → bare-name fallback. Never a fabricated edge.

Per-language logic isolated in process_<lang>_reference. Class-scope prerequisites added where missing so the resolver can match an owning type: C/C++ specifiers, Rust impl_item/struct_item, Zig const A = struct{}.

Honest base-case limits (documented in the design doc)

  • Ruby: a bare no-receiver, no-paren call (help_a) parses as identifier, not call, so it is not emitted as an edge. Receiver calls (a.run, self.help_a) and T.new-typed locals resolve.
  • Kotlin/Zig: val a = A() / const a = A{} constructor calls emit as a bare Call on the type name (shows construction; resolves to the type).

Verification

  • Controlled corpus per language: go() resolves a.run()/b.run() to the distinct run of each class (previously both collapsed onto the first same-named symbol).
  • Unit: ScopeTypeResolution.* (7 langs, extraction-level qualified-ref assertions) + ReferenceTrackerTest.ResolvesByReceiverTypeScope (resolver-level). Full suite 1700/1700.
  • Integration goldens: 128/128 — no drift from the broad scope/symbol changes.

🤖 Generated with Claude Code

andylbrummer and others added 6 commits June 17, 2026 07:24
Resolve method-call targets by the receiver's locally-known TYPE instead of
name-string only, so x.M() on a common method name resolves to the exact method
(the SCIP base case), not an arbitrary same-named symbol.

Mechanism (write-path only; reads stay lock-free RCU):
- Per-function local type env {name -> type}, built syntactically in the Go
  extractor from the receiver, typed params, `var x T`, and `x := T{}/&T{}`.
  Cleared per function; closures inherit the enclosing env.
- A method call `recv.M()` whose receiver type is known is emitted as a
  receiver-type-qualified ref `Type.M`.
- resolve_reference_target: a dotted `Type.M` resolves to the method named M
  whose receiver/owning type is Type (Go receiver parsed from signature; class
  langs matched via scope_chain — ready for later phases). Unknown/dynamic
  receiver falls back to the existing name-based path (degrades to a candidate,
  honest — same as gopls/SCIP for interface dispatch).

Verified:
- Controlled 2-type corpus: A.Do->a.helpA resolves to A.helpA, B.Do->b.helpB to
  B.helpB (no same-name collision); callers correct.
- chi: param-typed receivers resolve (r *http.Request -> r.Context() ->
  Request.Context); field-access chains (mx.pool.Get) degrade to bare name.
- Full unit 1692/1692; integration 128/128; no golden regen (synthetic golden
  corpus has no typed method calls).

Design + per-language rollout: docs/plans/2026-06-17-scope-type-resolution.md.
Phases 2-4 (Java/C#/TS, Python/Rust, JS/C++/Kotlin/PHP/Ruby/Zig) follow this
template per language.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extend the SCIP-base-case type resolution to Python, and fix the same
method-call caller gap Go had: process_python_reference tagged a method call's
name as the un-resolvable "obj.M" Call + a Usage on "M", so Python methods had
no callers. Now tag the attribute (method name) as the Call, qualified to
"Type.M" when the receiver type is known.

Local type env (UNAMBIGUOUS sources only — `x = Foo()` is skipped because
constructor vs factory call is syntactically identical in Python):
- self / cls -> enclosing class (via enclosing_class_name() over the scope
  stack; resolver matches the class through scope_chain).
- annotated params `def m(self, x: T)` and annotated assignments `x: T`.
py_bare_type strips quotes (string annotations), subscripts (List[Foo]->List),
and module qualifiers.

Verified:
- Controlled 2-class corpus: A.do/self.helpA() -> A.helpA, B.do -> B.helpB (no
  same-name collision).
- Full unit 1692/1692; MCP goldens 14/14 clean (earlier batch failures were the
  pre-existing MCP-readiness flake under load — 73ms pre-index responses — and
  reproduce on the Go phase too; get_context passes at 5082ms unloaded).

Reused for all class-based languages: enclosing_class_name() + scope_chain
receiver matching. Next: TS/JS, Java, C#, Rust, C++, Kotlin, PHP, Ruby, Zig.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extend SCIP-base-case type resolution to JavaScript/TypeScript and fix the same
method-call caller gap: a call obj.M() tagged the un-resolvable "obj.M"
member_expression as the Call (method name M only got a Usage), so JS/TS methods
had no callers. Now tag the PROPERTY (M) as the Call, qualified to "Type.M" when
the receiver type is known.

Local type env:
- this -> enclosing class (enclosing_class_name()).
- TS-annotated params `(x: T)` and variable annotations `const x: T`.
- `new T()` constructor inference (`const x = new T()` -> x: T) — unambiguous in
  JS/TS unlike Python.
js_bare_type strips ": ", generics (Foo<Bar>->Foo), array suffix, qualifier.

Verified:
- Controlled TS 2-class corpus: A.do/this.helpA()->A.helpA, B.do->B.helpB;
  run(): const a:A=new A(); a.do()->A.do and const b=new B(); b.do()->B.do
  (annotation + new() inference both resolve).
- Full unit 1692/1692; trpc TS real-project 4/4.
  (MCP batch goldens flake on the pre-existing MCP-readiness race under load —
  73ms pre-index responses; pass at ~5s when given time; multi-lang corpus has
  no JS/TS file so this change cannot affect them.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e case)

Phase 4 of scope-type resolution. C/C++ method calls now resolve by the
receiver's TYPE, not just the method name, so same-named methods across
different classes (run/ServeHTTP/Close/…) attribute to the correct symbol.

- process_scope_node: a named struct_specifier/class_specifier/union_specifier
  *with a body* opens a Class scope named after the aggregate. This gives
  member methods an owning-class entry in their scope_chain, which the resolver
  matches against a scope-typed `T.m` ref. Bodyless forms (forward decls,
  `struct A a;` uses) are excluded so they don't nest the surrounding scope.
- process_reference_node (cpp branch): builds a per-function local var->type env
  (this -> enclosing class; `T x;` / `T x = ...` decls) and emits field-call
  refs as receiver-type-qualified `Type.m` when the receiver type is known.
  Unknown receivers fall back to the bare name (today's behavior).
- Relocated go/js/py bare-type helpers to the top anon namespace so the cpp
  branch can use go_bare_type (was defined after the use site).

Verified on a controlled corpus: go() -> {A.run -> A.helpA, B.run -> B.helpB}
resolves both edges distinctly (previously both collapsed onto A.run).
Added ReferenceTrackerTest.ResolvesByReceiverTypeScope. Full unit suite
1693/1693 green; no regressions from the new C/C++ class scopes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e remaining 7 languages

Java, C#, Rust, PHP, Kotlin, Ruby, and Zig previously emitted ZERO call
references — they had no call graph at all, so there was nothing to
type-resolve. This adds, in the single extraction pass, both the call-reference
extraction and the SCIP-base-case receiver-type env for each:

- process_<lang>_reference: emit method/function calls as ReferenceType::Call,
  tagging the method-name node (not the un-resolvable receiver.method selector),
  and qualify to "Type.method" when the receiver's type is locally known
  (this/self/$this -> enclosing class; typed params; `T x`/`new T()`/`T::new()`/
  `T{}`/`T.new` locals). Shared qualify_and_push() helper.
- Class-scope prerequisites so the resolver can match an owning type:
  Rust impl_item/struct_item, Zig `const A = struct{…}`. (C/C++ landed earlier.)
- Kotlin symbol extraction was entirely broken: the fieldless tree-sitter-kotlin
  grammar has no `name` field, so extract_function/extract_class/process_scope_node
  produced zero symbols. Added first_named_child_typed() fallback
  (simple_identifier / type_identifier) — Kotlin now indexes and resolves.

Verified per language on controlled corpora: go() resolves a.run()/b.run() to the
distinct run() of each class (previously collapsed onto the first same-named
symbol). Added ScopeTypeResolution.* (7 langs). Full unit suite 1700/1700.

Known base-case limits documented in the design doc: Ruby bare no-paren calls
(parse as identifier, not call) aren't edges; Kotlin/Zig constructor calls show
as a bare Call on the type. Unknown receivers degrade to the bare name, never a
fabricated edge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e env

`A* a = new A();` is init_declarator > pointer_declarator > identifier, so the
prior identifier-only unwrap never recorded `a:A` and `a->run()` stayed
unqualified — the common C++ receiver shape. Peel init/pointer/reference/array
declarators down to the identifier; the `*`/`&` live on the declarator, not the
type token, so the recorded type stays "A".

Added ScopeTypeResolution.CppQualifiesPointerAndValueReceivers. 1701/1701.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant