83 changes: 83 additions & 0 deletions docs/plans/2026-06-17-scope-type-resolution.md
@@ -0,0 +1,83 @@
# Scope-based type resolution (SCIP base case) — call-graph precision

## Problem
`resolve_reference_target` is name-string based (same-file → import/package
disambiguation → first candidate). No receiver-type resolution, so `x.M()` on a
common method name (ServeHTTP/String/Error/Close/Get…) can attribute the call to
the wrong same-named symbol. Qualified function calls (`pkg.Func()`) resolve
well; method calls on receivers do not.

## Approach (no type checker, no generics instantiation, no flow analysis)
Resolve the receiver's type from a **per-function local type environment** built
purely syntactically in the extractor, then emit method-call refs as
**receiver-type-qualified** names `Type.M`; the resolver matches `Type.M` to the
method symbol whose receiver type is `Type`. Unknown receiver type → bare `M`
(today's name-based path). Interface/dynamic dispatch → candidate set (honest;
same as gopls/SCIP).
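A minimal sketch of the emission rule. It assumes the env is a plain `std::unordered_map` and that a call has already been split into a receiver identifier and a method name. `LocalTypeEnv` and `method_ref_name` are hypothetical names, not the extractor's API:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-function env: local identifier -> type name.
using LocalTypeEnv = std::unordered_map<std::string, std::string>;

// Referenced name for a call `recv.method()`: "Type.method" when the
// receiver's type is known, otherwise the bare "method" (today's path).
std::string method_ref_name(const LocalTypeEnv& env, const std::string& recv,
                            const std::string& method) {
  auto it = env.find(recv);
  if (it == env.end() || it->second.empty()) return method;
  return it->second + "." + method;
}
```

Because an untyped receiver keeps the bare name, this step can only add precision. For every receiver the env cannot type, the resolver sees exactly the ref it saw before.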

## Components
1. **Local type env** (extractor, per function): `{name → type}` from
receiver, typed params, and simple decls. Cleared on entering a top-level function/method;
closures inherit the enclosing env.
2. **Qualified emission**: `recv.M` where `typeof(recv)` known → ref
`referenced_name = "Type.M"`.
3. **Resolver**: dotted `Type.M` → candidates named `M` filtered by receiver
type (parsed from each Go candidate's signature, or matched against the owning class in its
scope chain) → exact match; else fall back to the name-based path on `M`.
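The resolver step can be sketched over a simplified symbol table. `Sym` and `resolve` are hypothetical. `owner` stands in for the receiver type that the real code reads from a Go signature or from a class-based symbol's scope chain, and the name-based fallback is reduced to "first symbol named `M`":

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified symbol record.
struct Sym {
  int id;
  std::string name;
  std::string owner;  // receiver / enclosing class; empty for free functions
};

// "Type.M" -> the M whose owner is Type. With no dot, or when no owner
// matches, fall back to the first symbol named M. Returns 0 if none exists.
int resolve(const std::vector<Sym>& syms, std::string_view ref) {
  std::string_view name = ref, recv;
  if (auto dot = ref.rfind('.'); dot != std::string_view::npos) {
    recv = ref.substr(0, dot);
    name = ref.substr(dot + 1);
  }
  if (!recv.empty() && !name.empty()) {
    for (const auto& s : syms)
      if (s.name == name && s.owner == recv) return s.id;  // scope-typed match
  }
  for (const auto& s : syms)
    if (s.name == name) return s.id;  // name-based fallback: first candidate
  return 0;
}
```

`C.run` illustrates the degradation path. No owner matches, so the bare name decides, exactly as before the change.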

## Per-language base case (env population rules)
| Lang | receiver/self | local decls that yield a type |
|---|---|---|
| Go | `(r *T)` | `x := T{}`, `x := &T{}`, `var x T`, typed params, `x := NewT()` (via its return type) |
| Java | `this`→class | `T x`, `new T()`, typed params |
| C# | `this`→class | `T x`, `var x = new T()`, typed params |
| TypeScript | `this`→class | `const x: T`, `x = new T()`, `(x: T)` |
| Python | `self`→class (scope) | `x = T(...)`, `x: T`, `def m(self, x: T)` |
| Rust | `&self`→impl type | `let x: T`, `let x = T::new()`, typed params |
| C++ | `this`→class | `T x;`, `T* x = new T()` |
| JS | `this`→class | `x = new T()` |
| Kotlin | `this`→class | `val x: T`, `x = T()` |
| PHP | `$this`→class | `$x = new T()` |
| Ruby | `self`→class | `x = T.new` |
| Zig | — | `var x: T`, `T{}` |
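To make the Go row concrete, here is a text-based sketch. It relies on several assumptions. The real extractor walks tree-sitter nodes, whereas the hypothetical `go_local_decl_type` recognizes only single-line, trimmed forms. In addition, the constructor's return type comes from a caller-supplied map rather than from indexed signatures:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using VarType = std::pair<std::string, std::string>;  // {identifier, type}

// Hypothetical recognizer for the Go row of the table. `ctor_returns` maps a
// constructor such as "NewMux" to its bare return type.
std::optional<VarType> go_local_decl_type(
    const std::string& decl,
    const std::unordered_map<std::string, std::string>& ctor_returns) {
  if (decl.rfind("var ", 0) == 0) {  // var x T   /   var x T = ...
    std::string rest = decl.substr(4);
    auto sp = rest.find(' ');
    if (sp == std::string::npos) return std::nullopt;
    auto end = rest.find(' ', sp + 1);
    std::string type = (end == std::string::npos)
                           ? rest.substr(sp + 1)
                           : rest.substr(sp + 1, end - sp - 1);
    if (type.empty() || type == "=") return std::nullopt;  // untyped `var x = ...`
    return VarType(rest.substr(0, sp), type);
  }
  auto op = decl.find(" := ");
  if (op == std::string::npos) return std::nullopt;
  std::string var = decl.substr(0, op);
  std::string rhs = decl.substr(op + 4);
  if (!rhs.empty() && rhs[0] == '&') rhs.erase(0, 1);  // &T{} -> T{}: keep bare T
  if (rhs.size() <= 2) return std::nullopt;
  std::string head = rhs.substr(0, rhs.size() - 2);
  std::string tail = rhs.substr(rhs.size() - 2);
  if (tail == "{}") return VarType(var, head);  // x := T{}
  if (tail == "()") {                           // x := NewT()
    auto it = ctor_returns.find(head);
    if (it != ctor_returns.end()) return VarType(var, it->second);
  }
  return std::nullopt;
}
```

Anything the sketch does not recognize yields no entry, so the call site keeps its bare method name.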

## LCI guideline fit
- Write-path only (extract/link); reads stay lock-free RCU.
- Env built once per function; resolution = hash lookups + cached.
- Deterministic; candidate sets sorted.
- Base case only; name-based fallback; per-language rules isolated like
`process_<lang>_reference`.
- Honest: unknown/dynamic → candidate set, never a fabricated single edge.
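The memoization behind "hash lookups + cached" is keyed the same way as in `resolve_reference_target`: the file id occupies the high 32 bits and the low 32 bits of the name hash occupy the low 32. The sketch assumes that `fnv1a_hash_name` is standard 64-bit FNV-1a (its body is not part of this diff) and that `file_id` fits in 32 bits. `fnv1a64` and `resolution_cache_key` are illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a with the standard offset basis and prime.
uint64_t fnv1a64(std::string_view s) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Memo key: file id in the high 32 bits, low 32 bits of the name hash below.
uint64_t resolution_cache_key(uint32_t file_id, std::string_view ref_name) {
  return (static_cast<uint64_t>(file_id) << 32) |
         (fnv1a64(ref_name) & 0xFFFFFFFF);
}
```

Hashing the full `Type.M` string, rather than the bare `M`, keeps `A.run` and `B.run` in separate cache slots within one file. Because only 32 bits of the hash are kept, two different names in the same file can still collide, although rarely.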

## Rollout (each phase: implement → measure precision on a real corpus → goldens)
1. Go (reference impl; chi/pocketbase) — proves architecture.
2. Java / C# / TypeScript (explicit types — cheapest).
3. Python / Rust (annotations + constructor inference; fastapi + a rust repo).
4. JS / C++ / Kotlin / PHP / Ruby / Zig.

## Status — all 13 languages have scope-typed call resolution
- [x] Go, JS/TS, Python, C/C++ (had call graphs; added receiver-type env + qualified emission + resolver scope match)
- [x] Java, C#, Rust, PHP, Kotlin, Ruby, Zig (had **no** call references at all — added
`process_<lang>_reference` Call extraction *and* the receiver-type env in the same pass)

### Prerequisite gaps fixed along the way
- C/C++: named `struct/class/union` specifiers (with a body) now open a Class scope so
member methods carry an owning-type entry the resolver matches.
- Rust: `impl_item`/`struct_item` open Class scopes (methods live in `impl`; `self` → impl type).
- Zig: `const A = struct {…}` opens a Class scope named after the const.
- Kotlin: symbol extraction was entirely broken (fieldless grammar → `name` field lookups
returned null → zero symbols). Added a fieldless-name fallback (`first_named_child_typed`)
in `extract_function`/`extract_class`/`process_scope_node`.
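The fieldless-name fallback amounts to "try the `name` field; otherwise take the first named child of the expected type". Below is a self-contained sketch over a hypothetical `MockNode` that stands in for `TSNode`. The `first_named_child_typed` shown here only mirrors the helper's described behavior, not its real signature, and treating `simple_identifier` as the node type of Kotlin function names is an assumption about the grammar:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a tree-sitter node. `field` is the field name the
// grammar gives this child ("" for fieldless grammars); `named` marks named
// (non-anonymous) nodes.
struct MockNode {
  std::string type;
  std::string text;
  std::string field;
  bool named = true;
  std::vector<MockNode> children;
};

// First child stored under `field`, or nullptr.
const MockNode* child_by_field(const MockNode& n, std::string_view field) {
  for (const auto& c : n.children)
    if (c.field == field) return &c;
  return nullptr;
}

// First named child whose node type is `type`, or nullptr.
const MockNode* first_named_child_typed(const MockNode& n,
                                        std::string_view type) {
  for (const auto& c : n.children)
    if (c.named && c.type == type) return &c;
  return nullptr;
}

// Declaration name: the `name` field when present, else the fieldless fallback.
std::string decl_name(const MockNode& n, std::string_view name_type) {
  const MockNode* id = child_by_field(n, "name");
  if (id == nullptr) id = first_named_child_typed(n, name_type);
  return id != nullptr ? id->text : std::string{};
}
```

With the field lookup alone, the fieldless node yields an empty name, which is how the original extractor ended up with zero Kotlin symbols.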

### Known base-case limitations (honest; not fabricated)
- Ruby: a bare call with neither a receiver nor parentheses (`help_a`) parses as an `identifier`,
not a `call`, so it is not emitted as a call edge. Receiver calls (`a.run`, `self.help_a`) and
`T.new`-typed locals resolve. Constructor `new` calls are intentionally not emitted as edges.
- Kotlin/Zig: `val a = A()` / `const a = A{}` constructor calls are emitted as a bare Call on the
type name (recording the construction); this is harmless and resolves to the type symbol.
- All languages: unknown/dynamic receivers degrade to the bare method name (today's behavior),
never a fabricated single edge.

### Verification
- Controlled corpus per language: inside a driver function `go()`, the calls `a.run()` and `b.run()`
resolve to the *distinct* `run` method of each class (previously both collapsed onto the first
same-named symbol).
- Unit: `ScopeTypeResolution.*` (7 langs, extraction-level qualified-ref assertions) +
`ReferenceTrackerTest.ResolvesByReceiverTypeScope` (resolver-level). Full suite 1700/1700.
23 changes: 23 additions & 0 deletions include/lci/parser/unified_extractor.h
@@ -6,6 +6,8 @@
#include <string_view>
#include <vector>

#include <absl/container/flat_hash_map.h>

#include <lci/reference.h>
#include <lci/scope.h>
#include <lci/side_effects.h>
@@ -214,6 +216,13 @@ class UnifiedExtractor {
void process_go_reference(TSNode node, std::string_view node_type);
void process_js_reference(TSNode node, std::string_view node_type);
void process_python_reference(TSNode node, std::string_view node_type);
void process_java_reference(TSNode node, std::string_view node_type);
void process_csharp_reference(TSNode node, std::string_view node_type);
void process_rust_reference(TSNode node, std::string_view node_type);
void process_php_reference(TSNode node, std::string_view node_type);
void process_kotlin_reference(TSNode node, std::string_view node_type);
void process_ruby_reference(TSNode node, std::string_view node_type);
void process_zig_reference(TSNode node, std::string_view node_type);
Reference create_reference(TSNode node, ReferenceType ref_type,
RefStrength strength);

@@ -272,6 +281,20 @@ class UnifiedExtractor {
uintptr_t id{};
};
std::vector<HandledEntry> handled_nodes_;

// Scope-based type resolution (SCIP base case): per-function map of local
// identifier -> type name, built syntactically (receiver, typed params,
// simple typed/constructor decls). Lets a method call `recv.M()` be emitted
// as a receiver-type-qualified ref `Type.M`, which the resolver matches to
// the method whose receiver type is `Type` — instead of a bare name that
// collides across same-named methods. Cleared on entering a top-level
// function/method; closures inherit the enclosing map. Write-path only.
absl::flat_hash_map<std::string, std::string> local_var_types_;
void seed_go_local_types(TSNode fn_node, bool is_method);
void record_go_local_var(TSNode decl_node);
// Nearest enclosing class/struct scope name (for self/this typing in
// class-based languages); empty if not inside one.
std::string enclosing_class_name() const;
};

/// Thread-local pool of UnifiedExtractor instances.
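`enclosing_class_name()` is declared here, but its body is not in this hunk. The following is one plausible shape. It assumes the extractor keeps a stack of scope entries with the innermost scope at the back. `ScopeKind`, `ScopeEntry`, and `seed_self_type` are hypothetical and are not the types in `<lci/scope.h>`:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope records; the innermost scope is at the back of the stack.
enum class ScopeKind { File, Class, Function, Block };
struct ScopeEntry {
  ScopeKind kind;
  std::string name;
};

// Name of the innermost enclosing Class scope, or "" when there is none.
std::string enclosing_class_name(const std::vector<ScopeEntry>& stack) {
  for (auto it = stack.rbegin(); it != stack.rend(); ++it)
    if (it->kind == ScopeKind::Class) return it->name;
  return {};
}

// Types the language's receiver alias ("this", "self", "$this") as the
// enclosing class, if any, so `self.m()` can be emitted as "Class.m".
void seed_self_type(std::unordered_map<std::string, std::string>& env,
                    const std::string& self_alias,
                    const std::vector<ScopeEntry>& stack) {
  std::string cls = enclosing_class_name(stack);
  if (!cls.empty()) env[self_alias] = cls;
}
```

Walking from the back means a nested class wins over its outer class, and a free function leaves the receiver alias untyped.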
65 changes: 62 additions & 3 deletions src/core/reference_tracker.cpp
@@ -751,14 +751,50 @@ uint64_t ReferenceTracker::fnv1a_hash_name(std::string_view name) {
return h;
}

namespace {
// Bare type name from a possibly-decorated receiver token: "*chi.Mux" -> "Mux".
std::string_view bare_type_name(std::string_view t) {
size_t i = 0;
while (i < t.size() && (t[i] == '*' || t[i] == '&')) ++i;
t = t.substr(i);
if (auto dot = t.rfind('.'); dot != std::string_view::npos)
t = t.substr(dot + 1);
return t;
}

// Go method-receiver type from a signature: "func (r *Mux) M(...)" -> "Mux".
std::string_view go_signature_receiver(std::string_view sig) {
constexpr std::string_view kFunc = "func (";
if (sig.rfind(kFunc, 0) != 0) return {};
auto close = sig.find(')', kFunc.size());
if (close == std::string_view::npos) return {};
std::string_view recv = sig.substr(kFunc.size(), close - kFunc.size());
if (auto sp = recv.rfind(' '); sp != std::string_view::npos)
recv = recv.substr(sp + 1); // drop the receiver var name
return bare_type_name(recv);
}

// Does this symbol's owning/receiver type equal `recv_type`? Matches Go
// receivers (parsed from the signature) and class-based languages (the
// enclosing class appears in scope_chain).
bool symbol_matches_receiver_type(const EnhancedSymbol& sym,
std::string_view recv_type) {
if (go_signature_receiver(sym.signature) == recv_type) return true;
for (const auto& sc : sym.scope_chain) {
if (bare_type_name(sc.name) == recv_type) return true;
}
return false;
}
} // namespace

SymbolID ReferenceTracker::resolve_reference_target(
const Snapshot& s, const Reference& ref,
std::span<const SymbolID> file_symbol_ids) {

-const auto& name = ref.referenced_name;
-if (name.empty()) return 0;
+const auto& full_name = ref.referenced_name;
+if (full_name.empty()) return 0;

-uint64_t name_hash = fnv1a_hash_name(name);
+uint64_t name_hash = fnv1a_hash_name(full_name);
uint64_t cache_key = (static_cast<uint64_t>(ref.file_id) << 32) |
(name_hash & 0xFFFFFFFF);

@@ -767,6 +803,29 @@ SymbolID ReferenceTracker::resolve_reference_target(
return it->second;
}

// Scope-typed method ref "Type.M" (emitted by the extractor when the
// receiver's type is locally known): resolve to the method named M whose
// receiver/owning type is Type — the precise target among same-named
// methods. Bare lookup name is M; on no receiver-type match we fall through
// to the name-based path on M (so unknown/dynamic receivers degrade to the
// existing behavior rather than failing).
std::string_view name = full_name;
std::string_view recv_type;
if (auto dot = full_name.rfind('.'); dot != std::string::npos) {
recv_type = std::string_view(full_name).substr(0, dot);
name = std::string_view(full_name).substr(dot + 1);
if (!recv_type.empty() && !name.empty()) {
for (SymbolID id : s.symbols.get_symbols_by_name(name)) {
if (const auto* sym = s.symbols.get(id)) {
if (symbol_matches_receiver_type(*sym, recv_type)) {
reference_cache_[cache_key] = id;
return id;
}
}
}
}
}

// Check same-file symbols first (fast path).
for (SymbolID id : file_symbol_ids) {
if (const auto* sym = s.symbols.get(id)) {
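The two signature helpers added in `reference_tracker.cpp` are pure string functions, so they can be exercised in isolation. Below they are copied unchanged into a standalone file, without the anonymous namespace. The signatures used as test inputs are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Bare type name from a possibly-decorated receiver token: "*chi.Mux" -> "Mux".
std::string_view bare_type_name(std::string_view t) {
  size_t i = 0;
  while (i < t.size() && (t[i] == '*' || t[i] == '&')) ++i;
  t = t.substr(i);
  if (auto dot = t.rfind('.'); dot != std::string_view::npos)
    t = t.substr(dot + 1);
  return t;
}

// Go method-receiver type from a signature: "func (r *Mux) M(...)" -> "Mux".
std::string_view go_signature_receiver(std::string_view sig) {
  constexpr std::string_view kFunc = "func (";
  if (sig.rfind(kFunc, 0) != 0) return {};
  auto close = sig.find(')', kFunc.size());
  if (close == std::string_view::npos) return {};
  std::string_view recv = sig.substr(kFunc.size(), close - kFunc.size());
  if (auto sp = recv.rfind(' '); sp != std::string_view::npos)
    recv = recv.substr(sp + 1);  // drop the receiver var name
  return bare_type_name(recv);
}
```

Generic receivers keep their type-parameter list (`Map[K]`), so they will not match a bare `Map`. With two type parameters (`Map[K, V]`), the split on the last space yields `V]`. The plan explicitly leaves generics out of scope.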