recall: Graph expansion follows too many hops through generic/hub nodes #74

@jack-arturo

Problem

Graph expansion during recall follows too many hops through generic entity nodes. It pulls in memories that are connected to the query only through shared infrastructure or common contacts, not through meaningful semantic relationships.

Example Traversal Path

Query: "Alex Panagis" with expand_entities=true

Alex Panagis (memory) 
  → entity:organizations:automem 
    → Alex Beck memories (also uses AutoMem)
      → entity:people:jack 
        → Zack Katz, Luka, Mastermind crew, every Jack-related memory

The expansion followed: person → tool → different person → shared contact → everyone. By hop 3, results have zero relevance to the original query.
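To make the hop counts concrete, here is a minimal sketch that rebuilds the path above as a toy adjacency list and measures how far each node is from the seed. The `memory:*` node names and the `hop_distances` helper are made up for illustration and are not AutoMem's actual storage model or API.

```python
from collections import deque

# Toy graph mirroring the traversal above. Node names are illustrative only.
GRAPH = {
    "memory:alex-panagis": ["entity:organizations:automem"],
    "entity:organizations:automem": ["memory:alex-panagis", "memory:alex-beck"],
    "memory:alex-beck": ["entity:organizations:automem", "entity:people:jack"],
    "entity:people:jack": [
        "memory:alex-beck",
        "memory:zack-katz",
        "memory:luka",
        "memory:mastermind-crew",
    ],
}


def hop_distances(graph, seed):
    """Breadth-first search returning the number of edges from seed to each node."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


distances = hop_distances(GRAPH, "memory:alex-panagis")
for node, hops in sorted(distances.items(), key=lambda item: item[1]):
    print(hops, node)
```

Counting one hop per edge, the shared-contact node sits at hop 3 and the unrelated memories behind it at hop 4, so an unbounded expansion reaches all of them.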

Current Mitigation

The expand_min_importance and expand_min_strength params help somewhat (cut results from 40 → 16), but they filter on node properties, not on path relevance. A high-importance memory about Alex Beck is still irrelevant to an Alex Panagis query, regardless of its importance score.
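The gap between filtering on node properties and filtering on path relevance can be shown with a small sketch. The `Candidate` class, the `relevance_to_query` field, and all scores below are hypothetical; only the idea of a minimum-importance cutoff comes from the existing params.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    importance: float          # property of the node itself
    relevance_to_query: float  # hypothetical similarity to the ORIGINAL query


def filter_by_importance(candidates, min_importance):
    """Node-property filter: looks only at the node, never at the query."""
    return [c for c in candidates if c.importance >= min_importance]


def filter_by_relevance(candidates, min_relevance):
    """Path-relevance filter: compares each candidate to the original query."""
    return [c for c in candidates if c.relevance_to_query >= min_relevance]


# Made-up scores for an "Alex Panagis" query.
candidates = [
    Candidate("Alex Panagis intro call", importance=0.6, relevance_to_query=0.72),
    Candidate("Alex Beck onboarding", importance=0.9, relevance_to_query=0.10),
]

kept_by_importance = [c.title for c in filter_by_importance(candidates, 0.5)]
kept_by_relevance = [c.title for c in filter_by_relevance(candidates, 0.5)]
print(kept_by_importance)
print(kept_by_relevance)
```

With these illustrative numbers, the importance filter keeps the Alex Beck memory because it is important in its own right, while the relevance filter drops it.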

Proposed Solutions

  1. Hop-depth limiting by query type: For person/entity lookups, cap expansion at 1 hop. For topic/concept queries, allow 2-3 hops. The system could infer query type from the presence of proper nouns vs. general terms.

  2. Path relevance decay: Apply a decay multiplier at each hop. If the seed result scores 0.72, the first hop should require at least ~0.5 relevance to the original query (not just to the intermediate node), second hop ~0.35, etc.

  3. Hub node detection: Identify high-connectivity "hub" nodes (like entity:organizations:automem, entity:people:jack) that connect to many unrelated memories, and deprioritize expansion through them. These nodes are structurally important but semantically promiscuous.

  4. Entity-type-aware expansion: When the query is about a person, only expand through entity:people:* nodes, not through entity:organizations:* or entity:tools:*. This prevents the tool/org bridge problem.

  5. Configurable max_hops parameter: Let callers explicitly set maximum graph traversal depth (default: 2, person queries: 1).
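The proposals above could be combined in a single expansion routine. The following is a rough sketch under several assumptions, not a description of how AutoMem works today: the graph is a plain dict, a hop is one edge, entity nodes act as bridges rather than results, and `expand`, `hop_threshold`, `hub_degree`, `allowed_entity_types`, and all scores are hypothetical. The decay factor of 0.7 is inferred from the example numbers in proposal 2 (0.72 × 0.7 ≈ 0.50, 0.72 × 0.7² ≈ 0.35).

```python
from collections import deque


def hop_threshold(seed_score, hop, decay=0.7):
    """Minimum relevance to the original query required at a given hop."""
    return seed_score * decay ** hop


def expand(graph, relevance, seed, seed_score, max_hops=2, decay=0.7,
           hub_degree=4, allowed_entity_types=None):
    """Breadth-first expansion with hop cap, relevance decay, hub and type guards.

    graph:     dict mapping node -> list of neighbour nodes
    relevance: dict mapping memory node -> similarity to the ORIGINAL query
    Returns a dict mapping accepted memory node -> hop at which it was reached.
    """
    results = {}
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hop = queue.popleft()
        if hop >= max_hops:  # proposals 1 and 5: hard depth cap
            continue
        neighbours = graph.get(node, [])
        if node != seed and len(neighbours) >= hub_degree:
            continue  # proposal 3: do not expand through hub nodes
        for neighbour in neighbours:
            if neighbour in visited:
                continue
            visited.add(neighbour)
            if neighbour.startswith("entity:"):
                entity_type = neighbour.split(":")[1]
                if (allowed_entity_types is not None
                        and entity_type not in allowed_entity_types):
                    continue  # proposal 4: entity-type-aware expansion
                queue.append((neighbour, hop + 1))  # bridge, not a result
                continue
            # proposal 2: compare against the original query, with decay
            if relevance.get(neighbour, 0.0) >= hop_threshold(seed_score, hop + 1, decay):
                results[neighbour] = hop + 1
                queue.append((neighbour, hop + 1))
    return results


# Toy data; the call-notes memory and every score here are made up.
GRAPH = {
    "memory:alex-panagis": ["entity:organizations:automem", "entity:people:alex-panagis"],
    "entity:people:alex-panagis": ["memory:alex-panagis", "memory:panagis-call-notes"],
    "memory:panagis-call-notes": ["entity:people:alex-panagis"],
    "entity:organizations:automem": ["memory:alex-panagis", "memory:alex-beck"],
    "memory:alex-beck": ["entity:organizations:automem", "entity:people:jack"],
    "entity:people:jack": ["memory:alex-beck", "memory:zack-katz",
                           "memory:luka", "memory:mastermind-crew"],
}
RELEVANCE = {"memory:panagis-call-notes": 0.61, "memory:alex-beck": 0.12,
             "memory:zack-katz": 0.05, "memory:luka": 0.05,
             "memory:mastermind-crew": 0.05}

print(expand(GRAPH, RELEVANCE, "memory:alex-panagis", seed_score=0.72))
```

In this sketch the Alex Beck memory is rejected at hop 2 because its relevance to the original query falls below the decayed threshold, so nothing behind it is ever explored. Whether "1 hop" for person queries should mean one edge or one memory-to-memory step is left open here; the sketch counts edges.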

Impact

Uncontrolled expansion is the primary reason recall returns 40 results when 5 would suffice. Each irrelevant result costs context tokens and degrades the LLM's ability to synthesize a useful response.

Metadata

Labels: enhancement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions