Poor search accuracy: Correct chunk ranked #14 by all embedding models #465

@manavgup

Problem

The question "What was IBM revenue in 2022?" ranks the chunk containing the correct answer at position #14, outside the default top_k=5, so users receive incorrect or incomplete answers.

Evidence

  • All 8 embedding models tested rank the revenue chunk identically at #14
  • Similarity score: 0.7069
  • Query: "What was IBM revenue in 2022?"
  • Correct chunk: "For the year, IBM generated $60.5 billion in revenue..."
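
The rank reported above can be checked with a small helper. This is a minimal sketch, assuming plain cosine similarity over embedding vectors; `cosine`, `rank_of_chunk`, and `in_top_k` are hypothetical names, not functions from this project.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_of_chunk(query_vec, chunk_vecs, target_index):
    """Return the 1-based rank of chunk_vecs[target_index] by similarity to the query."""
    scores = [cosine(query_vec, vec) for vec in chunk_vecs]
    # Sort chunk indices from most to least similar.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(target_index) + 1


def in_top_k(rank, top_k=5):
    """True if a chunk at this rank would be returned for the given top_k."""
    return rank <= top_k
```

With a rank of 14, `in_top_k(14)` is false for the default top_k=5 and true for top_k=20, which matches the behaviour described in this issue.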

Test Results

| Model | Rank | Score | Answer |
|---|---|---|---|
| slate-125m-english-rtrvr | #14 | 0.7069 | ✅ (with top_k=20) |
| slate-125m-english-rtrvr-v2 | #14 | 0.7069 | ✅ (with top_k=20) |
| granite-107m-multilingual | #14 | 0.7069 | |
| (all 8 models) | #14 | 0.7069 | 7/8 correct |

Root Cause

Embedding similarity rewards generic financial vocabulary rather than the specific factual content that answers the question.

Chunks ranked #1-13 contain generic terms that semantically match the query but don't contain the answer:

  • "consolidated financial results"
  • "annual report"
  • "stockholders"
  • "financial statements"

The revenue chunk (#14) uses different phrasing:

  • "generated" instead of "revenue"
  • "For the year" instead of "in 2022"

Impact

  • Critical UX issue: Default top_k=5 misses correct answer
  • Users get wrong/incomplete information
  • System appears unreliable for factual questions
  • Workaround requires top_k=20 (expensive, slower)

Solution Options

Option A: Fix LLM Reranker (QUICK WIN - RECOMMENDED)

  • Effort: 30 min
  • Impact: 70-80% improvement
  • Action: Fix reranker template=None bug
  • The LLM can read all 20 retrieved chunks and identify chunk #14 as the most relevant
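
The retrieve-20-then-rerank idea can be sketched as follows. The actual cause of the template=None bug is not described in this issue, so the fallback to a default template is only one plausible guard. `rerank`, `DEFAULT_TEMPLATE`, and `score_fn` (a stand-in for the LLM call) are all hypothetical names.

```python
# Hypothetical default prompt; the project's real template is not shown in this issue.
DEFAULT_TEMPLATE = "Question: {query}\nPassage: {chunk}\nRate relevance from 0 to 10."


def rerank(query, chunks, score_fn, template=None, top_k=5):
    """Score each chunk with score_fn(prompt) and return the top_k best chunks.

    score_fn stands in for an LLM call that returns a numeric relevance score.
    """
    # Guard against template=None by falling back to a default (assumed fix).
    prompt_template = template if template is not None else DEFAULT_TEMPLATE
    scored = []
    for index, chunk in enumerate(chunks):
        prompt = prompt_template.format(query=query, chunk=chunk)
        scored.append((score_fn(prompt), index))
    # Highest score first; ties keep the original retrieval order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for _, i in scored[:top_k]]
```

Passing the 20 retrieved chunks and top_k=5 keeps the final context small while still giving the reranker a chance to promote a chunk that vector search ranked at #14.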

Option B: Implement Hybrid Search

  • Effort: 3-4 hours
  • Impact: 50-60% improvement
  • Combine vector similarity (70%) + BM25 keyword matching (30%)
  • Boosts chunks with exact "revenue" and "2022" keywords
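
A minimal sketch of the 70/30 weighting, assuming a plain-Python BM25 and min-max normalisation of both score lists before mixing. The normalisation step, the `k1`/`b` values, and all function names are assumptions for illustration, not the project's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the query."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(tokens) for tokens in tokenized) / n or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(vector_scores, keyword_scores, vector_weight=0.7):
    """Weighted sum of normalised vector and keyword scores."""
    vec = min_max(vector_scores)
    kw = min_max(keyword_scores)
    return [vector_weight * v + (1 - vector_weight) * k for v, k in zip(vec, kw)]
```

Normalising first keeps the two score scales comparable. Without it, raw BM25 values could swamp cosine similarities that all sit in a narrow band around 0.70.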

Option C: Improve Query Rewriting

  • Effort: 1-2 hours
  • Impact: 20-30% improvement
  • Remove generic expansion: "AND (relevant OR important OR key)"
  • Add entity extraction and synonym expansion
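
The two rewriting changes might look like this. It is a sketch only: the regular expression targets the exact expansion string quoted above, and the synonym map passed to `expand_synonyms` is a hypothetical input, not something this project ships.

```python
import re

# Matches the generic expansion quoted in this issue, with flexible spacing.
GENERIC_EXPANSION = re.compile(
    r"\s*AND\s*\(\s*relevant\s+OR\s+important\s+OR\s+key\s*\)",
    re.IGNORECASE,
)


def strip_generic_expansion(query):
    """Remove the generic 'AND (relevant OR important OR key)' suffix."""
    return GENERIC_EXPANSION.sub("", query).strip()


def expand_synonyms(query, synonyms):
    """Append synonyms for query terms that are not already in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    extra = []
    for token in tokens:
        for synonym in synonyms.get(token, []):
            if synonym not in tokens and synonym not in extra:
                extra.append(synonym)
    return query if not extra else f"{query} {' '.join(extra)}"
```

For example, a map such as `{"revenue": ["sales"]}` would turn "IBM revenue" into "IBM revenue sales", while a query with no mapped terms is returned unchanged.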

Option D: Reduce Chunk Size

  • Effort: 2-3 hours (re-ingestion required)
  • Impact: 30-40% improvement
  • Test 400 chars vs current 750 chars
  • Reduces signal dilution
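
To compare the two chunk sizes, a simple word-boundary chunker is enough for a first experiment. This sketch assumes character-based limits with no overlap. `chunk_text` is a hypothetical helper, and the project's real chunking strategy may differ.

```python
def chunk_text(text, max_chars=400):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) > max_chars and current:
            # Adding this word would exceed the limit, so close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Running the same document through `chunk_text(text, 400)` and `chunk_text(text, 750)` and re-checking the rank of the revenue chunk would show whether smaller chunks actually reduce dilution for this query.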

Labels: bug, priority:critical
