Poor search accuracy: Correct chunk ranked #14 by all embedding models (#465)

## Problem
For the query **"What was IBM revenue in 2022?"**, retrieval ranks the correct answer chunk at position **#14**. That is outside the default `top_k=5`, so users receive incorrect or incomplete answers.

## Evidence
- **All 8 embedding models tested** rank the revenue chunk identically at #14
- **Similarity score**: 0.7069
- **Query**: "What was IBM revenue in 2022?"
- **Correct chunk**: "For the year, IBM generated $60.5 billion in revenue..."

### Test Results
| Model | Rank | Score | Answer |
|-------|------|-------|--------|
| slate-125m-english-rtrvr | #14 | 0.7069 | ✅ (with top_k=20) |
| slate-125m-english-rtrvr-v2 | #14 | 0.7069 | ✅ (with top_k=20) |
| granite-107m-multilingual | #14 | 0.7069 | ❌ |
| (all 8 models) | #14 | 0.7069 | 7/8 correct |
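
The rank check behind this table can be reproduced with a small helper. This is a minimal sketch, not the project's test harness: `rank_of` and `in_top_k` are hypothetical names, and every score except 0.7069 is made up to recreate the "13 chunks score higher" situation.

```python
from typing import Sequence


def rank_of(scores: Sequence[float], target_index: int) -> int:
    """Return the 1-based rank of scores[target_index], highest score first."""
    target = scores[target_index]
    # Every candidate that scores strictly higher pushes the target down by one.
    return 1 + sum(1 for s in scores if s > target)


def in_top_k(scores: Sequence[float], target_index: int, top_k: int = 5) -> bool:
    """True if the target chunk would survive a top_k cut."""
    return rank_of(scores, target_index) <= top_k


# Illustrative data only: 13 distractor scores above 0.7069 (the score reported
# in this issue), the revenue chunk itself, then two lower-scoring chunks.
scores = [0.80 - 0.005 * i for i in range(13)] + [0.7069, 0.60, 0.55]
revenue_index = 13

print(rank_of(scores, revenue_index))
print(in_top_k(scores, revenue_index, top_k=5))
print(in_top_k(scores, revenue_index, top_k=20))
```

Running the same check per embedding model against real retrieval output would make it easy to track the rank as a regression metric.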

## Root Cause
**Semantic matching on generic financial keywords rather than specific factual content.**

Chunks ranked #1-13 contain generic terms that semantically match the query but don't contain the answer:
- "consolidated financial results"
- "annual report"
- "stockholders"
- "financial statements"

The revenue chunk (#14) phrases the answer differently from the query:
- "generated ... in revenue", with "revenue" following the verb "generated" rather than leading the sentence
- "For the year" instead of "in 2022"

## Impact
- **Critical UX issue**: The default `top_k=5` misses the correct answer
- Users get wrong or incomplete information
- The system appears unreliable for factual questions
- The workaround requires `top_k=20`, which is slower and more expensive

## Solution Options

### Option A: Fix LLM Reranker (QUICK WIN - RECOMMENDED)
- **Effort**: 30 min
- **Impact**: 70-80% improvement
- **Action**: Fix the reranker `template=None` bug
- With `top_k=20` candidates retrieved, the LLM reranker can read all 20 chunks and promote chunk #14 to the top
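
A minimal sketch of the reranking step, under stated assumptions: the actual `template=None` bug is not shown in this issue, so the guard below only illustrates one plausible fix (falling back to a default prompt). `rerank`, `DEFAULT_TEMPLATE`, and `score_fn` are hypothetical names; `score_fn` stands in for whatever LLM call returns a relevance score.

```python
from typing import Callable, List, Optional

# Hypothetical default prompt; the project's real template may differ.
DEFAULT_TEMPLATE = (
    "Rate from 0 to 10 how well the passage answers the question.\n"
    "Question: {query}\nPassage: {chunk}\nScore:"
)


def rerank(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str], float],
    template: Optional[str] = None,
    top_k: int = 5,
) -> List[str]:
    """Score every candidate chunk with score_fn and return the best top_k."""
    # Guard against template=None instead of failing when the prompt is built.
    if template is None:
        template = DEFAULT_TEMPLATE
    scored = []
    for chunk in chunks:
        prompt = template.format(query=query, chunk=chunk)
        scored.append((score_fn(prompt), chunk))
    # Sort on the score only; Python's sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def stub_score(prompt: str) -> float:
    """Stand-in for an LLM call: favours passages that contain a dollar figure."""
    return 10.0 if "$" in prompt else 1.0


candidates = ["consolidated financial results"] * 13
candidates.append("For the year, IBM generated $60.5 billion in revenue...")
candidates += ["annual report"] * 6

top = rerank("What was IBM revenue in 2022?", candidates, stub_score)
print(top[0])
```

The point of the design is that retrieval fetches a wide candidate set (20 chunks) while only the reranked top 5 reach the answer prompt.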

### Option B: Implement Hybrid Search
- **Effort**: 3-4 hours
- **Impact**: 50-60% improvement
- Combine vector similarity (70%) + BM25 keyword matching (30%)
- Boosts chunks with exact "revenue" and "2022" keywords
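
The 70/30 blend could be sketched as follows. This is an illustration in pure Python, not the project's search code: the BM25 variant (Lucene-style IDF), the min-max normalization, and all function names are assumptions, and the vector scores other than 0.7069 are invented.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1] so vector and BM25 scores are comparable."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(vector: Sequence[float], keyword: Sequence[float], alpha: float = 0.7) -> List[float]:
    """Blend: alpha * vector similarity + (1 - alpha) * keyword score."""
    v, k = min_max(vector), min_max(keyword)
    return [alpha * a + (1 - alpha) * c for a, c in zip(v, k)]


docs = [
    "consolidated financial results in the annual report",
    "For the year, IBM generated $60.5 billion in revenue",
    "notes to stockholders",
]
vector = [0.75, 0.7069, 0.60]  # only 0.7069 comes from this issue
blended = hybrid_scores(vector, bm25_scores("What was IBM revenue in 2022?", docs))
best = max(range(len(docs)), key=blended.__getitem__)
print(docs[best])
```

One caveat the sketch exposes: with min-max normalization, the chunk with the lowest vector score in the candidate set can never exceed `1 - alpha`, so the weights and the normalization scheme should be tuned together.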

### Option C: Improve Query Rewriting
- **Effort**: 1-2 hours
- **Impact**: 20-30% improvement
- Remove generic expansion: `AND (relevant OR important OR key)`
- Add entity extraction and synonym expansion
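
A rough sketch of the two rewriting changes, assuming the expansion is appended to the query as literal text. The function names and the synonym map are hypothetical; a real synonym list would need to be curated for the corpus.

```python
import re
from typing import Dict, List

# Matches the generic clause quoted above, tolerating case and spacing changes.
GENERIC_EXPANSION = re.compile(
    r"\s*AND\s*\(\s*relevant\s+OR\s+important\s+OR\s+key\s*\)",
    flags=re.IGNORECASE,
)


def strip_generic_expansion(query: str) -> str:
    """Remove the generic boolean clause that adds no query-specific signal."""
    return GENERIC_EXPANSION.sub("", query).strip()


def expand_synonyms(query: str, synonyms: Dict[str, List[str]]) -> str:
    """Append synonyms for any query token found in the map."""
    extra: List[str] = []
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        extra.extend(synonyms.get(token, []))
    return f"{query} {' '.join(extra)}" if extra else query


# Hypothetical synonym map, for illustration only.
SYNONYMS = {"revenue": ["sales"]}

raw = "What was IBM revenue in 2022? AND (relevant OR important OR key)"
cleaned = strip_generic_expansion(raw)
print(cleaned)
print(expand_synonyms(cleaned, SYNONYMS))
```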

### Option D: Reduce Chunk Size
- **Effort**: 2-3 hours (re-ingestion required)
- **Impact**: 30-40% improvement
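- Test 400-character chunks against the current 750 characters
- Smaller chunks reduce signal dilution: the answer sentence makes up a larger share of the embedded text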
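
To compare the two sizes, a character-based splitter along these lines could be used. This is a sketch only: the issue does not say how the ingestion pipeline splits text (fixed width, sentence boundaries, overlap), so `chunk_text` and its `overlap` parameter are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into fixed-width character chunks with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Placeholder text; a real comparison would use the ingested document.
sample = "x" * 1500
for size in (750, 400):
    chunks = chunk_text(sample, size)
    print(size, len(chunks))
```

Smaller chunks mean more vectors to store and search, so the rank of the revenue chunk should be re-measured after re-ingestion rather than assumed to improve.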

## Related
- Reranker template validation bug (blocks Option A)
- Issue #461: CoT reasoning leak (separate, affects response quality)
