Summary
Deep research and codebase review to decide whether to disable or remove reranking in production, or to keep and refine it for quality. The outcome should finalize the reranking and embeddings strategy for launch.
Linked issue
- Switch embeddings to AI Gateway Qwen3-Embedding-4B; keep Together reranking (#666)
Goals
- Evaluate real-world quality impact vs cost/latency of reranking.
- Decide whether to keep, optimize, or remove reranking.
- Lock final production settings and document the decision.
Scope of investigation
1) Codebase review (current reranking implementation)
- src/lib/rag/reranker.ts (Together reranker + NoOp fallback)
- src/lib/rag/retriever.ts (rerank flow + telemetry + fallback)
- src/ai/tools/server/rag.ts and /api/rag/search handlers
- Config + envs: TOGETHER_AI_API_KEY, reranker config schema (@schemas/rag)
- Test coverage: src/lib/rag/__tests__/reranker.test.ts, src/lib/rag/__tests__/retriever.test.ts
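To anchor the review, here is a minimal sketch of the "reranker + NoOp fallback" pattern the files above describe. All names (Candidate, Reranker, NoOpReranker, rerankWithFallback, RerankOutcome) are hypothetical and the real code in src/lib/rag may be shaped differently; the point is the contract to check: a provider failure should degrade to first-stage order and be visible in telemetry, not fail the search.

```typescript
// Hypothetical shapes for illustration; the real types in src/lib/rag/reranker.ts may differ.
interface Candidate {
  id: string;
  text: string;
  score: number; // first-stage (vector similarity) score
}

interface Reranker {
  readonly name: string;
  rerank(query: string, candidates: Candidate[], topN: number): Promise<Candidate[]>;
}

// NoOp fallback: keep the first-stage ordering and truncate to topN.
class NoOpReranker implements Reranker {
  readonly name = "noop";
  async rerank(_query: string, candidates: Candidate[], topN: number): Promise<Candidate[]> {
    return [...candidates].sort((a, b) => b.score - a.score).slice(0, topN);
  }
}

// What the retriever could record as telemetry for each rerank attempt.
interface RerankOutcome {
  results: Candidate[];
  usedReranker: string;
  fellBack: boolean;
  latencyMs: number;
}

// Try the primary (e.g. Together) reranker; on any error, degrade to NoOp instead of failing the search.
async function rerankWithFallback(
  primary: Reranker,
  query: string,
  candidates: Candidate[],
  topN: number,
): Promise<RerankOutcome> {
  const start = Date.now();
  try {
    const results = await primary.rerank(query, candidates, topN);
    return { results, usedReranker: primary.name, fellBack: false, latencyMs: Date.now() - start };
  } catch {
    const fallback = new NoOpReranker();
    const results = await fallback.rerank(query, candidates, topN);
    return { results, usedReranker: fallback.name, fellBack: true, latencyMs: Date.now() - start };
  }
}
```

Recording fellBack alongside latencyMs lets the telemetry review separate "reranking was slow" from "reranking silently stopped running".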
2) Data + perf analysis
- Identify current rerank usage patterns and latency (telemetry spans/events).
- Measure impact on retrieval quality (offline eval or A/B) and end-user relevance.
- Cost analysis: Together rerank cost per request versus the relevance gain it delivers.
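For the offline comparison, one option is to score both arms (first-stage only vs. reranked) on the same labeled queries with nDCG@k and compare p95 latency. This sketch assumes a hand-labeled set of graded relevance judgments per query, which the issue does not yet specify; nDCG@k and nearest-rank percentiles are standard choices, not necessarily the metrics the team will settle on, and every name here is hypothetical.

```typescript
// Graded relevance judgments for one query: docId -> gain (0 = irrelevant).
type Judgments = Map<string, number>;

// DCG@k with the standard log2 discount: sum over ranks i = 1..k of gain_i / log2(i + 1).
function dcgAtK(gains: number[], k: number): number {
  return gains.slice(0, k).reduce((sum, g, i) => sum + g / Math.log2(i + 2), 0);
}

// nDCG@k = DCG of the system ranking divided by DCG of the ideal ranking.
function ndcgAtK(rankedIds: string[], judgments: Judgments, k: number): number {
  const gains = rankedIds.map((id) => judgments.get(id) ?? 0);
  const ideal = Array.from(judgments.values()).sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(gains, k) / idealDcg;
}

// Nearest-rank percentile (e.g. p = 95) over recorded latencies in ms.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

interface EvalRun {
  rankedIds: string[];
  latencyMs: number;
}

interface EvalQuery {
  judgments: Judgments;
  baseline: EvalRun; // first-stage retrieval only
  reranked: EvalRun; // first stage + reranker
}

// Aggregate quality and tail latency for both arms over the same query set.
function compareArms(queries: EvalQuery[], k: number) {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    baselineNdcg: mean(queries.map((q) => ndcgAtK(q.baseline.rankedIds, q.judgments, k))),
    rerankedNdcg: mean(queries.map((q) => ndcgAtK(q.reranked.rankedIds, q.judgments, k))),
    baselineP95Ms: percentile(queries.map((q) => q.baseline.latencyMs), 95),
    rerankedP95Ms: percentile(queries.map((q) => q.reranked.latencyMs), 95),
  };
}
```

The nDCG delta and the p95 latency delta, together with per-request rerank cost, give the three numbers the decision needs side by side.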
3) Research
- Review the AI SDK rerank() guidance and current Together rerank model options.
- Confirm current best practices for batching, topN, and score thresholds.
Decision options
A) Remove reranking
- Pros: lower cost/latency; simpler system.
- Cons: potential drop in relevance/precision.
B) Keep reranking (as-is)
- Pros: quality boost where needed; minimal change.
- Cons: cost and latency persist.
C) Refine/optimize reranking
- Examples: reduce topN, apply reranking only to specific queries, add a threshold-based fallback, cache results, or swap models.
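Several option C ideas can be combined in one gate: rerank only a bounded candidate pool, skip the reranker when the first stage is already confident, and cache reranked orderings. The sketch below is hypothetical throughout: the option names, the confidence rule (top score plus margin over the runner-up), and the rerankIds callback standing in for the Together call are assumptions, and any threshold values would need to be calibrated from the evaluation above.

```typescript
// Hypothetical shapes and option names; thresholds must be calibrated on our own data.
interface Candidate {
  id: string;
  score: number; // first-stage similarity score
}

interface SelectiveRerankOptions {
  candidatePool: number; // how many first-stage hits are sent to the reranker
  topN: number; // how many results are returned
  skipIfTopScoreAbove: number; // trust the first stage when the best hit is this confident...
  minMargin: number; // ...and beats the runner-up by at least this much
}

// Rerank only when the first-stage ranking is ambiguous.
function shouldRerank(sorted: Candidate[], opts: SelectiveRerankOptions): boolean {
  if (sorted.length <= 1) return false;
  const best = sorted[0];
  const second = sorted[1];
  const confident =
    best.score >= opts.skipIfTopScoreAbove && best.score - second.score >= opts.minMargin;
  return !confident;
}

// Small LRU cache keyed by normalized query plus the candidate ids sent to the reranker.
class RerankCache {
  private readonly entries = new Map<string, string[]>();
  constructor(private readonly maxEntries: number) {}

  static key(query: string, ids: string[]): string {
    return `${query.trim().toLowerCase()}|${ids.join(",")}`;
  }

  get(key: string): string[] | undefined {
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      // Re-insert to mark as most recently used (Map preserves insertion order).
      this.entries.delete(key);
      this.entries.set(key, hit);
    }
    return hit;
  }

  set(key: string, value: string[]): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// rerankIds stands in for the provider call and returns candidate ids in reranked order.
async function selectiveRerank(
  query: string,
  candidates: Candidate[],
  opts: SelectiveRerankOptions,
  cache: RerankCache,
  rerankIds: (query: string, ids: string[]) => Promise<string[]>,
): Promise<string[]> {
  const pool = [...candidates].sort((a, b) => b.score - a.score).slice(0, opts.candidatePool);
  const poolIds = pool.map((c) => c.id);
  if (!shouldRerank(pool, opts)) return poolIds.slice(0, opts.topN);

  const key = RerankCache.key(query, poolIds);
  const cached = cache.get(key);
  if (cached !== undefined) return cached.slice(0, opts.topN);

  const ordered = await rerankIds(query, poolIds);
  cache.set(key, ordered);
  return ordered.slice(0, opts.topN);
}
```

Including the candidate ids in the cache key means a re-indexed corpus naturally misses the cache instead of serving a stale ordering.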
Expected deliverables
- Written decision with evidence and reasoning.
- Updated architecture/ADR note capturing final call.
- If changes are needed, a concrete implementation plan and checklist.
Acceptance criteria
- Decision documented with rationale and supporting data.
- Clear production configuration for embeddings + reranking for release.
- If reranking is modified or removed, the required test and doc updates are identified.
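One way to make the final production configuration unambiguous is a typed config with a startup check. This is a sketch under assumptions: the field names, the RagProductionConfig and validateRagConfig names, and the timeout-based fallback field are hypothetical, the real schema lives in @schemas/rag, and neither the exact AI Gateway model id for Qwen3-Embedding-4B nor the Together rerank model id is confirmed here.

```typescript
// Hypothetical config shape; the real schema lives in @schemas/rag and may differ.
type RerankingConfig =
  | { enabled: false }
  | {
      enabled: true;
      provider: "together";
      model: string; // Together rerank model id (placeholder; to be decided)
      candidatePool: number; // first-stage hits sent to the reranker
      topN: number; // results kept after reranking
      timeoutMs: number; // beyond this, fall back to first-stage order
    };

interface RagProductionConfig {
  embeddings: { provider: "ai-gateway"; model: string };
  reranking: RerankingConfig;
}

// Returns a list of problems; an empty list means the config is consistent.
function validateRagConfig(
  cfg: RagProductionConfig,
  env: Record<string, string | undefined>,
): string[] {
  const errors: string[] = [];
  if (!cfg.embeddings.model) errors.push("embeddings.model is required");
  const r = cfg.reranking;
  if (r.enabled) {
    if (!env.TOGETHER_AI_API_KEY) errors.push("TOGETHER_AI_API_KEY must be set when reranking is enabled");
    if (!r.model) errors.push("reranking.model is required");
    if (!Number.isInteger(r.topN) || r.topN < 1 || r.topN > r.candidatePool) {
      errors.push("reranking.topN must be an integer between 1 and candidatePool");
    }
    if (r.timeoutMs <= 0) errors.push("reranking.timeoutMs must be positive");
  }
  return errors;
}
```

Modeling "reranking disabled" as its own union member means option A needs no dummy model or key, while options B and C fail fast at startup if TOGETHER_AI_API_KEY is missing rather than silently running on the NoOp fallback.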
Notes
This issue focuses on decision-making and deep review; execution should follow in a separate implementation issue or PR.