Evaluate reranking: keep, optimize, or remove for production #667

## Summary
Deep research and codebase review to decide whether to disable or remove reranking for production, or to keep and refine it for quality. The outcome should finalize the reranking and embeddings strategy for launch.

## Linked issue
- #666 (Switch embeddings to AI Gateway Qwen3-Embedding-4B; keep Together reranking)

## Goals
- Evaluate real-world quality impact vs cost/latency of reranking.
- Decide **keep**, **optimize**, or **remove** reranking.
- Lock final production settings and document the decision.

## Scope of investigation
### 1) Codebase review (current reranking implementation)
- `src/lib/rag/reranker.ts` (Together reranker + NoOp fallback)
- `src/lib/rag/retriever.ts` (rerank flow + telemetry + fallback)
- `src/ai/tools/server/rag.ts` and `/api/rag/search` handlers
- Config + envs: `TOGETHER_AI_API_KEY`, reranker config schema (`@schemas/rag`)
- Test coverage: `src/lib/rag/__tests__/reranker.test.ts`, `src/lib/rag/__tests__/retriever.test.ts`
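
For orientation during the review, the "reranker + NoOp fallback" pattern listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/lib/rag/reranker.ts`: the `Reranker` interface, `NoOpReranker`, `withTimeout`, and `rerankWithFallback` names and signatures are all assumptions made for the example, and the real code may be structured differently.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface RankedDoc {
  id: string;
  text: string;
  score: number;
}

interface Reranker {
  rerank(query: string, docs: RankedDoc[], topN: number): Promise<RankedDoc[]>;
}

// No-op reranker: keeps the first-stage order and only truncates to topN.
class NoOpReranker implements Reranker {
  async rerank(_query: string, docs: RankedDoc[], topN: number): Promise<RankedDoc[]> {
    return docs.slice(0, topN);
  }
}

// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("rerank timeout")), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Tries the primary reranker; on error or timeout, falls back and reports it
// so the caller can emit a telemetry event.
async function rerankWithFallback(
  primary: Reranker,
  fallback: Reranker,
  query: string,
  docs: RankedDoc[],
  topN: number,
  timeoutMs: number,
): Promise<{ docs: RankedDoc[]; usedFallback: boolean }> {
  try {
    const ranked = await withTimeout(primary.rerank(query, docs, topN), timeoutMs);
    return { docs: ranked, usedFallback: false };
  } catch {
    return { docs: await fallback.rerank(query, docs, topN), usedFallback: true };
  }
}
```

One thing worth checking in the actual implementation is whether fallback usage is recorded in telemetry, since the fallback rate feeds directly into the keep/remove decision.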

### 2) Data + perf analysis
- Identify current rerank usage patterns and latency (telemetry spans/events).
- Measure impact on retrieval quality (offline eval or A/B) and end-user relevance.
- Cost analysis: Together rerank usage per request vs value.
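
One possible shape for the offline comparison is sketched below: nDCG@k for ranking quality with and without reranking, plus a nearest-rank percentile for latency samples pulled from telemetry. The function names, the `Judgments` type, and the choice of nDCG as the metric are assumptions for illustration; the evaluation could equally use MRR, recall@k, or an A/B metric.

```typescript
// Relevance labels keyed by document id (0 = irrelevant, higher = more relevant).
type Judgments = Record<string, number>;

// Discounted cumulative gain over the first k ranked ids.
function dcgAtK(rankedIds: string[], judgments: Judgments, k: number): number {
  return rankedIds.slice(0, k).reduce((sum, id, i) => {
    const rel = judgments[id] ?? 0;
    return sum + (2 ** rel - 1) / Math.log2(i + 2);
  }, 0);
}

// DCG normalized by the best achievable DCG for the judged documents.
function ndcgAtK(rankedIds: string[], judgments: Judgments, k: number): number {
  const idealOrder = Object.entries(judgments)
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
  const idealDcg = dcgAtK(idealOrder, judgments, k);
  return idealDcg === 0 ? 0 : dcgAtK(rankedIds, judgments, k) / idealDcg;
}

// Nearest-rank percentile, e.g. p = 95 for p95 rerank latency in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

Running `ndcgAtK` over the same labeled query set for both the first-stage order and the reranked order gives a per-query quality delta that can be set against the added p95 latency and the per-request cost.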

### 3) Research
- Review AI SDK `rerank()` guidance and current Together rerank model options.
- Confirm any current best practices for batching, topN, or score thresholds.

## Decision options
A) **Remove reranking**
- Pros: lower cost/latency; simpler system.
- Cons: potential drop in relevance/precision.

B) **Keep reranking (as-is)**
- Pros: quality boost where needed; minimal change.
- Cons: cost and latency persist.

C) **Refine/optimize reranking**
- Examples: reduce `topN`, apply to specific queries only, threshold-based fallback, cache results, or model swap.
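
To make option C more concrete, here is a rough sketch of two of the listed ideas combined: skipping the reranker when first-stage scores already show a clear winner, and capping how many candidates are sent to it. `GateConfig`, `planRerank`, and the field names are hypothetical, and suitable thresholds would have to come from the data analysis rather than from this example.

```typescript
interface Candidate {
  id: string;
  score: number; // first-stage similarity score, higher is better
}

// Hypothetical tuning knobs; values should come from offline evaluation.
interface GateConfig {
  minCandidates: number; // skip reranking when there are fewer candidates than this
  confidentMargin: number; // skip when top-1 leads top-2 by at least this much
  maxCandidates: number; // cap on documents sent to the reranker
}

// Decides whether to call the reranker and which candidates to send.
function planRerank(
  candidates: Candidate[],
  cfg: GateConfig,
): { rerank: boolean; batch: Candidate[] } {
  const sorted = [...candidates].sort((a, b) => b.score - a.score);

  // With fewer than two candidates there is nothing to reorder.
  if (sorted.length < Math.max(2, cfg.minCandidates)) {
    return { rerank: false, batch: sorted };
  }

  // A large gap between the top two scores suggests the first stage is already confident.
  const margin = sorted[0].score - sorted[1].score;
  if (margin >= cfg.confidentMargin) {
    return { rerank: false, batch: sorted };
  }

  // Otherwise rerank, but only the strongest candidates to bound cost and latency.
  return { rerank: true, batch: sorted.slice(0, cfg.maxCandidates) };
}
```

A gate like this only helps if first-stage scores are reasonably calibrated, which is itself something the evaluation would need to confirm.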

## Expected deliverables
- Written decision with evidence and reasoning.
- Updated architecture/ADR note capturing final call.
- If changes are needed, a concrete implementation plan and checklist.

## Acceptance criteria
- Decision documented with rationale and supporting data.
- Clear production configuration for embeddings + reranking for release.
- If reranking is modified or removed, updated tests + docs are identified.

## Notes
This issue focuses on **decision-making and deep review**; execution should follow in a separate implementation issue or PR.

