feat: opt-in query rewriting for multi-turn RAG conversations by Alminc91 · Pull Request #5188 · Mintplex-Labs/anything-llm

Alminc91 · 2026-03-10T22:40:05Z

Problem

In multi-turn conversations, follow-up queries like "Yes, the B1 course please" or "Tell me more about that" fail at the RAG retrieval stage. Vector search on the literal text "Yes, B1 please" returns 0 relevant results. Reranking cannot fix this — it reorders results after retrieval, and reordering 0 relevant results still yields 0 relevant results.

This is the single biggest quality gap in multi-turn RAG, and the reason every major RAG framework has adopted query rewriting as a pre-retrieval step.

Solution

A 75-line module (queryRewriter.js) that rewrites ambiguous follow-up queries into standalone search queries before vector search. Integrates via a 2-line import+call in each chat handler.

Opt-in per workspace. Disabled by default. Enable via a toggle in Chat Settings → "Query Rewriting". When disabled, zero code paths change.

How it works

Gate: Only runs when chat history exists, query is ≤12 words, and the workspace has query rewriting enabled
Rewrite: Sends a short prompt (~550 tokens) with the last 2 turn-pairs to the workspace LLM
Use: Rewritten query goes to performSimilaritySearch. Original message is preserved for chat completion and history
Fallback: On any error, the original query is used. Zero risk of degrading existing behavior

Benchmark Results

Tested on Mistral Small 24B (FP8), a small locally-hosted model, across 250 queries (6 prompt variants):

Prompt Variant	Follow-ups (25)	Self-contained (25)	Hallucinations	Overall
Final version	25/25 (100%)	21/25 (84%)	0	92%
UNCHANGED signal	17/25 (68%)	25/25 (100%)	0	84%
Reference extraction	23/25 (92%)	18/25 (72%)	0	82%
LangChain prompt	~20/25 (80%)	3/25 (12%)	5+	~46%

The 4 self-contained "errors" in the final version are harmless paraphrases (same search results). Zero hallucinations, zero meta-text leakage across all 50 test queries.

Latency: ~260ms self-contained, ~300ms rewrites. Tested on a small on-device model.

Safety Features

Feature	Behavior
Opt-in per workspace	Disabled by default. Enable in Chat Settings
Env override	`ENABLE_QUERY_REWRITING=true` sets default for all workspaces
Word count threshold	Queries above 12 words skip rewriting (configurable via `QUERY_REWRITE_WORD_THRESHOLD`)
Verbatim detection	If the LLM returns the query unchanged, original is used
Error fallback	Any exception returns the original query — zero disruption
History gate	First message always skips (no history to reference)

Worst case: Original query used unchanged. Cannot produce worse results than current behavior.

Changes

File	Change
`server/utils/helpers/chat/queryRewriter.js`	New — rewrite logic + prompt (~75 lines)
`server/utils/chats/embed.js`	Import + call before vector search (+2 lines)
`server/utils/chats/stream.js`	Import + call before vector search (+2 lines)
`server/utils/chats/apiChatHandler.js`	Import + call before vector search (+2 lines, 2 handlers)
`server/utils/chats/openaiCompatible.js`	Import + call before vector search (+2 lines, 2 handlers)
`server/models/workspace.js`	Add `queryRewriteMode` to writable fields + validation
`server/prisma/schema.prisma`	Add `queryRewriteMode` column (nullable, default "off")
`frontend/.../ChatSettings/QueryRewriteMode/index.jsx`	New — UI toggle component (~45 lines)
`frontend/.../ChatSettings/index.jsx`	Import + render toggle

No new dependencies. No breaking changes.

Industry Precedent

This is not experimental — it is the standard approach for multi-turn RAG:

LangChain — create_history_aware_retriever: LLM-based query contextualization
Open WebUI — Enabled by default; generates search queries from history
Vercel AI SDK — generateObject for query transformation before RAG
Amazon Bedrock Knowledge Bases — Built-in query reformulation
Google Vertex AI RAG — Context-aware query rewriting

Related Issues

[FEAT]: Contextual RAG & Document Structuring Improvements 🚀 #4352 (Contextual RAG)
[FEAT]: RAG retrieval process misses reference to chat history #4071 (RAG retrieval misses chat history reference)
RAG retrieval process chat history (previous issue) #4072 (follow-up)

🤖 Generated with Claude Code

…ersations Before vector search, rewrite short follow-up queries using chat history so the search query captures the full conversational intent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… validation Instead of always rewriting (like LangChain/LlamaIndex), the LLM now responds with UNCHANGED for self-contained queries — reducing latency by ~40% (1 token vs reproducing the full query). Output validation ensures the function only ever returns the original query or a valid rewrite, never meta-text like "no rewrite needed": - Layer 1: Explicit UNCHANGED signal from LLM (fast path) - Layer 2: Verbatim copy detection (fallback for less capable models) - Layer 3: Content word overlap check — a valid rewrite must share topic words with the conversation context. Meta-responses do not. This works across all languages and models without keyword lists. Tested with Mistral Small 24B: - UNCHANGED queries: ~125-240ms (was ~240-435ms without signal) - Rewritten queries: ~275-290ms - All test cases pass: follow-ups rewritten, self-contained queries unchanged Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Layer 3 word-level matching fails for languages without spaces between words (Chinese, Japanese, Korean, Thai). These scripts use meaningful individual characters, so we fall back to character-level overlap checking for any non-ASCII character (charCode > 127). This makes the output validation truly language-agnostic: - Space-separated languages (Latin, Arabic, Cyrillic, Hebrew): word-level - Non-space languages (CJK, Thai, Lao, Myanmar): character-level fallback - Meta-responses in any language: still correctly rejected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace UNCHANGED signal + 3-layer validation with simpler "return EXACTLY as written" prompt and verbatim check. Tested across 6 prompt variants on Mistral Small 24B FP8: 92% accuracy, 0 hallucinations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add per-workspace queryRewriteMode setting (default: off) - Add UI toggle in Chat Settings for easy enable/disable - Replace UNCHANGED signal with simpler "return EXACTLY as written" prompt based on 250-query benchmark (92% accuracy, 0 hallucinations) - Env var ENABLE_QUERY_REWRITING=true sets default for all workspaces Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alminc91 and others added 5 commits March 8, 2026 22:28

feat: LLM-based query rewriting for contextual RAG in multi-turn conv…

21a3059

…ersations Before vector search, rewrite short follow-up queries using chat history so the search query captures the full conversational intent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alminc91 closed this Mar 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: opt-in query rewriting for multi-turn RAG conversations#5188

feat: opt-in query rewriting for multi-turn RAG conversations#5188
Alminc91 wants to merge 5 commits intoMintplex-Labs:masterfrom
Alminc91:feat/query-rewriting-for-contextual-rag

Alminc91 commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Alminc91 commented Mar 10, 2026

Problem

Solution

How it works

Benchmark Results

Safety Features

Changes

Industry Precedent

Related Issues

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant