
Conversation

Collaborator

@franciscojavierarceo franciscojavierarceo commented Nov 17, 2025

What does this PR do?

Implements query rewriting in the search API, and adds default_query_expansion_model and query_expansion_prompt to VectorStoresConfig.

Makes the rewrite_query parameter functional in vector store search.

  • rewrite_query=false (default): use the original query
  • rewrite_query=true: expand the query via an LLM, or fail gracefully if no LLM is available

Adds 4 parameters to VectorStoresConfig:

  • default_query_expansion_model: LLM model for query expansion (optional)
  • query_expansion_prompt: Custom prompt template (optional, uses built-in default)
  • query_expansion_max_tokens: Configurable token limit (default: 100)
  • query_expansion_temperature: Configurable temperature (default: 0.3)

Minimal run.yaml enabling query rewriting:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "meta-reference"
        model_id: "llama3_1_8b_instruct"
      # prompt defaults to built-in
      # max_tokens defaults to 100
      # temperature defaults to 0.3

Fully customized run.yaml:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "meta-reference"
        model_id: "llama3_1_8b_instruct"
      prompt: "Improve this search query: {query}"
      max_tokens: 150
      temperature: 0.2

Test Plan

Added a test and a recording.

Example script as well:

import asyncio
from llama_stack_client import LlamaStackClient
from io import BytesIO

def gen_file(client, text: str=""):
    file_buffer = BytesIO(text.encode('utf-8'))
    file_buffer.name = "my_file.txt"

    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants"
    )
    return uploaded_file

async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    vs = client.vector_stores.create()
    xf_vs = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    xf_vs1 = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)
    response1 = client.vector_stores.search(
                vector_store_id=vs.id,
                query="apple",
                max_num_results=3,
                rewrite_query=False
            )
    response2 = client.vector_stores.search(
                vector_store_id=vs.id,
                query="kiwi",
                max_num_results=3,
                rewrite_query=True,
            )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)

if __name__ == "__main__":
    asyncio.run(test_query_rewriting())

And see the screenshot of the server logs showing it worked:

Notice the log:

 Query rewritten:
         'kiwi' -> 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'

So kiwi was expanded.

@meta-cla bot added the CLA Signed label Nov 17, 2025
@franciscojavierarceo force-pushed the filesearch-rewrite-query branch 5 times, most recently from 83cece1 to 5349c33 on November 18, 2025 01:46
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review November 18, 2025 01:51
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
Collaborator Author


removing this and provider_priority below

Collaborator

mattf commented Nov 18, 2025

@franciscojavierarceo fyi, the example has apple as the first query, the log shows kiwi twice

Collaborator

@mattf mattf left a comment


what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

this would make the behavior somewhat stable.

the config would be per vector store. the rewrite prompt could be a config option as well. maybe you go as far as to include completion params like temperature.

   ...
   query_rewriter:
      model: ollama/llama6-magic
      prompt: "do your thing on {query} and be magical"
   ...

llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
Contributor


instead of hardcoding models and providers, can't you optionally just take in a "query_rewrite_model" when creating the vector store? Also, can we use the "metadata" attribute to pass in parameters that are not supported by OpenAI?

Collaborator Author

@franciscojavierarceo franciscojavierarceo Nov 18, 2025


Yeah that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.

@franciscojavierarceo
Collaborator Author

what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

Yeah, that's actually what I ended up adding; sorry, I requested reviews a bit prematurely. I'll push that update soon.

Collaborator

mattf commented Nov 19, 2025

@franciscojavierarceo please update the description with the new proposed config and user interaction


mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 19, 2025
@mergify mergify bot removed the needs-rebase label Nov 19, 2025
@franciscojavierarceo changed the title from "feat: Actualize query rewrite in search API" to "feat: Actualize query rewrite in search API, add default_query_expansion_model and query_expansion_prompt in VectorStoresConfig, and update providers to use VectorStoresConfig" Nov 19, 2025
Collaborator Author

franciscojavierarceo commented Nov 20, 2025

the spec doesn't mention letting api callers override the vector store config, but the implementation appears to let them. is this intended?

@mattf this was added at the request of @raghotham in discord. I can update the PR description to reflect this for sure. I actually would have preferred to do so as a follow up to reduce scope.

the model validation logic should be left to the inference api. if the model is of the wrong type the inference api should handle the error generation.

Sounds good, will revise.

vector store model type validation should happen during startup, if at all. right now the admin sees a successful startup and the user sees 400 errors about a misconfiguration.

Will revise. 👍

Collaborator

mattf commented Nov 20, 2025

the spec doesn't mention letting api callers override the vector store config, but the implementation appears to let them. is this intended?

@mattf this was added at the request of @raghotham in discord. I can update the PR description to reflect this for sure. I actually would have preferred to do so as a follow up to reduce scope.

@raghotham imho we have two use cases here:

  • (a) production - the vector store config gets set at creation time, which should include locking in the embedding model + rewrite model + rewrite prompt. having the rewrite model & prompt flexibility allows for tailoring vector stores to an app (real value here).
  • (b) research & development - the best config for the vector store is not known / tuned yet. in this case, it's wasteful to recreate (re-embed) the vector store for each iteration to tune the rewrite model & prompt. therefore, passing a new model & prompt for each query is important.

if we need both (a) and (b) in this pr, i suggest not taking extra args on the query call, but using the existing vector store update path.

Contributor

anik120 commented Nov 20, 2025

Ack @mattf - +1 with the startup validation approach.

@franciscojavierarceo FYI I created #4184 to tackle startup validation for optional dependencies so admins catch misconfigurations at deploy time rather than users hitting 400 errors and needing to debug after. Looks like the validation changes here would align with that pattern.

On the production vs. R&D discussion: Matt's suggestion to use the "update vector store" path seems like a good middle ground - gives R&D the flexibility to iterate without per-query instability. Happy to discuss additional validation patterns in #4184 if it helps.

@franciscojavierarceo force-pushed the filesearch-rewrite-query branch 4 times, most recently from 3b09a95 to ae27fe3 on November 21, 2025 20:01
await validate_vector_stores_config(self.run_config.vector_stores, impls)
await validate_safety_config(self.run_config.safety, impls)

# Set global query expansion configuration from stack config
Collaborator Author

@franciscojavierarceo franciscojavierarceo Nov 21, 2025


importing here to avoid importing numpy

default=None,
description="Default embedding model configuration for vector stores.",
)
default_query_expansion_model: QualifiedModel | None = Field(
Contributor


maybe just stick to one of "expansion" or "rewrite" across the board? it's a little confusing if we have expansion here but rewrite elsewhere

Collaborator Author


👍

default_embedding_model = vector_stores_config.default_embedding_model
if default_embedding_model is None:
return
if default_embedding_model is not None:
Contributor


maybe consider breaking this up into multiple sub-validation methods with early return instead?

Collaborator Author


done!

default=DEFAULT_QUERY_EXPANSION_PROMPT,
description="Prompt template for query expansion. Use {query} as placeholder for the original query.",
)
query_expansion_max_tokens: int = Field(
Contributor


maybe consider also taking a single hyperparams object as a fallback if there are more params in the future - like repetition penalty?

Collaborator Author


yeah can do

Collaborator Author


done!

default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
query_expansion_prompt: 'Expand this query with relevant synonyms and related terms.
Contributor


no default query expansion model here?

Collaborator Author


We don't default an inference model, so I thought it best to leave this opt-in; choosing a default LLM isn't really something we do explicitly today (and where we do, it's probably suboptimal).

Signed-off-by: Francisco Javier Arceo <[email protected]>

adding query expansion model to vector store config

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Collaborator

@mattf mattf left a comment


please remove the formatting changes

Collaborator

@mattf mattf left a comment


  • if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring
  • the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

Collaborator Author

franciscojavierarceo commented Nov 25, 2025

if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring

Will do.

the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

@mattf if we want to use self.vector_store.vector_stores_config we'll have to modify VectorStoreWithIndex which will result in requiring us to modify all of the adapters. The globals are the alternative I thought made sense, let me know if you have alternative suggestions.

Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo
Collaborator Author

@mattf I updated the code to handle the case where rewrite is requested but not configured.

Mind if I split the Adapter changes in a subsequent PR?

Collaborator

mattf commented Nov 26, 2025

if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring

Will do.

the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

@mattf if we want to use self.vector_store.vector_stores_config we'll have to modify VectorStoreWithIndex which will result in requiring us to modify all of the adapters. The globals are the alternative I thought made sense, let me know if you have alternative suggestions.

my mistake, you're right that query_chunks is adapter specific. i was trying to suggest lifting the handling out of the individual adapters entirely. i don't see why adapters need to even know that the user is requesting query rewriting. the adapter just accepts a string to query against. worst case, the rewrite could happen in the mixin?

btw, i noticed that the routing layer only ever passes down a query: str, but the adapters accept a query: InterleavedContent.
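The mixin idea above — the adapter just accepts a string to query against, and rewriting happens once before delegation — could be sketched like this. All class and method names here are hypothetical illustrations, not the actual Llama Stack classes:

```python
class QueryRewriteMixin:
    """Hypothetical: expand the query once, before delegating to the adapter."""

    def search(self, query: str, rewrite_query: bool = False) -> list[str]:
        if rewrite_query:
            query = self._expand(query)  # adapters never learn about rewriting
        return self._query_chunks(query)  # adapter just accepts a plain string

    def _expand(self, query: str) -> str:
        return query + " (expanded)"  # stand-in for the LLM expansion call


class FakeAdapter(QueryRewriteMixin):
    """Hypothetical adapter: only knows how to query chunks with a string."""

    def _query_chunks(self, query: str) -> list[str]:
        return [query]


print(FakeAdapter().search("kiwi", rewrite_query=True))  # ['kiwi (expanded)']
```

With this shape, the per-adapter code paths stay untouched and the rewrite logic lives in one place.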
