
Conversation

Collaborator

@franciscojavierarceo franciscojavierarceo commented Nov 17, 2025

What does this PR do?

Implements query rewriting in the search API, and adds default_query_expansion_model and query_expansion_prompt to VectorStoresConfig.

Makes the rewrite_query parameter functional in vector store search.

  • rewrite_query=false (default): use the original query
  • rewrite_query=true: expand the query via an LLM, or fail gracefully if no LLM is available

Adds 4 parameters to VectorStoresConfig:

  • default_query_expansion_model: LLM model for query expansion (optional)
  • query_expansion_prompt: Custom prompt template (optional, uses built-in default)
  • query_expansion_max_tokens: Configurable token limit (default: 100)
  • query_expansion_temperature: Configurable temperature (default: 0.3)

Minimal run.yaml enabling query rewriting:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "meta-reference"
        model_id: "llama3_1_8b_instruct"
      # prompt defaults to built-in
      # max_tokens defaults to 100
      # temperature defaults to 0.3

Fully customized run.yaml:

  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "meta-reference"
        model_id: "llama3_1_8b_instruct"
      prompt: "Improve this search query: {query}"
      max_tokens: 150
      temperature: 0.2

Test Plan

Added a test and a recording.

Example script as well:

import asyncio
from llama_stack_client import LlamaStackClient
from io import BytesIO

def gen_file(client, text: str=""):
    file_buffer = BytesIO(text.encode('utf-8'))
    file_buffer.name = "my_file.txt"

    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants"
    )
    return uploaded_file

async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    vs = client.vector_stores.create()
    xf_vs = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    xf_vs1 = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)
    response1 = client.vector_stores.search(
                vector_store_id=vs.id,
                query="apple",
                max_num_results=3,
                rewrite_query=False
            )
    response2 = client.vector_stores.search(
                vector_store_id=vs.id,
                query="kiwi",
                max_num_results=3,
                rewrite_query=True,
            )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)

if __name__ == "__main__":
    asyncio.run(test_query_rewriting())

And see the screenshot of the server logs showing it worked:

Notice the log:

 Query rewritten:
         'kiwi' -> 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'

So kiwi was expanded.

@meta-cla bot added the CLA Signed label Nov 17, 2025
@franciscojavierarceo force-pushed the filesearch-rewrite-query branch 5 times, most recently from 83cece1 to 5349c33 on November 18, 2025 01:46
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review November 18, 2025 01:51
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
Collaborator Author


removing this and provider_priority below

Collaborator

mattf commented Nov 18, 2025

@franciscojavierarceo fyi, the example has apple as the first query, the log shows kiwi twice

Collaborator

@mattf mattf left a comment


what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

this would make the behavior somewhat stable.

the config would be per vector store. the rewrite prompt could be a config option as well. maybe you go as far as to include completion params like temperature.

   ...
   query_rewriter:
      model: ollama/llama6-magic
      prompt: "do your thing on {query} and be magical"
   ...

llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
Contributor


instead of hardcoding models and providers, can't you optionally just take in a "query_rewrite_model" when creating the vector store? Also, can we use the "metadata" attribute to pass in parameters that are not supported by OpenAI?

Collaborator Author

@franciscojavierarceo franciscojavierarceo Nov 18, 2025


Yeah that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.

@franciscojavierarceo
Collaborator Author

what about having the vector store config specify the rewrite model and the request is rejected if none is configured?

Yeah, that's actually what I ended up adding; sorry, I requested reviews a bit prematurely. I'll push that update soon.

Collaborator

mattf commented Nov 19, 2025

@franciscojavierarceo please update the description with the new proposed config and user interaction


mergify bot commented Nov 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 19, 2025
@mergify mergify bot removed the needs-rebase label Nov 19, 2025
@franciscojavierarceo changed the title from "feat: Actualize query rewrite in search API" to "feat: Actualize query rewrite in search API, add default_query_expansion_model and query_expansion_prompt in VectorStoresConfig, and update providers to use VectorStoresConfig" Nov 19, 2025
Collaborator Author

franciscojavierarceo commented Nov 20, 2025

the spec doesn't mention letting api callers override the vector store config, but the implementation appears to let them. is this intended?

@mattf this was added at the request of @raghotham in discord. I can update the PR description to reflect this for sure. I actually would have preferred to do so as a follow up to reduce scope.

the model validation logic should be left to the inference api. if the model is of the wrong type the inference api should handle the error generation.

Sounds good, will revise.

vector store model type validation should happen during startup, if at all. right now the admin sees a successful startup and the user sees 400 errors about a misconfiguration.

Will revise. 👍

Collaborator

mattf commented Nov 20, 2025

the spec doesn't mention letting api callers override the vector store config, but the implementation appears to let them. is this intended?

@mattf this was added at the request of @raghotham in discord. I can update the PR description to reflect this for sure. I actually would have preferred to do so as a follow up to reduce scope.

@raghotham imho we have two use cases here:

  • (a) production - the vector store config gets set at creation time, which should include locking in the embedding model + rewrite model + rewrite prompt. having the rewrite model & prompt flexibility allows for tailoring vector stores to an app (real value here).
  • (b) research & development - the best config for the vector store is not known / tuned yet. in this case, it's wasteful to recreate (re-embed) the vector store for each iteration to tune the rewrite model & prompt. therefore, passing a new model & prompt for each query is important.

if we need both (a) and (b) in this pr, i suggest not taking extra args on the query call, but using the existing vector store update path.

Contributor

anik120 commented Nov 20, 2025

Ack @mattf - +1 with the startup validation approach.

@franciscojavierarceo FYI I created #4184 to tackle startup validation for optional dependencies so admins catch misconfigurations at deploy time rather than users hitting 400 errors and needing to debug after. Looks like the validation changes here would align with that pattern.

On the production vs. R&D discussion: Matt's suggestion to use the "update vector store" path seems like a good middle ground - gives R&D the flexibility to iterate without per-query instability. Happy to discuss additional validation patterns in #4184 if it helps.

@franciscojavierarceo force-pushed the filesearch-rewrite-query branch 4 times, most recently from 3b09a95 to ae27fe3 on November 21, 2025 20:01
await validate_vector_stores_config(self.run_config.vector_stores, impls)
await validate_safety_config(self.run_config.safety, impls)

# Set global query expansion configuration from stack config
Collaborator Author

@franciscojavierarceo franciscojavierarceo Nov 21, 2025


importing here to avoid importing numpy

default=None,
description="Default embedding model configuration for vector stores.",
)
default_query_expansion_model: QualifiedModel | None = Field(
Contributor


maybe just stick to one of "expansion" or "rewrite" across the board? it's a little confusing if we have expansion here but rewrite elsewhere

Collaborator Author


👍

default_embedding_model = vector_stores_config.default_embedding_model
if default_embedding_model is None:
return
if default_embedding_model is not None:
Contributor


maybe consider breaking this up into multiple sub-validation methods with early return instead?

Collaborator Author


done!

default=DEFAULT_QUERY_EXPANSION_PROMPT,
description="Prompt template for query expansion. Use {query} as placeholder for the original query.",
)
query_expansion_max_tokens: int = Field(
Contributor


maybe consider also taking a single hyperparams object as a fallback if there are more params in the future - like repetition penalty?

Collaborator Author


yeah can do

Collaborator Author


done!

default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
query_expansion_prompt: 'Expand this query with relevant synonyms and related terms.
Contributor


no default query expansion model here?

Collaborator Author


We don't default an inference model, so I thought it best to leave this opt-in; choosing a default LLM isn't really something we do explicitly today (and where we do, it's probably suboptimal).

Signed-off-by: Francisco Javier Arceo <[email protected]>

adding query expansion model to vector store config

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Collaborator

@mattf mattf left a comment


please remove the formatting changes

Collaborator

@mattf mattf left a comment


  • if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring
  • the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

Collaborator Author

franciscojavierarceo commented Nov 25, 2025

if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring

Will do.

the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

@mattf if we want to use self.vector_store.vector_stores_config we'll have to modify VectorStoreWithIndex which will result in requiring us to modify all of the adapters. The globals are the alternative I thought made sense, let me know if you have alternative suggestions.

Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo
Collaborator Author

@mattf I updated the code to handle the case where rewrite is requested but not configured.

Mind if I split the Adapter changes in a subsequent PR?

Collaborator

mattf commented Nov 26, 2025

if rewrite is requested but the feature isn't configured, the user should get a 400 instead of silently ignoring

Will do.

the globals are risky and will lead to bugs. there may already be one for two vector stores configured w/ different prompts. can we skip the globals and use something like self.vector_store.vector_stores_config in query_chunks instead?

@mattf if we want to use self.vector_store.vector_stores_config we'll have to modify VectorStoreWithIndex which will result in requiring us to modify all of the adapters. The globals are the alternative I thought made sense, let me know if you have alternative suggestions.

my mistake, you're right that query_chunks is adapter specific. i was trying to suggest lifting the handling out of the individual adapters entirely. i don't see why adapters need to even know that the user is requesting query rewriting. the adapter just accepts a string to query against. worst case, the rewrite could happen in the mixin?

btw, i noticed that the routing layer only ever passes down a query: str, but the adapters accept a query: InterleavedContent.
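The mixin idea above — the adapter just accepts a string to query against, and rewriting happens once before delegation — could be sketched like this. All class and method names here are hypothetical illustrations, not the actual Llama Stack classes:

```python
class QueryRewriteMixin:
    """Hypothetical: expand the query once, before delegating to the adapter."""

    def search(self, query: str, rewrite_query: bool = False) -> list[str]:
        if rewrite_query:
            query = self._expand(query)  # adapters never learn about rewriting
        return self._query_chunks(query)  # adapter just accepts a plain string

    def _expand(self, query: str) -> str:
        return query + " (expanded)"  # stand-in for the LLM expansion call


class FakeAdapter(QueryRewriteMixin):
    """Hypothetical adapter: only knows how to query chunks with a string."""

    def _query_chunks(self, query: str) -> list[str]:
        return [query]


print(FakeAdapter().search("kiwi", rewrite_query=True))  # ['kiwi (expanded)']
```

With this shape, the per-adapter code paths stay untouched and the rewrite logic lives in one place.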
