feat: Add support for query rewrite in vector_store.search #4171
Conversation
```python
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
```
Removing this and `provider_priority` below.
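The pattern filter in the hunk above can be sketched as a standalone helper; the model IDs in the example below are hypothetical:

```python
# Sketch of the pattern-based filter discussed above: drop models whose IDs
# look like embedding models even when they are tagged as LLMs.
EMBEDDING_MODEL_PATTERNS = ["minilm", "embed", "embedding", "nomic-embed"]

def filter_llm_models(model_ids: list[str]) -> list[str]:
    """Keep only IDs that match none of the known embedding-model patterns."""
    return [
        mid
        for mid in model_ids
        if not any(p in mid.lower() for p in EMBEDDING_MODEL_PATTERNS)
    ]
```

This kind of substring heuristic is inherently fragile, which is what motivates the explicit-configuration suggestions in this thread.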
@franciscojavierarceo FYI, the example has apple as the first query, but the log shows kiwi twice.
mattf left a comment:
What about having the vector store config specify the rewrite model, and rejecting the request if none is configured? This would make the behavior somewhat stable.
The config would be per vector store; the rewrite prompt could be a config option as well. Maybe you go as far as including completion params like temperature.
```yaml
...
query_rewriter:
  model: ollama/llama6-magic
  prompt: "do your thing on {query} and be magical"
...
```
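A minimal sketch of what such a per-store config object might look like, using a plain dataclass; the field names follow the YAML above, everything else is an assumption:

```python
from dataclasses import dataclass

@dataclass
class QueryRewriterConfig:
    # Per-vector-store rewrite settings, mirroring the YAML sketch above.
    model: str
    prompt: str = "Rewrite this search query: {query}"
    temperature: float = 0.3  # optional completion param, as suggested

    def render_prompt(self, query: str) -> str:
        # The prompt template uses {query} as the placeholder.
        return self.prompt.format(query=query)

cfg = QueryRewriterConfig(
    model="ollama/llama6-magic",
    prompt="do your thing on {query} and be magical",
)
```

Because the object is per store, two stores with different prompts never share state.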
```python
llm_models = [m for m in models_response.data if m.model_type == ModelType.llm]

# Filter out models that are known to be embedding models (misclassified as LLM)
embedding_model_patterns = ["minilm", "embed", "embedding", "nomic-embed"]
```
Instead of hardcoding models and providers, can't you optionally just take a `query_rewrite_model` when creating the vector store? Also, can we use the `metadata` attribute to pass in parameters that are not supported by OpenAI?
Yeah, that's what I'm adding, similar to what @mattf suggested too. I got that working last night but ended up going to bed before pushing it.
Yeah, that's actually what I ended up adding; sorry, I requested reviews a bit prematurely. I'll push that update soon.
@franciscojavierarceo please update the description with the new proposed config and user interaction.
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Add `default_query_expansion_model` and `query_expansion_prompt` to `VectorStoresConfig`, and update providers to use `VectorStoresConfig`.
@mattf this was added at the request of @raghotham in Discord. I can update the PR description to reflect this for sure. I actually would have preferred to do it as a follow-up to reduce scope.
Sounds good, will revise. 👍
@raghotham IMHO we have two use cases here:
If we need both (a) and (b) in this PR, I suggest not taking extra args on the query call, but using the existing vector store update path.
Ack @mattf, +1 on the startup validation approach. @franciscojavierarceo FYI I created #4184 to tackle startup validation for optional dependencies so admins catch missing dependencies early. On the production vs. R&D discussion: Matt's suggestion to use the "update vector store" path seems like a good middle ground.
```python
await validate_vector_stores_config(self.run_config.vector_stores, impls)
await validate_safety_config(self.run_config.safety, impls)

# Set global query expansion configuration from stack config
```
Importing here to avoid importing numpy.
src/llama_stack/core/datatypes.py (outdated):

```python
    default=None,
    description="Default embedding model configuration for vector stores.",
)
default_query_expansion_model: QualifiedModel | None = Field(
```
Maybe just stick to one of "expansion" or "rewrite" across the board? It's a little confusing if we have "expansion" here but "rewrite" elsewhere.
👍
src/llama_stack/core/stack.py (outdated):

```python
default_embedding_model = vector_stores_config.default_embedding_model
if default_embedding_model is None:
    return
if default_embedding_model is not None:
```
Maybe consider breaking this up into multiple sub-validation methods with early returns instead?
done!
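The refactor suggested above might look roughly like this: one small method per concern, each returning early when there is nothing to check. The config shape is assumed, and the actual checks are elided:

```python
def _validate_embedding_model(config) -> None:
    # Early return when no default embedding model is configured.
    if getattr(config, "default_embedding_model", None) is None:
        return
    # ... resolve provider_id / model_id against registered models ...

def _validate_expansion_model(config) -> None:
    # Early return when no query expansion model is configured.
    if getattr(config, "default_query_expansion_model", None) is None:
        return
    # ... verify the model is a registered LLM ...

def validate_vector_stores_config(config) -> None:
    if config is None:
        return
    _validate_embedding_model(config)
    _validate_expansion_model(config)
```

Each helper owns exactly one field, so adding a new config option later means adding one new small method rather than growing a single branchy validator.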
src/llama_stack/core/datatypes.py (outdated):

```python
    default=DEFAULT_QUERY_EXPANSION_PROMPT,
    description="Prompt template for query expansion. Use {query} as placeholder for the original query.",
)
query_expansion_max_tokens: int = Field(
```
Maybe also consider taking a single hyperparams object as a fallback in case there are more params in the future, like repetition penalty?
Yeah, can do.
done!
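The fallback object could be as small as this; all names are illustrative, not the PR's actual datatypes:

```python
from dataclasses import dataclass, field

@dataclass
class QueryExpansionHyperparams:
    # Known knobs get typed fields; anything future (e.g. repetition
    # penalty) lands in `extra` without a schema change.
    max_tokens: int = 100
    temperature: float = 0.3
    extra: dict = field(default_factory=dict)

hp = QueryExpansionHyperparams(extra={"repetition_penalty": 1.1})
```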
```yaml
default_embedding_model:
  provider_id: sentence-transformers
  model_id: nomic-ai/nomic-embed-text-v1.5
query_expansion_prompt: 'Expand this query with relevant synonyms and related terms.
```
No default query expansion model here?
We don't default an inference model, so I thought it best to leave this opt-in. Choosing a default LLM isn't something we really do explicitly today (we probably do it in a suboptimal way).
Commit: adding query expansion model to vector store config (Signed-off-by: Francisco Javier Arceo &lt;[email protected]&gt;)
mattf left a comment:
Please remove the formatting changes.
mattf left a comment:
- If rewrite is requested but the feature isn't configured, the user should get a 400 instead of being silently ignored.
- The globals are risky and will lead to bugs; there may already be one for two vector stores configured with different prompts. Can we skip the globals and use something like `self.vector_store.vector_stores_config` in `query_chunks` instead?
Will do.
@mattf if we want to use
@mattf I updated the code to handle the case where rewrite is requested but not configured. Mind if I split the adapter changes into a subsequent PR?
My mistake, you're right that `query_chunks` is adapter-specific. I was trying to suggest lifting the handling out of the individual adapters entirely; I don't see why adapters even need to know that the user is requesting query rewriting. The adapter just accepts a string to query against. Worst case, the rewrite could happen in the mixin? BTW, I noticed that the routing layer only ever passes down a `query: str`, but the adapters accept a `query: InterleavedContent`.
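The mixin idea can be sketched like this: the rewrite happens once, above the adapters, and `query_chunks` only ever sees a plain string. All names here are illustrative, not the actual llama-stack API:

```python
class QueryRewriteMixin:
    def _maybe_rewrite(self, query: str, rewrite: bool) -> str:
        if not rewrite:
            return query
        # The concrete class supplies the actual LLM call.
        return self._rewrite_with_llm(query)

    def search(self, query: str, rewrite_query: bool = False) -> list[str]:
        # Adapters stay unaware of rewriting; they just get a string.
        return self.query_chunks(self._maybe_rewrite(query, rewrite_query))

class FakeAdapter(QueryRewriteMixin):
    def _rewrite_with_llm(self, query: str) -> str:
        return query + " (expanded)"

    def query_chunks(self, query: str) -> list[str]:
        return [query]
```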
What does this PR do?

Makes the `rewrite_query` parameter in vector store search functional, and adds `default_query_expansion_model` and `query_expansion_prompt` to `VectorStoresConfig`.

- `rewrite_query=false` (default): use the original query.
- `rewrite_query=true`: expand the query via an LLM, or fail gracefully if no LLM is available.

Adds 4 parameters to `VectorStoresConfig`:

- `default_query_expansion_model`: LLM model for query expansion (optional)
- `query_expansion_prompt`: custom prompt template (optional, uses a built-in default)
- `query_expansion_max_tokens`: configurable token limit (default: 100)
- `query_expansion_temperature`: configurable temperature (default: 0.3)

Enabled `run.yaml`:

Fully customized `run.yaml`:

Test Plan

Added test and recording. Example script as well. See the screenshot of the server logs showing it worked.

[screenshot: server logs]

Notice the log:

`Query rewritten: 'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'`

So `kiwi` was expanded.