-
Notifications
You must be signed in to change notification settings - Fork 780
SOLR-17319 : Combined Query Feature for Multi Query Execution #3418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
|
@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks! |
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/combine/ReciprocalRankFusion.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/ReciprocalRankFusion.java
Outdated
Show resolved
Hide resolved
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/search/combine/QueryAndResponseCombiner.java
Outdated
Show resolved
Hide resolved
|
Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement! In the next couple of weeks, I should be able to give it a go and review it! |
solr/core/src/java/org/apache/solr/handler/component/CombinedQuerySearchHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
|
@dsmiley, Really appreciate your time and effort so far on the PR. When you have a moment, would you mind taking a look at the latest commit changes addressing the concerns raised previously? Thanks! |
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was away at a conference; I'm back now. I just did a partial review. I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity / documentation matters and more fundamentally... why would someone choose it.
| if (req.getHttpSolrCall() != null | ||
| && StringUtils.isEmpty(req.getParams().get(ShardParams.SHARDS))) { | ||
| String scheme = req.getHttpSolrCall().getReq().getScheme(); | ||
| String host = req.getHttpSolrCall().getReq().getServerName(); | ||
| int port = req.getHttpSolrCall().getReq().getServerPort(); | ||
| String context = req.getHttpSolrCall().getReq().getContextPath(); | ||
| String core = req.getCore().getName(); | ||
| String localShardUrl = | ||
| String.format(Locale.ROOT, "%s://%s:%d%s/%s", scheme, host, port, context, core); | ||
| solrParams.set(ShardParams.SHARDS, localShardUrl); | ||
| req.setParams(solrParams); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part looks suspicious to me; so shortCircuit=false isn't enough? Did you construct this based on seeing similar code elsewhere (where?) to create a shards URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, shortCircuit=false is not enough as ShardHandler also needs the shards parameter which comes directly from SolrParams. While, making shortCircuit=true makes the rb.isDistrib=false and force the method to not go through the shardHandler requests, rather stick to node local indexes through SearchComponent.process method.
I couldn't find this code anywhere but created myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I see so it needs to happen somewhere, and you've put it here. It feels out of place in this method, however. Based on what you showed me, it could be in HttpShardHandler where StandaloneReplicaSource.Builder is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the URL core construction here seems like it should be a utility method on HttpSolrCall
solr/core/src/java/org/apache/solr/handler/component/ResponseBuilder.java
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Show resolved
Hide resolved
I don't think if it introduces any additional complexity in terms of code or documentation (adding a parameter combiner.method), rather it reuses several piece of code. |
I disagree. Only having per-shard RRF quickly devolves to shard interleaving as the shard count increases, since at the coordination/aggregation, there's no global overall ranking measure left anymore. The results from each shard are ultimately treated as equivalent across the shards. Consequently, if say the real RRF best result pointed clearly to one document (best score of both sub-queries, lets say), then in the per-shard RRF it'd merely be arbitrarily somewhere in the top-20 if say there are 20 shards. Nobody would want that. |
|
okay @dsmiley . So I have kept only the Way 2 for now, removing Way 1 completely. |
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has changed so much; I'll need to come at this afresh (not just looking at deltas from previous review). But first... can you please look at Christine's POC that leverages SearchComponent phases (as I had hinted in JIRA may be possible) #3648
|
@dsmiley I had a chance to take a look at Christine's POC changes and have updated my findings there.
Just checking in—have you had a chance to take a look at this yet? |
solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
| * @throws Exception the exception | ||
| */ | ||
| @Test | ||
| public void testMultipleQueryWithSort() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test's existence surprises me but I'm reminded you are thinking of this feature generically, not assuming RRF. I suppose if we're sorting by a field then the underlying implementation would ideally be merely a BooleanQuery with two SHOULD clauses. Would a user bother with this plugin for such a case when they can do that trivially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting can also be applied to the results after processing RRF; otherwise, there is no built-in support for joining two queries. Sorting should be supported in case of combined query to maintain consistency with the query structure. The final RRF score would act as a tie breaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't a boolean OR of both subqueries without CombinedQueryComponent count as "built in support for joining two queries"? "Consistency of the query structure".... huh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Consistency of the query structure"
I meant user creating the query for a use case. They don't have to use two query types - Boolean specifically for sorting and Combined for others.
solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
|
@dsmiley - Just checking if you have any concerns apart from a few mentioned above, which I have resolved recently. Not sure if you had a chance to review it completely yet. Would you be able to advise on the best path forward? |
|
Sorry; this PR deserves more attention but I've been prioritizing my limited time to Solr 10. I'll return to this as soon as I can... perhaps tonight. I said that to myself yesterday so it's at least at the top of my discretionary list. |
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/SearchComponent.java
Outdated
Show resolved
Hide resolved
| */ | ||
| @Override | ||
| @SuppressWarnings("unchecked") | ||
| public void process(ResponseBuilder rb) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this component forces distributed mode, why would process() be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With reference, In the case of distributed mode, an HTTP request is made with rb.distrib=false through ShardHandlerRequest in one of the phases. The process method would be called for that particular shard query request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an isShard=true request then. Okay; but then why wouldn't the logic just pass-through as normal? It's not clear why CombinedQueryComponent has any logic to accomplish at this phase. "Way 2" is only at aggregation (disbtrib=true), in my head anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, took longer to reply.
The forced distributed would anyway enable the shard call (isShard=true).
then why wouldn't the logic just pass-through as normal
The process() method is directly communicating with the Searcher as we need to process each query at the shard searcher level. If you would look at the method override, it is also creating the combined docSet across queries other than few other metadata, required for merging the docs finally across the shard at the coordinator level.
However, this Searcher call per query can be further parallelised as added in comment.
@dsmiley
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
Show resolved
Hide resolved
| protected Map<Object, ShardDoc> createShardResult( | ||
| ResponseBuilder rb, | ||
| Map<String, List<ShardDoc>> shardDocMap, | ||
| SolrDocumentList responseDocs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unfortunate to have a method that both returns something and also manipulates an argument. At least you documented it.
|
@dsmiley Was I able to address your concern through above changes and responses? Please let me know if there is more. |
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my fundamental concerns are addressed. I left some comments. I'd prefer that the modifications, especially relating to forceDistrib, be in a separate PR and having a basic test for forceDistrib in particular. Like just test with a dummy TestForceDistribComponent if process() is called, that distrib=false is set. Maybe it could be added to an existing distribution test in standalone and another in a distrib SolrCloud.
solr/core/src/java/org/apache/solr/handler/component/combine/package-info.java
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQuerySolrCloudTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQueryComponentTest.java
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/handler/component/CombinedQuerySolrCloudTest.java
Outdated
Show resolved
Hide resolved
|
I'll handle the forceDistrib matter; it was my idea after all so I'll see to creating a test. |
…a/solr into feat_combined_query
Thank you, @dsmiley for the extensive review. I have addressed those comments as well. UPDATE: @dsmiley - Have added the test cases for your convenience. Kindly have a look. |
https://issues.apache.org/jira/browse/SOLR-17319
Description
This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their result basis an algorithm (like Reciprocal Rank Fusion). It also help resolve the issues being discussed w.r.t the previous PR, mainly around across shard documents merging. It provides more flexibility in terms of querying extending JSON Query DSL ultimately enabling Hybrid Search in a pure way solving the shortcomings.
Note: This feature is currently unsupported for non-distributed and grouping query.
Solution
Tests
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.