SOLR-17319 : Combined Query Feature for Multi Query Execution #3418
@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!
Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!
Hi @ercsonusharma, thanks for resurrecting this. I didn't have time to dedicate to the feature in the last few months, so it's good to see some movement! In the next couple of weeks, I should be able to give it a go and review it!
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
// save these results in a private area so we can access them
// again when retrieving stored fields.
// TODO: use ResponseBuilder (w/ comments) or the request context?
rb.resultIds = createShardResult(rb, shardDocMap, responseDocs);
If, as per https://github.com/apache/solr/pull/3418/files#r2314366325, the maxScore setting were to be removed, then I think we could simplify here like this ...
- rb.resultIds = createShardResult(rb, shardDocMap, responseDocs);
+ rb.resultIds = createShardResult(rb, shardDocMap);
+ for (int i = 0; i < rb.resultIds.size(); i++) responseDocs.add(null);
... and then somehow (still thinking about that) something createShardResult-like could be factored out in the QueryComponent base class and overridden here.
No, it cannot be removed. maxScore is shown as part of the SolrDocument result, and it has to be updated with the latest maxScore after RRF.
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
ShardDoc shardDoc = new ShardDoc();
shardDoc.id = id;
shardDoc.shard = srsp.getShard();
shardDoc.orderInShard = i;
Object scoreObj = doc.getFieldValue(SolrReturnFields.SCORE);
if (scoreObj != null) {
  if (scoreObj instanceof String) {
    shardDoc.score = Float.parseFloat((String) scoreObj);
  } else {
    shardDoc.score = ((Number) scoreObj).floatValue();
  }
}
if (!scoreDependentFields.isEmpty()) {
  shardDoc.scoreDependentFields = doc.getSubsetOfFields(scoreDependentFields);
}

shardDoc.sortFieldValues = unmarshalledSortFieldValues;
shardDocMap.computeIfAbsent(srsp.getShard(), list -> new ArrayList<>()).add(shardDoc);
String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
  // duplicate detected
  numFound--;
}
Observations: the QueryComponent equivalent to this block of code is https://github.com/apache/solr/blob/releases/solr/9.9.0/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L1122-L1156 but there are differences:
- in QueryComponent, duplicates are unusual and will be omitted (and the numFound counter decremented)
- in CombinedQueryComponent, duplicates are possible and must not be omitted (but the numFound counter will be decremented)
Added 4dcbb57 dev increment with this in mind, i.e. supportive of both scenarios.
Yes, it doesn't duplicate all of the method's code exactly, rather about half of it.
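To make the two duplicate-handling behaviors in this thread concrete, here is a minimal standalone sketch. It is not the PR's code: the class `DuplicateCountSketch`, its `merge` method, and the `keepDuplicates` flag are hypothetical names, and it assumes numFound starts as the total number of hits received.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of duplicate handling; names are not from the PR. */
final class DuplicateCountSketch {

  /** Holds the ids kept for later combining and the de-duplicated hit count. */
  static final class Merged {
    final List<String> keptIds;
    final long numFound;

    Merged(List<String> keptIds, long numFound) {
      this.keptIds = keptIds;
      this.numFound = numFound;
    }
  }

  /**
   * Walks the ids in arrival order. A repeated id always decrements numFound;
   * it is dropped only when keepDuplicates is false (QueryComponent-style).
   */
  static Merged merge(List<String> ids, boolean keepDuplicates) {
    Map<String, Boolean> seen = new HashMap<>();
    List<String> kept = new ArrayList<>();
    long numFound = ids.size(); // assumption: starts as the raw hit count
    for (String id : ids) {
      boolean duplicate = seen.put(id, Boolean.TRUE) != null;
      if (duplicate) {
        numFound--; // the same document must not be counted twice
        if (!keepDuplicates) {
          continue; // omit the duplicate entirely
        }
      }
      kept.add(id);
    }
    return new Merged(kept, numFound);
  }
}
```

With ids 1, 2, 1, 3, both modes report numFound = 3; the combined-query mode keeps all four entries so the combiner still sees each per-query ranking, while the QueryComponent-style mode keeps three.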
Can you please "resolve" any conversations you think were addressed? This is a long PR with many conversations, making it hard to catch up with the current state.
…ue instead of implementMergeIds-taking-ShardDocQueueFactory
solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java
final var unparsedQuery = params.get(queryKey);
ResponseBuilder rbNew = new ResponseBuilder(rb.req, new SolrQueryResponse(), rb.components);
rbNew.setQueryString(unparsedQuery);
super.prepare(rbNew);
Wouldn't we want to manipulate the sort spec so that we get all docs up to offset (AKA the "start" param) + rows, since the RRF/combiner is going to want to see all docs/rankings up to offset+rows? Otherwise our combiner is blind to the "offset" docs. Assuming you agree, we then basically need to apply paging at this layer (our component) instead of letting the subquery do it.
It is happening here anyway.
That's for distributed-search but not single-core search.
I think user-managed/standalone vs SolrCloud is orthogonal. This is about a single shard working correctly (in whatever Solr mode). IMO it's not optional for basic paging parameters to work correctly with one shard.
I could imagine we'd prefer a mechanism for a SearchComponent to force "shortCircuit"=false, thereby ensuring there's always a distributed phase. Maybe that could be done by re-ordering SearchHandler's call to getAndPrepShardHandler to be after prepareComponents (swap adjacent lines)? Then the prepare method of this component could force distrib and add shortCircuit=false or something like that. And/or maybe a component should have a more elegant callback to communicate that it forces distributed search (even when there's one shard/core). This would overall simplify this component, which would no longer need to handle paging in process() and would instead do it for distributed search only.
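The paging concern raised in this thread can be sketched in isolation. This is an illustration under stated assumptions, not the component's code: `PagingSketch`, `subQueryRows`, and `page` are hypothetical names, and offset and rows are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of paging applied after combining; names are not from the PR. */
final class PagingSketch {

  /** Rows each sub-query must return so the combiner sees every ranking up to offset + rows. */
  static long subQueryRows(int offset, int rows) {
    return (long) offset + rows; // long avoids int overflow for very large rows values
  }

  /** Applies start/rows to the combined ranking, clamping to the list size. */
  static <T> List<T> page(List<T> combined, int offset, int rows) {
    int from = Math.min(offset, combined.size());
    int to = (int) Math.min((long) from + rows, (long) combined.size());
    return new ArrayList<>(combined.subList(from, to));
  }
}
```

For a combined ranking a, b, c, d, e with offset 2 and rows 2, each sub-query would be asked for 4 rows starting at 0, and the page returned after combining is c, d. If the sub-queries applied offset 2 themselves, the combiner would never see their top two documents.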
The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class. Doing so requires that you use the correct utility methods it provides. I noticed your test calls queryServer instead of query. If you look at their impls, you'll see what I'm getting at. You'll see other subclass tests using the various methods to do these tests.
I suspect there's a single-shard pagination bug. If so, then correct usage of this base class would surface it without you having to write more tests.
Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.
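One way to see how sharding could change the fused order is the sketch below. It assumes, purely for illustration, that each shard fuses using its local ranks and that the coordinator then merges by fused score; the PR's actual merge path may differ. `ShardLocalRrfSketch` and its methods are hypothetical names, and k = 60 is a commonly used constant rather than a value taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; class and method names are not from the PR. */
final class ShardLocalRrfSketch {

  /** RRF scores for one set of ranked lists: sum of 1 / (k + rank), rank starting at 1. */
  static Map<String, Double> rrfScores(List<List<String>> rankedLists, int k) {
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
      }
    }
    return scores;
  }

  /** Ids by descending score, ties broken by id for a deterministic order. */
  static List<String> order(Map<String, Double> scores) {
    List<String> ids = new ArrayList<>(scores.keySet());
    ids.sort((a, b) -> {
      int byScore = Double.compare(scores.get(b), scores.get(a));
      return byScore != 0 ? byScore : a.compareTo(b);
    });
    return ids;
  }

  /** Fuses within each shard first, then merges by fused score (assumes ids are unique across shards). */
  static List<String> fusePerShardThenMerge(List<List<List<String>>> perShardLists, int k) {
    Map<String, Double> merged = new HashMap<>();
    for (List<List<String>> shardLists : perShardLists) {
      merged.putAll(rrfScores(shardLists, k));
    }
    return order(merged);
  }
}
```

Suppose two queries both rank a, b, c globally, with a and b on shard 1 and c on shard 2. Fusing the global rankings gives a, b, c. Fusing per shard gives a, c, b, because c is rank 1 for both queries within its own shard and so ties with a.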
Pushed a change to the PR that adds an option for the user to choose which Combiner method to use: Way 1 (pre) or Way 2 (post).
https://issues.apache.org/jira/browse/SOLR-17319
Description
This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their results based on an algorithm (such as Reciprocal Rank Fusion). It also helps resolve the issues discussed with respect to the previous PR, mainly around merging documents across shards. It provides more flexibility in querying by extending the JSON Query DSL, ultimately enabling Hybrid Search while addressing those shortcomings.
Note: This feature is currently unsupported for non-distributed and grouping queries.
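For readers unfamiliar with Reciprocal Rank Fusion, the following is a minimal sketch of the general technique, not the code in ReciprocalRankFusion.java. The class `RrfSketch` and its `fuse` method are hypothetical names; each document's fused score is the sum of 1 / (k + rank) over the ranked lists that contain it, and the k = 60 used in the note after the block is a commonly used constant assumed here rather than taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Reciprocal Rank Fusion over ranked lists of document ids (not the PR's code). */
final class RrfSketch {

  /**
   * Fuses ranked id lists. Each document scores sum(1 / (k + rank)) over the lists
   * that contain it, with rank starting at 1. Returns ids by fused score, best first.
   */
  static List<String> fuse(List<List<String>> rankedLists, int k) {
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
      }
    }
    List<String> fused = new ArrayList<>(scores.keySet());
    // Sort by descending fused score; break ties by id so the order is deterministic.
    fused.sort((a, b) -> {
      int byScore = Double.compare(scores.get(b), scores.get(a));
      return byScore != 0 ? byScore : a.compareTo(b);
    });
    return fused;
  }
}
```

For example, fusing the rankings a, b, c and b, c, a with k = 60 yields b, a, c: b scores 1/62 + 1/61, a scores 1/61 + 1/63, and c scores 1/63 + 1/62.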
Solution
Tests