@ercsonusharma ercsonusharma commented Jul 4, 2025

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature executes multiple queries of different kinds across the shards of a collection and combines their results using an algorithm such as Reciprocal Rank Fusion (RRF). It also helps resolve the issues discussed in the previous PR, mainly around merging documents across shards. By extending the JSON Query DSL, it gives more flexibility in querying and ultimately enables Hybrid Search without those shortcomings.

Note: This feature is currently unsupported for non-distributed and grouping queries.

Solution

  • Extended QueryComponent to create the new CombinedQueryComponent, and ResponseBuilder to create the new CombinedQueryResponseBuilder, which supports multiple response builders to hold the state and execute multiple queries.
  • Added a parameter to the JSON Query DSL to identify a Combined Query request; based on it, the new CombinedQueryComponent is invoked.
  • CombinedQueryComponent has a response builder assigned to each query. The queries are first executed at the SolrIndexSearcher level and then combined, using RRF for now.
  • At the shard level, the responses for the multiple queries are merged as well.
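
As a rough illustration of the combining step in the list above, Reciprocal Rank Fusion gives each document the sum of 1 / (k + rank) over every ranked list that contains it. The sketch below is not the code from this PR: the class RrfSketch, its method names, and the constant k = 60 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of Reciprocal Rank Fusion; not the PR's actual combiner. */
final class RrfSketch {
  /** Smoothing constant k; 60 is a commonly used value and is assumed here. */
  static final int K = 60;

  /** Sums 1 / (K + rank) for each doc id over all ranked lists (ranks start at 1). */
  static Map<String, Double> scores(List<List<String>> rankedLists) {
    Map<String, Double> scores = new LinkedHashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        scores.merge(ranked.get(i), 1.0 / (K + i + 1), Double::sum);
      }
    }
    return scores;
  }

  /** Returns doc ids ordered by descending fused score. */
  static List<String> fuse(List<List<String>> rankedLists) {
    Map<String, Double> scores = scores(rankedLists);
    List<String> ids = new ArrayList<>(scores.keySet());
    ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
    return ids;
  }
}
```

For example, fusing the rankings [a, b, c] and [b, c, d] orders the documents as b, c, a, d, because b and c sit near the top of both lists while a and d each appear in only one.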

Tests

  • Added tests for testing the RRF logic independently.
  • Added tests covering both search-index-level and distributed requests.
  • Added tests to assert the existing behaviour of the search handler's QueryComponent, as well as that of the newly added CombinedQueryComponent, based on the flag in the JSON Query DSL.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@ercsonusharma
Author

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

Contributor

@dsmiley dsmiley left a comment

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

@alessandrobenedetti
Contributor

Hi @ercsonusharma, thanks for resurrecting this. I didn't have time to dedicate to the feature in the last few months, so it's good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

@ercsonusharma ercsonusharma requested a review from atris July 16, 2025 18:45
// save these results in a private area so we can access them
// again when retrieving stored fields.
// TODO: use ResponseBuilder (w/ comments) or the request context?
rb.resultIds = createShardResult(rb, shardDocMap, responseDocs);
Contributor

If as per https://github.com/apache/solr/pull/3418/files#r2314366325 the maxScore setting were to be removed then I think here we could simplify like this ...

Suggested change
- rb.resultIds = createShardResult(rb, shardDocMap, responseDocs);
+ rb.resultIds = createShardResult(rb, shardDocMap);
+ for (int i = 0; i < rb.resultIds.size(); i++) responseDocs.add(null);

... and then somehow (still thinking about that) something createShardResult-like could be factored out in the QueryComponent base class and overridden here.

Author

No, it cannot be removed. maxScore is shown as part of the SolrDocument result, and it has to be updated with the latest maxScore after RRF.

Comment on lines 447 to 469
ShardDoc shardDoc = new ShardDoc();
shardDoc.id = id;
shardDoc.shard = srsp.getShard();
shardDoc.orderInShard = i;
Object scoreObj = doc.getFieldValue(SolrReturnFields.SCORE);
if (scoreObj != null) {
  if (scoreObj instanceof String) {
    shardDoc.score = Float.parseFloat((String) scoreObj);
  } else {
    shardDoc.score = ((Number) scoreObj).floatValue();
  }
}
if (!scoreDependentFields.isEmpty()) {
  shardDoc.scoreDependentFields = doc.getSubsetOfFields(scoreDependentFields);
}

shardDoc.sortFieldValues = unmarshalledSortFieldValues;
shardDocMap.computeIfAbsent(srsp.getShard(), list -> new ArrayList<>()).add(shardDoc);
String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
  // duplicate detected
  numFound--;
}
Contributor

observations: the QueryComponent equivalent to this block of code is https://github.com/apache/solr/blob/releases/solr/9.9.0/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L1122-L1156 but there are differences:

  • in QueryComponent duplicates are unusual and will be omitted (and the numFound counter decremented)
  • in CombinedQueryComponent duplicates are possible and must not be omitted (but the numFound counter will be decremented)

Added 4dcbb57 dev increment with this in mind i.e. supportive of both scenarios.
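
To make the two scenarios concrete, here is a minimal sketch of one merge routine that supports both. It is not the code from commit 4dcbb57: the class DuplicateCountSketch, its Hit and Merged types, and the keepDuplicates flag are hypothetical names, and Hit is only a stand-in for Solr's ShardDoc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of duplicate handling; names are illustrative only. */
final class DuplicateCountSketch {
  /** Minimal stand-in for a per-shard hit; not Solr's ShardDoc. */
  static final class Hit {
    final String id;
    final String shard;

    Hit(String id, String shard) {
      this.id = id;
      this.shard = shard;
    }
  }

  /** The hits that were kept plus the adjusted hit count. */
  static final class Merged {
    final List<Hit> hits;
    final long numFound;

    Merged(List<Hit> hits, long numFound) {
      this.hits = hits;
      this.numFound = numFound;
    }
  }

  static Merged merge(List<Hit> incoming, long reportedNumFound, boolean keepDuplicates) {
    Map<String, String> firstShardById = new HashMap<>();
    List<Hit> kept = new ArrayList<>();
    long numFound = reportedNumFound;
    for (Hit hit : incoming) {
      String prevShard = firstShardById.putIfAbsent(hit.id, hit.shard);
      if (prevShard != null) {
        numFound--; // this id was already counted once
        if (!keepDuplicates) {
          continue; // QueryComponent-style: omit the repeated doc
        }
      }
      kept.add(hit); // combined-query style keeps the repeat for the combiner
    }
    return new Merged(kept, numFound);
  }
}
```

With keepDuplicates set to true the repeated document stays in the list so that a combiner can still see its rank in each query, while numFound is decremented in both modes.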

Author

Yes, it doesn't duplicate all of the method's code, only about half of it.

@dsmiley
Contributor

dsmiley commented Sep 2, 2025

Can you please "resolve" any conversations you think were addressed? This is a long PR with many conversations, making it hard to catch up with the current state.

final var unparsedQuery = params.get(queryKey);
ResponseBuilder rbNew = new ResponseBuilder(rb.req, new SolrQueryResponse(), rb.components);
rbNew.setQueryString(unparsedQuery);
super.prepare(rbNew);
Contributor

Wouldn't we want to manipulate the sort spec so that we get all docs up to offset (AKA the "start" param) + rows, since the RRF/combiner is going to want to see all docs/rankings up to offset+rows? Otherwise our combiner is blind to the "offset" docs. Assuming you agree, we then need to apply paging at this layer (our component) instead of letting the subquery do it.
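
A minimal sketch of that idea, assuming a combined ranking is already available as a list: each sub-query is asked for start + rows docs from offset 0, and start/rows are applied only after combining. The class CombinedPagingSketch and its methods are hypothetical and do not correspond to Solr APIs.

```java
import java.util.List;

/** Hypothetical sketch: page after combining, not inside each sub-query. */
final class CombinedPagingSketch {
  /** Rows each sub-query must return (from offset 0) so the combiner sees every candidate. */
  static long subQueryRows(int start, int rows) {
    return (long) start + rows;
  }

  /** Applies start/rows to the already combined ranking. */
  static <T> List<T> page(List<T> combined, int start, int rows) {
    int from = Math.min(start, combined.size());
    int to = (int) Math.min((long) from + rows, combined.size());
    return combined.subList(from, to);
  }
}
```

With a combined ranking of [b, c, a, d], start=1 and rows=2, each sub-query would be asked for 3 docs and the page returned to the user would be [c, a].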

Author

That is already happening here

Contributor

That's for distributed-search but not single-core search.

Contributor

I think user-managed/standalone vs SolrCloud is orthogonal. This is about a single shard working correctly (in whatever Solr mode). IMO it's not optional for basic paging parameters to work correctly with one shard.

I could imagine we'd prefer a mechanism for a SearchComponent to force "shortCircuit"=false, thereby ensuring there's always a distributed phase. Maybe that could be done by re-ordering SearchHandler's call to getAndPrepShardHandler to be after prepareComponents (swap adjacent lines)? Then the prepare method of this component could force distrib and add shortCircuit=false or something like that. And/or maybe a component should have a more elegant callback to communicate that it forces distributed search (even when there's one shard/core). This would simplify this component overall: it would no longer need to handle paging in process() and would instead do so for distributed search only.

Contributor

@dsmiley dsmiley left a comment

The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class. Doing so requires that you use the correct utility methods it provides. I noticed your test calls queryServer instead of query. If you look at their impls, you'll see what I'm getting at. You'll see other subclass tests using the various methods to do these tests.

I suspect there's a single-shard pagination bug. If so, then correct usage of this base class would surface it without you having to write more tests.

@dsmiley
Contributor

dsmiley commented Sep 4, 2025

> The beauty/wisdom of BaseDistributedSearchTestCase is that it tests consistency between single shard and multi-shard. I think it's brilliant; that is the point of this base class.

Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.
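
One way this can happen is sketched below, under the assumption that each shard fuses its own shard-local rankings and the coordinator then merges by fused score; that may or may not match what the PR does. The class ShardLocalRrfSketch, its methods, and k = 60 are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting global RRF with per-shard RRF. */
final class ShardLocalRrfSketch {
  /** Smoothing constant k; 60 is assumed here. */
  static final int K = 60;

  /** RRF over ranked id lists: each doc gets the sum of 1 / (K + rank). */
  static Map<String, Double> rrf(List<List<String>> rankedLists) {
    Map<String, Double> fused = new HashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        fused.merge(ranked.get(i), 1.0 / (K + i + 1), Double::sum);
      }
    }
    return fused;
  }

  /** Runs RRF separately on each shard's local rankings and unions the scores. */
  static Map<String, Double> fusePerShard(List<List<List<String>>> rankedListsPerShard) {
    Map<String, Double> merged = new HashMap<>();
    for (List<List<String>> shardLists : rankedListsPerShard) {
      merged.putAll(rrf(shardLists)); // assumes each doc lives on exactly one shard
    }
    return merged;
  }
}
```

Suppose docs x and y live on shard 1 and z on shard 2, with global rankings [x, y, z] for one query and [y, x, z] for the other. Global RRF puts z last, but z is ranked first locally on its shard for both queries, so per-shard fusion gives it 2/61 and places it ahead of x and y.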

@ercsonusharma
Author

> Yet this PR/approach will not be able to comply since unlike most (all?) components, its results are affected substantially by distributed-search. The (unsaid?) vision of sharding / distributed-search was getting the same results as a single shard, and Solr does the work to pull off that trick, with plenty of tests demonstrating it does. In fact I'd say, with great disappointment, that the observed (by a user) results of this component will not be RRF when there's distributed search over shards.

Pushed a change to the PR that adds an option for the user to choose which Combiner method to use: Way 1 (pre) or Way 2 (post).

Labels: cat:search, client:solrj, documentation, tests
6 participants