Skip to content

Conversation

@ercsonusharma
Copy link

@ercsonusharma ercsonusharma commented Jul 4, 2025

https://issues.apache.org/jira/browse/SOLR-17319

Description

This feature aims to execute multiple queries of multiple kinds across multiple shards of a collection and combine their result basis an algorithm (like Reciprocal Rank Fusion). It also help resolve the issues being discussed w.r.t the previous PR, mainly around across shard documents merging. It provides more flexibility in terms of querying extending JSON Query DSL ultimately enabling Hybrid Search in a pure way solving the shortcomings.

Note: This feature is currently unsupported for non-distributed and grouping query.

Solution

  • Extended the QueryComponent to create new CombinedQueryComponent and ResponseBuilder to create new CombinedQueryResponseBuilder supports multiple response builders to hold the state and execute multiple queries.
  • In JSON Query DSL, a parameter is added to identity Combined Query request and basis that it invokes the new CombinedQueryComponent
  • CombinedQueryComponent have multiple response builders assigned for each query. These queries are first executed at the SolrSearchIndexer level and combined them using RRF for now.
  • At Shard level also, the responses for the multiple queries are merged.

Tests

  • Added tests for testing the RRF logic independently.
  • Added tests for across search index and distributed requests.
  • Added tests to assert existing behaviour of search handler's QueryComponent as well as for the newly added CombinedQueryComponent basis the flag in json query DSL.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@ercsonusharma
Copy link
Author

@alessandrobenedetti @dsmiley, please help review it whenever you can. Thanks!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really glad to see this work began by acknowledging the existing work and trying to address the pitfalls!

@alessandrobenedetti
Copy link
Contributor

Hi @ercsonusharma , thanks for resurrecting this, didn't have time to dedicate to the feature in the last few months, good to see some movement!

In the next couple of weeks, I should be able to give it a go and review it!

@ercsonusharma ercsonusharma requested a review from atris July 16, 2025 18:45
@ercsonusharma
Copy link
Author

@dsmiley, Really appreciate your time and effort so far on the PR. When you have a moment, would you mind taking a look at the latest commit changes addressing the concerns raised previously? Thanks!

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was away at a conference; I'm back now. I just did a partial review. I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity / documentation matters and more fundamentally... why would someone choose it.

Comment on lines 513 to 525
if (req.getHttpSolrCall() != null
&& StringUtils.isEmpty(req.getParams().get(ShardParams.SHARDS))) {
String scheme = req.getHttpSolrCall().getReq().getScheme();
String host = req.getHttpSolrCall().getReq().getServerName();
int port = req.getHttpSolrCall().getReq().getServerPort();
String context = req.getHttpSolrCall().getReq().getContextPath();
String core = req.getCore().getName();
String localShardUrl =
String.format(Locale.ROOT, "%s://%s:%d%s/%s", scheme, host, port, context, core);
solrParams.set(ShardParams.SHARDS, localShardUrl);
req.setParams(solrParams);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks suspicious to me; so shortCircuit=false isn't enough? Did you construct this based on seeing similar code elsewhere (where?) to create a shards URL?

Copy link
Author

@ercsonusharma ercsonusharma Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, shortCircuit=false is not enough as ShardHandler also needs the shards parameter which comes directly from SolrParams. While, making shortCircuit=true makes the rb.isDistrib=false and force the method to not go through the shardHandler requests, rather stick to node local indexes through SearchComponent.process method.
I couldn't find this code anywhere but created myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see so it needs to happen somewhere, and you've put it here. It feels out of place in this method, however. Based on what you showed me, it could be in HttpShardHandler where StandaloneReplicaSource.Builder is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the URL core construction here seems like it should be a utility method on HttpSolrCall

@ercsonusharma
Copy link
Author

ercsonusharma commented Sep 16, 2025

I have doubts that it makes sense to include "Way 1" if, I imagine, it increases the complexity/documentation matters and more fundamentally... why would someone choose it.

I don't think if it introduces any additional complexity in terms of code or documentation (adding a parameter combiner.method), rather it reuses several piece of code.
IMO, If the data is independently sharded and shard-level scoring doesn't matter in overall query relevance, it may not make sense to first combine shard results based on their original scores and then apply RRF per query (as done in Way 2). In such cases, users may prefer Way 1.

@dsmiley
Copy link
Contributor

dsmiley commented Sep 16, 2025

IMO, If the data is independently sharded and shard-level scoring doesn't matter in overall query relevance, it may not make sense to first combine shard results based on their original scores and then apply RRF per query (as done in Way 2). In such cases, users may prefer Way 1.

I disagree. Only having per-shard RRF quickly devolves to shard interleaving as the shard count increases, since at the coordination/aggregation, there's no global overall ranking measure left anymore. The results from each shard are ultimately treated as equivalent across the shards. Consequently, if say the real RRF best result pointed clearly to one document (best score of both sub-queries, lets say), then in the per-shard RRF it'd merely be arbitrarily somewhere in the top-20 if say there are 20 shards. Nobody would want that.

@ercsonusharma
Copy link
Author

okay @dsmiley . So I have kept only the Way 2 for now, removing Way 1 completely.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has changed so much; I'll need to come at this afresh (not just looking at deltas from previous review). But first... can you please look at Christine's POC that leverages SearchComponent phases (as I had hinted in JIRA may be possible) #3648

@ercsonusharma
Copy link
Author

@dsmiley I had a chance to take a look at Christine's POC changes and have updated my findings there.

I'll need to come at this afresh (not just looking at deltas from previous review)

Just checking in—have you had a chance to take a look at this yet?

* @throws Exception the exception
*/
@Test
public void testMultipleQueryWithSort() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test's existence surprises me but I'm reminded you are thinking of this feature generically, not assuming RRF. I suppose if we're sorting by a field then the underlying implementation would ideally be merely a BooleanQuery with two SHOULD clauses. Would a user bother with this plugin for such a case when they can do that trivially?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting can also be applied to the results after processing RRF; otherwise, there is no built-in support for joining two queries. Sorting should be supported in case of combined query to maintain consistency with the query structure. The final RRF score would act as a tie breaker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a boolean OR of both subqueries without CombinedQueryComponent count as "built in support for joining two queries"? "Consistency of the query structure".... huh?

Copy link
Author

@ercsonusharma ercsonusharma Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Consistency of the query structure"

I meant user creating the query for a use case. They don't have to use two query types - Boolean specifically for sorting and Combined for others.

@ercsonusharma
Copy link
Author

@dsmiley - Just checking if you have any concerns apart from a few mentioned above, which I have resolved recently. Not sure if you had a chance to review it completely yet. Would you be able to advise on the best path forward?

@dsmiley
Copy link
Contributor

dsmiley commented Oct 15, 2025

Sorry; this PR deserves more attention but I've been prioritizing my limited time to Solr 10. I'll return to this as soon as I can... perhaps tonight. I said that to myself yesterday so it's at least at the top of my discretionary list.

*/
@Override
@SuppressWarnings("unchecked")
public void process(ResponseBuilder rb) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this component forces distributed mode, why would process() be called?

Copy link
Author

@ercsonusharma ercsonusharma Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With reference, In the case of distributed mode, an HTTP request is made with rb.distrib=false through ShardHandlerRequest in one of the phases. The process method would be called for that particular shard query request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an isShard=true request then. Okay; but then why wouldn't the logic just pass-through as normal? It's not clear why CombinedQueryComponent has any logic to accomplish at this phase. "Way 2" is only at aggregation (disbtrib=true), in my head anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Author

@ercsonusharma ercsonusharma Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, took longer to reply.
The forced distributed would anyway enable the shard call (isShard=true).

then why wouldn't the logic just pass-through as normal

The process() method is directly communicating with the Searcher as we need to process each query at the shard searcher level. If you would look at the method override, it is also creating the combined docSet across queries other than few other metadata, required for merging the docs finally across the shard at the coordinator level.

However, this Searcher call per query can be further parallelised as added in comment.
@dsmiley

protected Map<Object, ShardDoc> createShardResult(
ResponseBuilder rb,
Map<String, List<ShardDoc>> shardDocMap,
SolrDocumentList responseDocs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unfortunate to have a method that both returns something and also manipulates an argument. At least you documented it.

@ercsonusharma
Copy link
Author

@dsmiley Was I able to address your concern through above changes and responses? Please let me know if there is more.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my fundamental concerns are addressed. I left some comments. I'd prefer that the modifications, especially relating to forceDistrib, be in a separate PR and having a basic test for forceDistrib in particular. Like just test with a dummy TestForceDistribComponent if process() is called, that distrib=false is set. Maybe it could be added to an existing distribution test in standalone and another in a distrib SolrCloud.

@dsmiley
Copy link
Contributor

dsmiley commented Nov 4, 2025

I'll handle the forceDistrib matter; it was my idea after all so I'll see to creating a test.

@ercsonusharma
Copy link
Author

ercsonusharma commented Nov 4, 2025

I think my fundamental concerns are addressed. I left some comments.

Thank you, @dsmiley for the extensive review. I have addressed those comments as well.
I can add a basic test for forceDistrib for now if you wish and all other modifications and improvements can be carried out in a separate PR.

UPDATE: @dsmiley - Have added the test cases for your convenience. Kindly have a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:search client:solrj documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants