Filter by maxOtherDoc first to reduce the overhead of filterCompetitiveHits #15081
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
I did some experiments; if I only filter docs when …
Since I'm a little bit confused by the cause of the performance gain, I re-ran the util to double-check it. Here is the second benchmark result:
I suspect that pre-filtering docs can help because it reduces some useless advances of some clauses, since they now have some kind of global view (docs filtered by …
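To make the idea concrete, here is a minimal sketch of this kind of pre-filtering, assuming a buffer of parallel docs/scores arrays like Lucene's DocAndScoreAccBuffer; the helper itself is hypothetical, not code from this PR:

```java
// Hypothetical sketch: drop buffered hits whose doc ID is below maxOtherDoc.
// At least one other required clause's iterator is already positioned at
// maxOtherDoc, so docs before it cannot match the conjunction, and there is
// no point running filterCompetitiveHits (or any other clause) on them.
static int filterByMaxOtherDoc(int[] docs, double[] scores, int size, int maxOtherDoc) {
  int newSize = 0;
  for (int i = 0; i < size; ++i) {
    if (docs[i] >= maxOtherDoc) {
      docs[newSize] = docs[i];
      scores[newSize] = scores[i];
      ++newSize;
    }
  }
  return newSize; // the buffer now only holds potentially matching docs
}
```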
This is interesting; it suggests that we're scoring too many docs with the leading clause. With your change, we're skipping these docs after scoring them with the first clause, but shouldn't we be able to skip them before scoring them as well? With something like this:

```diff
diff --git a/lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java b/lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
index 21f8af990b8..c2030dbc8fb 100644
--- a/lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
+++ b/lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
@@ -178,6 +178,7 @@ final class BlockMaxConjunctionBulkScorer extends BulkScorer {
       docAndScoreAccBuffer.copyFrom(docAndScoreBuffer);
 
+      int maxOtherDoc = -1;
       for (int i = 1; i < scorers.length; ++i) {
         double sumOfOtherClause = sumOfOtherClauses[i];
         if (sumOfOtherClause != sumOfOtherClauses[i - 1]) {
@@ -188,20 +189,17 @@ final class BlockMaxConjunctionBulkScorer extends BulkScorer {
         }
 
         ScorerUtil.applyRequiredClause(docAndScoreAccBuffer, iterators[i], scorables[i]);
+        maxOtherDoc = Math.max(iterators[i].docID(), maxOtherDoc);
       }
 
       for (int i = 0; i < docAndScoreAccBuffer.size; ++i) {
        scorable.score = (float) docAndScoreAccBuffer.scores[i];
        collector.collect(docAndScoreAccBuffer.docs[i]);
       }
-    }
 
-    int maxOtherDoc = -1;
-    for (int i = 1; i < iterators.length; ++i) {
-      maxOtherDoc = Math.max(iterators[i].docID(), maxOtherDoc);
-    }
-    if (lead.docID() < maxOtherDoc) {
-      lead.advance(maxOtherDoc);
+      if (lead.docID() < maxOtherDoc) {
+        lead.advance(maxOtherDoc);
+      }
     }
   }
```
Yeah, I remember I tried something like that before; it didn't work out well. I'll run luceneutil to verify again.
> I'll run the luceneutil to verify again.

Here is the result, based on https://github.com/HUSTERGS/lucene/tree/bench_advance; hope I didn't get anything wrong:
I suspect that …
Hmm, it's annoying when the most logical fix doesn't work. :) I'm not sure if it's due to advance() being too heavy; your approach has a call to …
I ran this PR on my machine; it seems to trigger a small slowdown on combined tasks (the p-value is low), and it's not clear if there's any speedup:
Actually I did another experiment; I was worried about the …
BTW, this benchmark contains #15039, which is different from the previous benchmark setup. The affected tasks also changed; e.g. OrHighHigh no longer got a speedup. I'll push the newest code later; maybe you can help run the benchmark to verify it?
Maybe we can call the other iterators' …
Didn't work.
Since both @jpountz's and my last benchmark results showed a slowdown on Combined tasks, I did a microbenchmark locally (not committed with this PR) and changed the way …
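For reference, a self-contained JMH sketch of how such a filter loop could be microbenchmarked in isolation; this is a hypothetical harness, not the benchmark mentioned above, and the class name, window size, and data shape are all assumptions:

```java
import java.util.Random;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical JMH sketch, not the benchmark discussed in this thread:
// isolates the cost of filtering a window of increasing doc IDs by maxOtherDoc.
@State(Scope.Benchmark)
public class FilterByMaxOtherDocBench {
  private static final int SIZE = 4096; // assumed window size

  private final int[] docs = new int[SIZE];
  private final int[] outDocs = new int[SIZE]; // compact into a scratch array
  private int maxOtherDoc;

  @Setup(Level.Trial)
  public void setup() {
    Random r = new Random(42);
    int doc = 0;
    for (int i = 0; i < SIZE; ++i) {
      doc += 1 + r.nextInt(16); // strictly increasing doc IDs, like a scored window
      docs[i] = doc;
    }
    maxOtherDoc = docs[SIZE / 2]; // assume half the buffer is non-competitive
  }

  @Benchmark
  public void filterByMaxOtherDoc(Blackhole bh) {
    int newSize = 0;
    for (int i = 0; i < SIZE; ++i) {
      if (docs[i] >= maxOtherDoc) {
        outDocs[newSize++] = docs[i]; // scores would be compacted the same way
      }
    }
    bh.consume(newSize);
  }
}
```

Run with half the buffer non-competitive, something like this should show whether the extra compaction pass pays for itself against the savings in the later per-clause work.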
Description

This PR proposes to filter the `docAndScoreBuffer` by `maxOtherDoc`, so the cost of the following `filterCompetitiveHits` might be reduced since there are fewer elements. Here is the result under wikimediumall after 20 iterations: