
HUSTERGS
Contributor

Description

Like #14023, this PR proposes wrapping all iterators (not just the lead) with ScorerUtil::likelyImpactsEnum; this seems to help ScorerUtil.applyRequiredClause (that is my guess).
As before, I ran luceneutil on wikimediumall with searchConcurrency=0, taskCountPerCat=5 and taskRepeatCount=50; the results after 20 iterations are shown below:
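As we understand it (an assumption, not stated in this PR), the point of ScorerUtil::likelyImpactsEnum is to keep hot call sites such as advance() at most bimorphic: any iterator that is not the default codec's postings implementation gets wrapped in one filter class, so the JIT only ever sees two receiver types there and can inline them. The sketch below mimics that idea and applies the wrapping to every clause of a conjunction, not just the lead. All names (DocIterator, DefaultDocIterator, RangeDocIterator, FilterDocIterator, LikelyDefaultSketch) are hypothetical and are not Lucene's API; the real method's signature and the class it checks against may differ.

```java
// Hypothetical classes for illustration only; they are NOT Lucene's API.
abstract class DocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Moves forward to the first doc >= target and returns it, or NO_MORE_DOCS. */
  abstract int advance(int target);
}

/** Stands in for the default codec's postings iterator, the common case at runtime. */
final class DefaultDocIterator extends DocIterator {
  private final int[] docs; // sorted doc ids
  private int pos = -1;

  DefaultDocIterator(int... docs) {
    this.docs = docs;
  }

  @Override
  int advance(int target) {
    if (pos < docs.length) {
      do {
        pos++;
      } while (pos < docs.length && docs[pos] < target);
    }
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }
}

/** Some other implementation: matches every doc in [from, to). */
final class RangeDocIterator extends DocIterator {
  private final int to;
  private int doc;

  RangeDocIterator(int from, int to) {
    this.doc = from - 1;
    this.to = to;
  }

  @Override
  int advance(int target) {
    if (doc != NO_MORE_DOCS) {
      doc = Math.max(doc + 1, target);
      if (doc >= to) doc = NO_MORE_DOCS;
    }
    return doc;
  }
}

/** The single wrapper class that all non-default implementations are hidden behind. */
final class FilterDocIterator extends DocIterator {
  private final DocIterator in;

  FilterDocIterator(DocIterator in) {
    this.in = in;
  }

  @Override
  int advance(int target) {
    return in.advance(target);
  }
}

final class LikelyDefaultSketch {
  /** Wraps anything other than the default impl, so callers only ever see two concrete classes. */
  static DocIterator wrap(DocIterator it) {
    Class<?> c = it.getClass();
    if (c != DefaultDocIterator.class && c != FilterDocIterator.class) {
      it = new FilterDocIterator(it);
    }
    return it;
  }

  /** Counts docs matched by every clause (leapfrog); every clause is wrapped, not only the lead. */
  static int countConjunction(DocIterator... clauses) {
    if (clauses.length == 0) return 0;
    DocIterator[] its = new DocIterator[clauses.length];
    int[] cur = new int[clauses.length]; // current doc of each iterator, -1 = unpositioned
    for (int i = 0; i < clauses.length; i++) {
      its[i] = wrap(clauses[i]);
      cur[i] = -1;
    }
    int count = 0;
    int target = 0;
    while (true) {
      boolean allMatch = true;
      for (int i = 0; i < its.length; i++) {
        if (cur[i] < target) {
          // After wrap(), this call site only sees DefaultDocIterator or FilterDocIterator.
          cur[i] = its[i].advance(target);
        }
        if (cur[i] == DocIterator.NO_MORE_DOCS) return count;
        if (cur[i] > target) { // this clause skipped ahead: restart the round at its doc
          target = cur[i];
          allMatch = false;
          break;
        }
      }
      if (allMatch) { // every clause is positioned on target
        count++;
        target++;
      }
    }
  }
}
```

For example, intersecting new DefaultDocIterator(1, 3, 5, 7, 9) with new RangeDocIterator(2, 8) matches docs 3, 5 and 7, while the range iterator is transparently hidden behind FilterDocIterator. Whether the JIT actually inlines those calls depends on the JVM and its profiling, so this only illustrates the shape of the optimization.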

                           Task QPS baseline      StdDev QPS my_modified_version      StdDev               Pct diff p-value
                      OrHighHigh       21.32      (1.9%)       20.70     (13.5%)   -2.9% ( -17% -   12%) 0.343
                       OrHighMed       67.84      (3.6%)       66.34     (12.4%)   -2.2% ( -17% -   14%) 0.445
                 DismaxOrHighMed       50.02      (3.3%)       49.17      (8.1%)   -1.7% ( -12% -    9%) 0.379
                     OrStopWords        8.93      (1.9%)        8.81     (10.5%)   -1.3% ( -13% -   11%) 0.577
                DismaxOrHighHigh       35.32      (2.1%)       34.85      (7.6%)   -1.3% ( -10% -    8%) 0.458
                AndMedOrHighHigh       16.37      (3.9%)       16.16      (4.8%)   -1.3% (  -9% -    7%) 0.348
                IntervalsOrdered        2.43      (3.2%)        2.41      (3.9%)   -0.8% (  -7% -    6%) 0.501
                        Or3Terms       64.13      (3.8%)       63.74     (10.2%)   -0.6% ( -14% -   13%) 0.803
                     CountPhrase        2.63      (4.2%)        2.62      (4.4%)   -0.6% (  -8% -    8%) 0.673
                     AndHighHigh       21.66      (9.0%)       21.58     (13.0%)   -0.4% ( -20% -   23%) 0.921
                 CountAndHighMed       74.71      (2.8%)       74.58      (2.3%)   -0.2% (  -5% -    5%) 0.837
             CombinedAndHighHigh        5.69      (2.6%)        5.68      (1.7%)   -0.1% (  -4% -    4%) 0.839
                      OrHighRare       95.49      (4.7%)       95.39      (5.2%)   -0.1% (  -9% -   10%) 0.946
                    SloppyPhrase        1.11      (4.8%)        1.11      (5.4%)   -0.0% (  -9% -   10%) 0.994
                      AndHighMed       52.26      (8.6%)       52.32     (12.0%)    0.1% ( -18% -   22%) 0.970
                      DismaxTerm      493.97      (5.6%)      494.70      (6.5%)    0.1% ( -11% -   12%) 0.939
                            Term      455.45      (6.8%)      456.19      (8.7%)    0.2% ( -14% -   16%) 0.947
                  CountOrHighMed       77.56      (2.6%)       77.69      (1.9%)    0.2% (  -4% -    4%) 0.820
         CountFilteredOrHighHigh       15.73      (0.9%)       15.77      (0.8%)    0.2% (  -1% -    1%) 0.443
                         Prefix3       73.94      (3.9%)       74.13      (4.0%)    0.3% (  -7% -    8%) 0.839
          CountFilteredOrHighMed       17.81      (0.8%)       17.86      (0.7%)    0.3% (  -1% -    1%) 0.277
                          Phrase        7.50      (3.2%)        7.52      (2.5%)    0.3% (  -5% -    6%) 0.774
              Or2Terms2StopWords       61.00      (5.8%)       61.19      (9.1%)    0.3% ( -13% -   16%) 0.900
                         TermB1M      454.62      (6.7%)      456.15      (8.6%)    0.3% ( -14% -   16%) 0.890
             FilteredOrStopWords        8.09      (1.8%)        8.12      (1.7%)    0.4% (  -3% -    3%) 0.510
                          IntSet      295.46      (4.5%)      296.57      (4.3%)    0.4% (  -8% -    9%) 0.789
                         Term100      454.42      (6.7%)      456.17      (8.7%)    0.4% ( -14% -   16%) 0.875
                          OrMany        4.63      (5.2%)        4.65      (6.5%)    0.4% ( -10% -   12%) 0.834
                       TermB1M1P      454.90      (6.9%)      456.70      (8.8%)    0.4% ( -14% -   17%) 0.874
             CountFilteredPhrase        9.05      (3.2%)        9.09      (2.0%)    0.4% (  -4% -    5%) 0.632
                CountAndHighHigh       48.41      (2.2%)       48.61      (1.9%)    0.4% (  -3% -    4%) 0.515
                          Term1M      454.46      (6.7%)      456.41      (8.9%)    0.4% ( -14% -   17%) 0.864
                 CountOrHighHigh       49.92      (2.5%)       50.15      (2.0%)    0.5% (  -3% -    5%) 0.517
                 AndHighOrMedMed       14.07      (3.4%)       14.14      (3.2%)    0.5% (  -5% -    7%) 0.648
             CountFilteredIntNRQ       16.29      (1.4%)       16.37      (1.0%)    0.5% (  -1% -    2%) 0.188
                         Respell       36.65      (2.9%)       36.86      (2.8%)    0.5% (  -5% -    6%) 0.545
                         Term10K      453.90      (6.7%)      456.41      (8.9%)    0.6% ( -14% -   17%) 0.823
                  FilteredPhrase        9.68      (2.8%)        9.73      (2.3%)    0.6% (  -4% -    5%) 0.473
                        SpanNear        2.46      (4.8%)        2.48      (4.2%)    0.6% (  -7% -   10%) 0.677
                 FilteredPrefix3       69.05      (3.8%)       69.47      (3.5%)    0.6% (  -6% -    8%) 0.600
                  FilteredOrMany        3.98      (2.0%)        4.00      (2.2%)    0.6% (  -3% -    4%) 0.350
                          Fuzzy1       39.99      (3.8%)       40.26      (3.7%)    0.7% (  -6% -    8%) 0.574
                        Wildcard       46.83      (3.1%)       47.16      (2.8%)    0.7% (  -5% -    6%) 0.448
              FilteredOrHighHigh       12.88      (2.5%)       12.97      (1.7%)    0.7% (  -3% -    5%) 0.297
               TermDayOfYearSort      260.15      (2.3%)      262.15      (2.1%)    0.8% (  -3% -    5%) 0.276
                  FilteredIntNRQ       42.41      (3.3%)       42.74      (2.4%)    0.8% (  -4% -    6%) 0.390
                     CountOrMany        4.98      (3.7%)        5.02      (3.3%)    0.8% (  -5% -    8%) 0.471
                          IntNRQ       42.73      (3.2%)       43.10      (2.5%)    0.9% (  -4% -    6%) 0.345
                          Fuzzy2       36.05      (3.5%)       36.36      (3.0%)    0.9% (  -5% -    7%) 0.407
                      TermDTSort      135.02      (2.5%)      136.28      (2.1%)    0.9% (  -3% -    5%) 0.203
             CountFilteredOrMany        4.39      (2.8%)        4.43      (2.1%)    1.0% (  -3% -    6%) 0.225
                       And3Terms       70.30      (7.1%)       71.03      (9.8%)    1.0% ( -14% -   19%) 0.701
               FilteredOrHighMed       38.48      (3.3%)       38.94      (2.4%)    1.2% (  -4% -    7%) 0.194
              CombinedOrHighHigh        5.58      (4.3%)        5.64      (3.1%)    1.2% (  -5% -    9%) 0.307
                       CountTerm     5742.44      (5.3%)     5814.16      (4.5%)    1.2% (  -8% -   11%) 0.422
                FilteredOr3Terms       43.18      (3.2%)       43.73      (2.4%)    1.3% (  -4% -    7%) 0.159
                   TermTitleSort       50.08      (5.9%)       50.72      (5.0%)    1.3% (  -9% -   12%) 0.463
              CombinedAndHighMed       21.29      (4.9%)       21.59      (3.3%)    1.4% (  -6% -   10%) 0.287
                    FilteredTerm       63.73      (3.8%)       64.72      (2.8%)    1.5% (  -4% -    8%) 0.142
                   TermMonthSort     2060.78      (3.6%)     2092.88      (3.5%)    1.6% (  -5% -    8%) 0.164
      FilteredOr2Terms2StopWords       49.37      (4.3%)       50.16      (3.2%)    1.6% (  -5% -    9%) 0.188
                    AndStopWords        8.60      (6.3%)        8.74      (9.9%)    1.6% ( -13% -   18%) 0.540
              FilteredAndHighMed       31.03      (3.2%)       31.54      (3.8%)    1.6% (  -5% -    8%) 0.137
             And2Terms2StopWords       58.34      (7.6%)       59.46      (8.5%)    1.9% ( -13% -   19%) 0.449
               CombinedOrHighMed       20.91      (5.7%)       21.36      (4.3%)    2.2% (  -7% -   12%) 0.170
                    CombinedTerm       11.07      (4.5%)       11.33      (3.2%)    2.4% (  -5% -   10%) 0.053
     FilteredAnd2Terms2StopWords       58.94      (5.0%)       60.51      (5.3%)    2.7% (  -7% -   13%) 0.101
            FilteredAndStopWords        8.30      (3.1%)        8.57      (1.8%)    3.3% (  -1% -    8%) 0.000
             FilteredAndHighHigh       10.23      (3.0%)       10.58      (2.0%)    3.4% (  -1% -    8%) 0.000
               FilteredAnd3Terms      100.40      (2.7%)      103.97      (2.3%)    3.6% (  -1% -    8%) 0.000
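The Pct diff column appears consistent with the relative QPS change, (candidate - baseline) / baseline * 100; for OrHighHigh, (20.70 - 21.32) / 21.32 * 100 ≈ -2.9%. The small sketch below recomputes that for a few rows copied from the table and lists the rows whose p-value falls below a conventional 0.05 threshold. The threshold and the class name BenchRows are our assumptions (luceneutil's own highlighting rule may differ), and recomputing from the rounded QPS values can differ from the reported column in the last digit (CombinedTerm gives about 2.3% here versus the reported 2.4%).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the numbers are copied from a few rows of the table above.
final class BenchRows {
  static final String[] TASKS = {
    "OrHighHigh", "OrHighMed", "CombinedTerm",
    "FilteredAndStopWords", "FilteredAndHighHigh", "FilteredAnd3Terms"
  };
  static final double[] BASE = {21.32, 67.84, 11.07, 8.30, 10.23, 100.40}; // QPS baseline
  static final double[] CAND = {20.70, 66.34, 11.33, 8.57, 10.58, 103.97}; // QPS my_modified_version
  static final double[] P = {0.343, 0.445, 0.053, 0.000, 0.000, 0.000}; // p-value

  /** Relative QPS change in percent: (candidate - baseline) / baseline * 100. */
  static double pctDiff(double base, double cand) {
    return (cand - base) / base * 100.0;
  }

  /** Tasks whose p-value is below alpha, i.e. changes that are unlikely to be run-to-run noise. */
  static List<String> significant(double alpha) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < TASKS.length; i++) {
      if (P[i] < alpha) out.add(TASKS[i]);
    }
    return out;
  }
}
```

With alpha = 0.05, only FilteredAndStopWords, FilteredAndHighHigh and FilteredAnd3Terms qualify; the Or tasks at the top of the table have p-values between 0.343 and 0.577.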

What I'm curious about is that many Or-type tasks cluster at the top of the list (the largest slowdowns); could that be more than a coincidence?

Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions github-actions bot added this to the 10.3.0 milestone Jul 29, 2025
Contributor

@gf2121 gf2121 left a comment


This is great, thank you!

@HUSTERGS HUSTERGS merged commit 7a60d7c into apache:main Aug 12, 2025
8 checks passed
@HUSTERGS
Contributor Author

The nightly benchmarks show a small speedup for the FilteredAndXXX tasks (https://benchmarks.mikemccandless.com/2025.08.13.20.38.43.html), but also an overall slowdown, which is probably caused by the "pollute" operations newly added to luceneutil in mikemccand/luceneutil#436.

akhilesh-k pushed a commit to akhilesh-k/lucene that referenced this pull request Aug 24, 2025