Skip to content

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Jul 28, 2025

Lucene has call sites of Bits#get that are more polymorphic than they should because they sometimes receive a FixedBitSet as Bits, and sometimes a FixedBits (the return type of FixedBitSet#asReadOnlyBits). This makes some call sites bimorphic when they could be monomorphic, of polymorphic when they could be bimorphic (e.g. if SparseFixedBitSet also gets used).

Given how polymorphism affects performance, I would like to fix this performance trap. One idea would be to make BitSet no longer implement Bits, ie. you could no longer pass a FixedBitSet when Bits are expected, you would need to pass the result of FixedBitSet#asReadOnlyBits. This is what this PR does.

I had to move/copy some APIs around between Bits, BitSet and FixedBitSet to make it work, but it's not as invasive as I initially worried.

The API change is a 11.0 change, but we could backport the bits that reduce polymorphism to 10.x, ie. making sure to pass the result of FixedBitSet#asReadOnlyBits rather than the FixedBitSet directly when Bits are expected.

Lucene has call sites of `Bits#get` that are more polymorphic than they should
because they sometimes receive a `FixedBitSet` as `Bits`, and sometimes a
`FixedBits` (the return type of `FixedBitSet#asReadOnlyBits`). This makes some
call sites bimorphic when they could be monomorphic, of polymorphic when they
could be bimorphic (e.g. if `SparseFixedBitSet` also get used).

Given how polymorphism affects performance, I would like to fix this
performance trap. One idea would be to make `BitSet` no longer implement
`Bits`, ie. you could no longer pass a `FixedBitSet` when `Bits` are expected,
you would need to pass the result of `FixedBitSet#asReadOnlyBits`. This is what
this PR does.

I had to move/copy some APIs around between `Bits`, `BitSet` and `FixedBitSet`
to make it work, but it's not as invasive as I initially worried.

The API change is a 11.0 change, but we could backport the bits that reduce
polymorphism to 10.x, ie. making sure to pass the result of
`FixedBitSet#asReadOnlyBits` rather than the `FixedBitSet` directly when `Bits`
are expected.
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz jpountz added this to the 11.0.0 milestone Jul 28, 2025
@jpountz
Copy link
Contributor Author

jpountz commented Jul 28, 2025

@msokolov @benwtrent You may want to take a look since vector search is probably the most performance-critical code that is touched by this change due to AbstractKnnVectorQuery, which passes the following implementing classes for the Bits acceptDocs:

  • FixedBits for unfiltered searches,
  • FixedBitSet for dense filters,
  • SparseFixedBitSet for sparse filters.

With this PR, dense filters and unfiltered searches both use a FixedBits instance, so call sites of Bits#get within the vector search code become bimorphic. I don't expect nightly benchmarks to see a difference though since we don't run queries that are filtered by sparse filters.

Copy link
Contributor

@msokolov msokolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I left a few small comments/questions

private final int numDocs;
private final CacheHelper readerCacheHelper;

private SoftDeletesFilterLeafReader(LeafReader reader, FixedBitSet bits, int numDocs) {
private SoftDeletesFilterLeafReader(LeafReader reader, Bits bits, int numDocs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why we are changing these signatures - it seems the call sites still always use FixedBitSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Bits are indeed always created bia FixedBitSet#asReadOnlyBits but I liked doing it on the call site better than in this constructor. Since it's a private constructor, it doesn't break users.

}
if (length < bitSet.length()
&& bitSet.nextSetBit(Math.max(0, length)) != DocIdSetIterator.NO_MORE_DOCS) {
throw new IllegalArgumentException("Some bits are set beyond the end of live docs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment refers to "live docs" but the method is generic and could apply to other kinds of bit set applications - maybe change to "beyond the end of the bit set"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll fix.

Comment on lines 218 to 219
approximateSearch(
ctx, acceptDocs.asReadOnlyBits(), cost + 1, timeLimitingKnnCollectorManager);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within the HNSW format, we do this:

 if (acceptDocs instanceof BitSet bitSet) {
      // Use approximate cardinality as this is good enough, but ensure we don't exceed the graph
      // size as that is illogical
      filteredDocCount = Math.min(bitSet.approximateCardinality(), graph.size());
      if (unfilteredVisit >= filteredDocCount) {
        doHnsw = false;
      }
    }

We determine the filter size via casting the Bits to BitSet. So, I am not sure we can make this change without progressing: #15011

Or bits should give cardinality information...

Copy link
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Aug 19, 2025
@github-actions github-actions bot removed the Stale label Aug 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants