Use FixedBitSet#cardinality for counting liveDocs in CheckIndex #15045
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
FYI I have an in-progress PR that would break this optimization: #14996. Furthermore, in the typical case, live docs are not an instance of FixedBitSet but of FixedBits (the result of FixedBitSet#asReadOnlyBits), so I don't think it would help much?
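The type concern raised above can be illustrated with a minimal sketch. The classes below (`SimpleBits`, `MutableBitSet`, `LiveDocsCounter`) are hypothetical stand-ins written for this example, not Lucene's `Bits`, `FixedBitSet`, or `FixedBits`; they only mirror the relationship described in the comment, where the read-only view is a different type from the mutable bit set, so an `instanceof` fast path never fires for the view.

```java
/** Hypothetical stand-in for a read-only bits abstraction (not Lucene's Bits). */
interface SimpleBits {
  boolean get(int index);

  int length();
}

/** Hypothetical mutable bit set backed by 64-bit words (not Lucene's FixedBitSet). */
final class MutableBitSet implements SimpleBits {
  private final long[] words;
  private final int numBits;

  MutableBitSet(int numBits) {
    this.numBits = numBits;
    this.words = new long[(numBits + 63) >>> 6];
  }

  void set(int index) {
    words[index >>> 6] |= 1L << index; // long shifts use only the low 6 bits of index
  }

  @Override
  public boolean get(int index) {
    return (words[index >>> 6] & (1L << index)) != 0;
  }

  @Override
  public int length() {
    return numBits;
  }

  int cardinality() {
    int count = 0;
    for (long word : words) {
      count += Long.bitCount(word);
    }
    return count;
  }

  /** Returns a view over the same data whose type is NOT MutableBitSet. */
  SimpleBits asReadOnlyView() {
    MutableBitSet self = this;
    return new SimpleBits() {
      @Override
      public boolean get(int index) {
        return self.get(index);
      }

      @Override
      public int length() {
        return self.length();
      }
    };
  }
}

final class LiveDocsCounter {
  /** True only when the type-based fast path would be taken. */
  static boolean usesFastPath(SimpleBits bits) {
    return bits instanceof MutableBitSet;
  }

  static int count(SimpleBits bits) {
    if (bits instanceof MutableBitSet) {
      return ((MutableBitSet) bits).cardinality(); // word-wise fast path
    }
    int count = 0; // fallback: one get() call per bit
    for (int i = 0; i < bits.length(); i++) {
      if (bits.get(i)) {
        count++;
      }
    }
    return count;
  }
}
```

Both paths return the same count; the point is only that the read-only view always lands in the per-bit fallback unless it gets a fast path of its own.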
Thanks for the review, Adrien. Sorry for not making it clear: this change also uses [...] I had considered another approach: placing the new [...]
Thanks for explaining. I'm not too fond of the approach: it looks like you'd really want to add int Bits#cardinality(), but also don't want to add it in order to keep Bits lean (which I appreciate). But it looks a bit odd to me.
If we'd like to speed these things up, maybe we should allocate a FixedBitSet(1024), copy the content of the Bits into this FixedBitSet using applyMask, and then call cardinality() on the FixedBitSet?
It's a nice idea! Although it requires allocating an [...] Here are some JMH numbers:
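Code:
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Fork(1)
public class FixedBitSetBenchmark {

  @Param({"1024"})
  private int size;

  @Param({"0.5"})
  private float density; // target fraction of bits set to 1 in the bitset

  private Bits bits; // FixedBitSet#asReadOnlyBits
  private Bits fallbackBits; // plain Bits implementation, not backed by a FixedBitSet

  @Setup(Level.Trial)
  public void setup() {
    FixedBitSet bitSet = new FixedBitSet(size);
    int numSet = (int) (size * density);
    if (numSet == size) {
      bitSet.set(0, size);
    } else if (numSet > 0) {
      Random random = new Random(0);
      for (int i = 0; i < numSet; i++) {
        // nextInt(size) covers every index in [0, size); indices may repeat,
        // so the actual density can end up below the requested one.
        bitSet.set(random.nextInt(size));
      }
    }
    bits = bitSet.asReadOnlyBits();
    fallbackBits =
        new Bits() {
          @Override
          public boolean get(int index) {
            return index % 2 == 0;
          }

          @Override
          public int length() {
            return size;
          }
        };
  }

  // Copies the bits into a scratch FixedBitSet via applyMask, then counts with cardinality().
  @Benchmark
  public void countWithCardinality(Blackhole bh) {
    FixedBitSet bitSet = new FixedBitSet(size);
    bitSet.set(0, size);
    bits.applyMask(bitSet, 0);
    int count = bitSet.cardinality();
    bh.consume(count);
  }

  // Counts bit by bit through the read-only view of a FixedBitSet.
  @Benchmark
  public void countWithFixedBitSetGet(Blackhole bh) {
    int count = 0;
    for (int i = 0; i < bits.length(); i++) {
      if (bits.get(i)) {
        count++;
      }
    }
    bh.consume(count);
  }

  // Counts bit by bit through the plain Bits implementation.
  @Benchmark
  public void countWithFallbackGet(Blackhole bh) {
    int count = 0;
    for (int i = 0; i < fallbackBits.length(); i++) {
      if (fallbackBits.get(i)) {
        count++;
      }
    }
    bh.consume(count);
  }
}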
We don't actually need to allocate a FixedBitSet of size maxDoc; we could copy slices of 1024 bits into a FixedBitSet(1024) to do the counting?
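A rough sketch of the slice-by-slice counting suggested here, written against plain `long[]` words so it runs without Lucene. Everything in it is an assumption made for illustration: `MaskableBits`, `WordBits`, and `SlicedCounter` are hypothetical names, and the `applyMask(long[] window, int offset)` method only models the idea of clearing window bits that are unset in the source; it is not Lucene's `Bits#applyMask` signature. The scratch window is allocated once at 1024 bits and reused, so memory does not grow with the number of documents.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a bits abstraction; not Lucene's Bits interface. */
interface MaskableBits {
  boolean get(int index);

  int length();

  /**
   * Clears window bit i whenever source bit (offset + i) is unset or out of range.
   * Generic fallback: one get() call per bit.
   */
  default void applyMask(long[] window, int offset) {
    int windowBits = window.length * Long.SIZE;
    for (int i = 0; i < windowBits; i++) {
      int src = offset + i;
      if (src >= length() || !get(src)) {
        window[i >>> 6] &= ~(1L << i);
      }
    }
  }
}

/** Word-backed implementation that can mask a whole 64-bit word at a time. */
final class WordBits implements MaskableBits {
  private final long[] words;
  private final int numBits;

  /** Assumes bits at positions >= numBits are zero in the last word. */
  WordBits(long[] words, int numBits) {
    this.words = words;
    this.numBits = numBits;
  }

  @Override
  public boolean get(int index) {
    return (words[index >>> 6] & (1L << index)) != 0;
  }

  @Override
  public int length() {
    return numBits;
  }

  @Override
  public void applyMask(long[] window, int offset) {
    // Assumes offset is a multiple of 64, which holds for 1024-bit slices.
    int firstWord = offset >>> 6;
    for (int w = 0; w < window.length; w++) {
      int src = firstWord + w;
      window[w] &= src < words.length ? words[src] : 0L;
    }
  }
}

final class SlicedCounter {
  static final int WINDOW_BITS = 1024; // slice size from the discussion

  static int count(MaskableBits bits) {
    long[] window = new long[WINDOW_BITS / Long.SIZE]; // allocated once, reused per slice
    int total = 0;
    for (int offset = 0; offset < bits.length(); offset += WINDOW_BITS) {
      Arrays.fill(window, -1L); // start with every bit set
      bits.applyMask(window, offset); // keep only the bits set in this source slice
      for (long word : window) {
        total += Long.bitCount(word);
      }
    }
    return total;
  }
}
```

With this shape, an implementation that can mask word by word benefits from the population count, while a generic implementation still works through the per-bit default, at roughly the cost of the original loop plus the window bookkeeping.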
No problem. I will update it.
The new approach is similar to #14998, so I reused part of the code. The changes touch [...] The optimization is now applied only to [...]
This uses FixedBitSet#cardinality to speed up counting liveDocs in CheckIndex and some assert implementations, instead of checking bits one by one.
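For context, the difference between the two counting styles can be sketched on a plain `long[]`, which is an assumption standing in for a word-backed bit set; the class and method names below are hypothetical and the code is not taken from the PR. The per-bit loop does one test and branch per bit, while the word-wise loop calls `Long.bitCount` once per 64 bits.

```java
final class PopCountDemo {
  /** Checks bits one by one: one test and one branch per bit. */
  static int countBitByBit(long[] words, int numBits) {
    int count = 0;
    for (int i = 0; i < numBits; i++) {
      if ((words[i >>> 6] & (1L << i)) != 0) {
        count++;
      }
    }
    return count;
  }

  /** Counts a whole 64-bit word at a time with a population count. */
  static int countByWord(long[] words) {
    int count = 0;
    for (long word : words) {
      count += Long.bitCount(word);
    }
    return count;
  }
}
```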