feat(search_family): Speed up merging of index results #5545

Open
wants to merge 1 commit into
base: main
Conversation

BagritsevichStepan
Contributor

@BagritsevichStepan BagritsevichStepan commented Jul 22, 2025

Add block-level jumps to the BlockList, as well as power-of-two-length jumps for the base vector iterators.
This optimization speeds up merging for all index types, not just numeric indexes.
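
For readers unfamiliar with the technique, the power-of-two jump can be sketched on a plain sorted vector as below. This is a minimal illustration under stated assumptions, not the code from this PR: SeekGEByJumps and HighestPowerOfTwo are hypothetical names, and DocId is assumed to be a 32-bit unsigned integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = uint32_t;  // assumption: document ids are 32-bit unsigned integers
using Iter = std::vector<DocId>::const_iterator;

// Largest power of two that is <= n (0 for n == 0).
// Hypothetical stand-in for a helper such as details::GetHighestPowerOfTwo.
inline size_t HighestPowerOfTwo(size_t n) {
  if (n == 0) return 0;
  size_t p = 1;
  while (p <= n / 2) p <<= 1;
  return p;
}

// Returns the first position in the sorted range [it, end) whose value is
// >= min_doc_id, or `end` if there is none.
inline Iter SeekGEByJumps(Iter it, Iter end, DocId min_doc_id) {
  // Nothing to skip: the range is empty or already positioned correctly.
  if (it == end || *it >= min_doc_id) return it;

  // Invariant from here on: *it < min_doc_id.
  // `length` is the number of elements remaining in [it, end).
  size_t length = static_cast<size_t>(end - it);
  for (size_t step = HighestPowerOfTwo(length); step > 0; step >>= 1) {
    // Take the jump only if it stays in range and still lands on a value
    // that is below min_doc_id.
    if (step < length && *(it + step) < min_doc_id) {
      it += step;
      length -= step;
    }
  }
  // `it` is now the last element below min_doc_id, so the answer is one past it.
  return it + 1;
}
```

The loop performs at most about log2(length) comparisons. Note that the jumps stop on the last element below min_doc_id, so one extra increment is needed afterwards, and no increment at all when the starting element already satisfies the bound.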

Before:

| Benchmark | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|
| BM_SearchNumericAndTagIndexes/num_docs:10000 | 12752 | 12751 | 5392 |
| BM_SearchNumericAndTagIndexes/num_docs:100000 | 1407975 | 1407833 | 497 |
| BM_SearchNumericAndTagIndexes/num_docs:1000000 | 11055515 | 11054017 | 64 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:10000 | 154778 | 154766 | 4535 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:100000 | 2084997 | 2084989 | 335 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:1000000 | 40143853 | 40141445 | 17 |

After:

| Benchmark | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|
| BM_SearchNumericAndTagIndexes/num_docs:10000 | 19200 | 19197 | 35978 |
| BM_SearchNumericAndTagIndexes/num_docs:100000 | 35147 | 35141 | 19515 |
| BM_SearchNumericAndTagIndexes/num_docs:1000000 | 10574950 | 10571685 | 65 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:10000 | 32930 | 32926 | 21365 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:100000 | 37653 | 37652 | 18845 |
| BM_SearchSeveralNumericAndTagIndexes/num_docs:1000000 | 31691981 | 31686310 | 23 |

}
}

while (*it != end && less_than_min_doc_id(**it)) {
Contributor Author

@BagritsevichStepan BagritsevichStepan Jul 22, 2025

I do not understand why this does not work without the while loop (a single check should be enough). I think there is a bug somewhere.

operator++();
} while (it != it_end && (block_it == block_end || *block_it < min_doc_id));
} else {
if (it == it_end) {
Contributor Author

@BagritsevichStepan BagritsevichStepan Jul 22, 2025

Also, here it does not work without this check.
I need to understand why.

Contributor

@dranikpg dranikpg left a comment

SeekGE actually became an unconditional call instead of just a hint.

I thought we would compute a flag (like try_seek = size(r) * 5 < size(l)) to decide whether it makes sense to try to seek at all. Like:

if (try_seek)
  l.SeekGe(); // Doesn't have to seek to latest entry
while (l < r) ++l;

This has the following benefits:

  • SeekGe becomes simpler because you don't have to guarantee seeking to the last entry
  • You don't attempt a jump on every step for equally large sets or when advancing the smaller set
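
The suggestion above could look roughly like the following sketch. It is only an illustration under stated assumptions: SeekHint and Intersect are hypothetical names, the sets are modelled as sorted vectors without duplicates, and the factor of 5 is taken from the flag proposed in this comment. The hint uses doubling steps and is allowed to stop early; the plain loop after it is what guarantees correctness.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = uint32_t;  // assumption: document ids are 32-bit unsigned integers
using Iter = std::vector<DocId>::const_iterator;

// Hypothetical hint: moves `it` forward with doubling steps while the landing
// element is still below `target`. It never passes the first element that is
// >= target, and it does not promise to stop right in front of it.
inline void SeekHint(Iter& it, Iter end, DocId target) {
  size_t step = 1;
  while (static_cast<size_t>(end - it) > step && *(it + step) < target) {
    it += step;
    step <<= 1;
  }
}

// Intersects two sorted id lists. `r` is expected to be the smaller one,
// but the result is correct for any sizes.
inline std::vector<DocId> Intersect(const std::vector<DocId>& r,
                                    const std::vector<DocId>& l) {
  // Seek only when `l` is much larger than `r` (ratio from the review comment).
  const bool try_seek = r.size() * 5 < l.size();

  std::vector<DocId> out;
  Iter lit = l.cbegin();
  for (DocId target : r) {
    if (try_seek) SeekHint(lit, l.cend(), target);
    // The linear loop finishes whatever the hint left undone.
    while (lit != l.cend() && *lit < target) ++lit;
    if (lit == l.cend()) break;
    if (*lit == target) out.push_back(target);
  }
  return out;
}
```

With equally sized inputs try_seek is false, so the merge degrades to the plain linear walk and pays nothing for the jumps.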

Comment on lines +156 to +159
if constexpr (std::is_same_v<C, CompressedSortedSet>) {
do {
operator++();
} while (it != it_end && (block_it == block_end || *block_it < min_doc_id));
Contributor

Why can't you jump blocks for a compressed set? It's the same
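
To illustrate the point, a block-level skip only needs the first element of each block, so it does not depend on how the block stores the rest. The sketch below is an assumption-laden illustration, not the BlockList code: Block is modelled as a plain sorted, non-empty vector, SkipBlocks is a hypothetical name, and reading the first element of a compressed block is assumed to be cheap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using DocId = uint32_t;             // assumption: 32-bit unsigned document ids
using Block = std::vector<DocId>;   // hypothetical stand-in for any sorted block container

// Returns the index of the first block, starting from `block_idx`, that can
// still contain an element >= min_doc_id. Blocks are assumed to be non-empty,
// sorted internally, and ordered relative to each other.
inline size_t SkipBlocks(const std::vector<Block>& blocks, size_t block_idx,
                         DocId min_doc_id) {
  // If the next block starts at or below min_doc_id, every element of the
  // current block is smaller than min_doc_id, so the whole block can be skipped.
  while (block_idx + 1 < blocks.size() &&
         blocks[block_idx + 1].front() <= min_doc_id) {
    ++block_idx;
  }
  return block_idx;
}
```

After the skip, the in-block iterator is positioned at the start of the returned block and the regular element-level advance takes over.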

Comment on lines +143 to +152
size_t length = std::distance(*it, end);
for (size_t step = details::GetHighestPowerOfTwo(length); step > 0; step >>= 1) {
if (step < length) {
auto next_it = *it + step;
if (less_than_min_doc_id(*next_it)) {
*it = next_it;
length -= step;
}
}
}
Contributor

Questionable. You spent so much effort hyper-optimizing merging, only to call this unconditionally. What will the performance be with equally sized sets?
