-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(search_family): Speed up merging of index results #5545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(search_family): Speed up merging of index results #5545
Conversation
Signed-off-by: Stepan Bagritsevich <[email protected]>
} | ||
} | ||
|
||
while (*it != end && less_than_min_doc_id(**it)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why it does not work without this while (it must work with single check). I think there is a bug somewhere
operator++(); | ||
} while (it != it_end && (block_it == block_end || *block_it < min_doc_id)); | ||
} else { | ||
if (it == it_end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here it does not work without this check
I need to understand why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SeekGE
actually became an unconditionall call instead of just a hint.
I thought that we'll compute a flag (like try_seek = size(r) * 5 < size(l)) to see if it makes sense to try to seek. Like
if (try_seek)
l.SeekGe(); // Doesn't have to seek to latest entry
while (l < r) ++l;
This has the following benefits:
- SeekGe becomes simpler because you don't have to guarantee seeking to the last entry
- You don't try to jump each time on equally large sets and on the small set
if constexpr (std::is_same_v<C, CompressedSortedSet>) { | ||
do { | ||
operator++(); | ||
} while (it != it_end && (block_it == block_end || *block_it < min_doc_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you jump blocks for a compressed set? It's the same
size_t length = std::distance(*it, end); | ||
for (size_t step = details::GetHighestPowerOfTwo(length); step > 0; step >>= 1) { | ||
if (step < length) { | ||
auto next_it = *it + step; | ||
if (less_than_min_doc_id(*next_it)) { | ||
*it = next_it; | ||
length -= step; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questionable. You spent so much effort hyper optimizing merging to call this unconditionally. What will the performance be with equally sized sets?
Add block-level jumps to the BlockList, as well as power-of-two-length jumps based for base vector iterators.
This optimization speeds up merging for all index types, not just numeric indexes.
Before:
After: