Implement efficient seeking from non-null trees using tree_pos #2911
Codecov Report ❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2911      +/-   ##
==========================================
+ Coverage   89.61%   89.62%   +0.01%
==========================================
  Files          28       28
  Lines       31983    32058      +75
  Branches     5888     5903      +15
==========================================
+ Hits        28660    28731      +71
- Misses       1888     1889       +1
- Partials     1435     1438       +3
```
That is interesting. I can see the LD calculator causing problems, as it does a lot of seeking backward and forward (which was the original motivation for bidirectional seeking). I don't see anything obviously wrong, but it must be something to do with sample counts. One thing we can at least make progress on: I think we should keep the option of seeking linearly. So, we make a new option to … We should specify this option in the `ld_calculator`, as that is definitely somewhere that linear seeking makes sense.
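Linear seeking steps tree by tree with `next` or `prev`, and in this PR it picks whichever direction covers less distance. The sketch below is a minimal illustration of that choice. It assumes that stepping off either end of the sequence wraps around through the null tree, so distances are measured on a circle of length `seq_len`. `choose_direction` and `seek_direction_t` are hypothetical names, not tskit API.

```c
/* Hypothetical illustration: which way a linear (next/prev) seek should step.
 * Assumes stepping off either end of the sequence wraps around through the
 * null tree, so distances are measured on a circle of length seq_len.
 * Assumes x is NOT inside the current tree's interval [left, right). */
typedef enum { SEEK_FORWARD, SEEK_BACKWARD } seek_direction_t;

static seek_direction_t
choose_direction(double seq_len, double left, double right, double x)
{
    double distance_forward, distance_backward;

    if (x < left) {
        /* Target lies before the current tree: going back is direct,
         * going forward wraps around past the end of the sequence. */
        distance_backward = left - x;
        distance_forward = seq_len - right + x;
    } else {
        /* Target lies at or after the current tree's right end: going forward
         * is direct, going back wraps around past the start. */
        distance_forward = x - right;
        distance_backward = left + seq_len - x;
    }
    /* Ties go forward. */
    return distance_forward <= distance_backward ? SEEK_FORWARD : SEEK_BACKWARD;
}
```

Under this sketch's assumptions, moving from the first tree to a position near the end of the sequence steps backward through the wrap-around, which is the kind of case where a direction-dependent test would change if a new strategy seeks forward instead.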
Okay, that makes sense to me. I'll revert …
`seek_skip` might be a bit more descriptive?
Something to bear in mind here is that `sample_lists` might not be compatible with this when seeking around randomly, because of the non-linear order in which edges can get inserted. A straightforward approach to getting this merged, then, could be to simply require that `TSK_SEEK_LINEAR` is always used with sample lists.
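The suggestion above amounts to rejecting non-linear seeking whenever a tree maintains sample lists. A minimal sketch of such an option check follows. The flag names and values (`SAMPLE_LISTS_FLAG`, `SEEK_LINEAR_FLAG`), the error code and `check_seek_options` are hypothetical stand-ins, not tskit's actual constants or functions.

```c
/* Hypothetical flag values; tskit's real constants may differ. */
#define SEEK_LINEAR_FLAG (1u << 0)  /* seek by repeated next/prev calls */
#define SAMPLE_LISTS_FLAG (1u << 1) /* tree maintains per-node sample lists */
#define ERR_SEEK_NEEDS_LINEAR (-1)  /* hypothetical error code */

/* Returns 0 if seek_options are acceptable for a tree created with
 * tree_options, or an error code if non-linear seeking is requested on a
 * tree that keeps sample lists (which may not tolerate edges being
 * inserted in a non-linear order). */
static int
check_seek_options(unsigned int tree_options, unsigned int seek_options)
{
    if ((tree_options & SAMPLE_LISTS_FLAG) && !(seek_options & SEEK_LINEAR_FLAG)) {
        return ERR_SEEK_NEEDS_LINEAR;
    }
    return 0;
}
```

Failing loudly, rather than silently falling back to linear seeking, keeps the constraint visible to callers such as the LD calculator.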
Note that this is affected by #3245. Perhaps it's worth picking up again to see whether the problem is indeed with sample lists?
I'm going to take a quick look at this to see what's happening.
35dcda4 to 2f1cca1
ef91f3d to 4e1e96d
I think this is ready for a more general review and merging. I'm reasonably sure that the algorithms are working now and everything is safe, but it's still not clear to me whether we should make …
46cc619 to 7144ab0
7144ab0 to 002b006
Sounds good to me.
I think we need your review and approval then, @benjeffery, to merge.
LGTM! A couple of small things.
```diff
@@ -6549,7 +6635,7 @@ tsk_tree_seek_index(tsk_tree_t *self, tsk_id_t tree, tsk_flags_t options)
 }

 static int TSK_WARN_UNUSED
 tsk_tree_seek_linear(tsk_tree_t *self, double x, tsk_flags_t TSK_UNUSED(options))
```
I don't understand why the unused flags are being removed. Won't this ever be part of the public API?
It makes more sense when you look at where we call `tsk_tree_seek_skip` or `tsk_tree_seek_linear` below, based on `options`. It doesn't really matter ultimately, as it's a private function.
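The pattern described in this reply, where the public seek reads `options` once and then calls a private strategy, can be sketched with a toy model. `toy_tree_t`, `seek`, `seek_linear`, `seek_skip` and `SEEK_LINEAR_FLAG` are hypothetical names; the toy tree stores only breakpoints and counts tree transitions, to contrast stepping one tree at a time with jumping straight to the target.

```c
#include <stddef.h>

#define SEEK_LINEAR_FLAG (1u << 0) /* hypothetical flag value */

/* Toy stand-in for a tree: just the breakpoints and the current tree index. */
typedef struct {
    const double *breakpoints; /* num_trees + 1 increasing positions, from 0 to L */
    size_t num_trees;
    size_t index; /* current tree covers [breakpoints[index], breakpoints[index + 1]) */
    size_t steps; /* number of tree transitions performed, for illustration */
} toy_tree_t;

/* Linear strategy: move one tree at a time, like repeated next/prev calls. */
static void
seek_linear(toy_tree_t *self, double x)
{
    while (self->index + 1 < self->num_trees && x >= self->breakpoints[self->index + 1]) {
        self->index++;
        self->steps++;
    }
    while (self->index > 0 && x < self->breakpoints[self->index]) {
        self->index--;
        self->steps++;
    }
}

/* Skip strategy: locate the target tree, then jump there in one transition. */
static void
seek_skip(toy_tree_t *self, double x)
{
    size_t j = 0;
    while (j + 1 < self->num_trees && x >= self->breakpoints[j + 1]) {
        j++;
    }
    if (j != self->index) {
        self->index = j;
        self->steps++;
    }
}

/* Public entry point: options are inspected here, once, so the private
 * strategies above need no options parameter. */
static void
seek(toy_tree_t *self, double x, unsigned int options)
{
    if (options & SEEK_LINEAR_FLAG) {
        seek_linear(self, x);
    } else {
        seek_skip(self, x);
    }
}
```

With breakpoints `{0, 10, 20, 30, 40}`, seeking from the first tree to position 35 takes three transitions linearly but one with the skip strategy.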
Co-authored-by: Ben Jeffery <[email protected]>
Description
Continuing from #2874, we want to finish moving the tree-positioning code over to use `tree_pos` efficiently. At the moment, `tsk_tree_seek` will call either `tsk_tree_seek_from_null` or `tsk_tree_seek_linear`, depending on whether or not we are starting from the null tree. `seek_linear` repeatedly calls `next` or `prev` until it reaches the given position, choosing whichever direction covers the shorter distance.

As a first pass, I've implemented `tsk_tree_seek_forward` and `tsk_tree_seek_backward` and incorporated them into `tsk_tree_seek_linear`. We will need to revise some of the `test_highlevel.py` seek tests, because in some cases the direction we choose to seek in differs from the old approach. For example, we now seek forward to go from the first to the last tree in a sequence.

Curiously, my implementation passes all the C tests with no memory issues detected by Valgrind, and it also passes all the `test_highlevel.py` and `test_tree_positioning` tests except for the ones that depend on seeking direction. However, it has caused chaos with other Python tests, causing failures and segfaults in `test_stats.py` and `test_divmat.py`, among others. The failing and crashing tests seem to be associated primarily with LD calculations and divergence.

I'm currently trying to determine whether the problems are due to an error in my implementation (most likely) or to the subtle problems with the time ordering of inserted edges discussed in #2792.
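The idea behind seeking forward without building every intermediate tree can be sketched on a toy parent array: remove only the current tree's edges that end at or before the target position, then insert only the edges that start after the current tree and still cover the target. This is an illustrative model under stated assumptions, not the `tsk_tree_seek_forward` implementation. `edge_t`, `build_from_null` and `seek_forward_skip` are hypothetical; the edge orderings are passed in pre-sorted; and a real implementation would presumably resume scanning from index positions tracked by `tree_pos` rather than from zero, and would also have to update any per-node state such as sample counts.

```c
#include <stddef.h>

/* Hypothetical, simplified edge: child has parent on [left, right). */
typedef struct {
    double left, right;
    int parent, child;
} edge_t;

/* Reference: build the tree covering x from scratch (the "from null" case). */
static void
build_from_null(const edge_t *edges, size_t num_edges, double x, int *parent,
    int num_nodes)
{
    for (int u = 0; u < num_nodes; u++) {
        parent[u] = -1;
    }
    for (size_t j = 0; j < num_edges; j++) {
        if (edges[j].left <= x && x < edges[j].right) {
            parent[edges[j].child] = edges[j].parent;
        }
    }
}

/* Move `parent` from the tree whose interval starts at t_left to the tree
 * covering x (x at or beyond the current tree's right end) without building
 * the trees in between. by_left / by_right index the edges sorted by their
 * left / right coordinates. */
static void
seek_forward_skip(const edge_t *edges, const size_t *by_left, const size_t *by_right,
    size_t num_edges, double t_left, double x, int *parent)
{
    /* Removals first. An edge is in the current tree iff left <= t_left < right.
     * Drop those ending at or before x; edges starting after t_left were never
     * inserted, so they are skipped. */
    for (size_t k = 0; k < num_edges; k++) {
        const edge_t *e = &edges[by_right[k]];
        if (e->right > x) {
            break;
        }
        if (e->left <= t_left && e->right > t_left) {
            parent[e->child] = -1;
        }
    }
    /* Insertions: edges covering x that were not in the current tree
     * (left > t_left). Edges lying entirely between the two trees are ignored. */
    for (size_t k = 0; k < num_edges; k++) {
        const edge_t *e = &edges[by_left[k]];
        if (e->left > x) {
            break;
        }
        if (e->left > t_left && e->right > x) {
            parent[e->child] = e->parent;
        }
    }
}
```

Edges that begin and end strictly between the two trees are never applied, which is where the saving over repeated `next` calls comes from; seeking backward would be the mirror image, with the roles of left and right swapped.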