fix: fancy indexing fixes backed h5py error #2066
Conversation
Codecov Report

❌ Patch coverage is …
❌ Your project check has failed because the head coverage (66.43%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files

@@ Coverage Diff @@
##             main    #2066       +/-   ##
===========================================
- Coverage   85.57%   66.43%   -19.14%
===========================================
  Files          46       46
  Lines        7092     7118      +26
===========================================
- Hits         6069     4729    -1340
- Misses       1023     2389    +1366
===========================================
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Hi, this looks good, thank you!
I have a lot of little comments. Please tell me if you prefer that I do all this, it’s fine with me!
for more information, see https://pre-commit.ci
I took a shot at it. Let me know what you think, @flying-sheep! Thanks
It seems possible that my first run just got very, very unlucky, as unlikely as that sounds. I may add some retry-averaging to the peak memory benchmarks if that is okay with you...
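For illustration, here is a minimal sketch of the retry-averaging idea being floated: repeat a noisy peak-memory measurement several times and report the median. It uses `tracemalloc`, which only tracks Python-level allocations (not HDF5's C buffers), so it illustrates the averaging strategy rather than how asv's `peakmem_` benchmarks actually measure memory; `median_peak_memory` is a hypothetical helper, not code from this PR.

```python
import statistics
import tracemalloc


def median_peak_memory(func, repeats: int = 5) -> int:
    """Run `func` several times and return the median peak allocation (bytes)."""
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()
        func()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return int(statistics.median(peaks))


# Example: peak Python-level allocations while building a large list.
print(median_peak_memory(lambda: [0] * 1_000_000))
```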
@sjfleming I'm not sure about the memory issue; we run these benchmarks semi-frequently and I haven't seen that much variability. Thanks a million for these benchmarks, by the way! I would say the biggest concern with benchmarks (apologies for not clarifying this) was tanking performance for what was a previously valid indexing operation (sorted, no repeats). As long as that is fine, or at least the penalty minimized, that would make me happy :)
No changes in benchmarks.
Comparison: https://github.com/scverse/anndata/compare/2fe7b3968f89eff4569afa10df216f8c14d8264b..655c3ba962d622a90125dc7595105334d7691371
More details: https://github.com/scverse/anndata/pull/2066/checks?check_run_id=53370096184
@sjfleming So the benchmarks literally crashed here, not sure why! Since the benchmark took 174 minutes to run, could you either (a) ensure things run locally first, but not in 174 minutes, or (b) let me know that you want to use the CI for sanity checks and then ping me when you want the "real" benchmarks run on our machine? Then I'll remove the label and put it back when ready. Thanks a bunch!

UPDATE: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=53271517569 wow, ok, something is up with the machine, I think; different benchmarks are also all of a sudden taking forever.
@ilan-gold okay so the benchmarks I messed with in my attempt to "improve stability" (9734678) are now failing to return a value, so that was ill-conceived. (It worked locally...) There are new benchmarks I've written in …, and I can run the full benchmarking suite locally with …. But it looks like I've added too much to the benchmarking. I'm going to cut down on the number of new ones so as not to bog down the benchmarking suite. I think I went way overboard.
I've pared down the new benchmarks so that they run in 2 additional minutes. |
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
for more information, see https://pre-commit.ci
I've run benchmarking on my local machine using (…).
@ilan-gold I think it's ready to be run on your CI.
Right, so my concern is really mostly what we have on main right now, i.e., you pass in a string array and now pay some performance penalty for, at least, finding out if it's sorted/unique. And from what you have, that looks to be reasonable. The benchmark machine returned a job for me in 45 minutes yesterday as opposed to 3 hours, so hopefully whatever was happening is resolved. I'll have yours running before the afternoon here; if I forget, feel free to ping.
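As an aside, the sorted/unique pre-check being discussed can be a single O(n) pass in numpy. A sketch (not the PR's actual code; `is_sorted_unique` is a hypothetical name):

```python
import numpy as np


def is_sorted_unique(idx: np.ndarray) -> bool:
    # Strictly increasing implies both sorted and duplicate-free.
    return bool(np.all(idx[1:] > idx[:-1]))


assert is_sorted_unique(np.array([0, 3, 5, 9]))      # fast path: pass straight through
assert not is_sorted_unique(np.array([0, 3, 3, 9]))  # repeats: take the slow path
```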
Thanks for sticking with this, @sjfleming!
@sjfleming I actually just noticed something: https://github.com/scverse/anndata/pull/2066/files#diff-4e5148021dbb054cace3f9749e52227b27fa61acf1137d979f90d148b82555c3R15-R21 only contains sparse benchmarking, but this indexing is for dense, no? If that is true, I would revert this PR (#2163) and then we would benchmark against main again with the fixed benchmarks. But maybe I'm missing something. Alternatively, we could do this manually, locally.
This reverts commit 7c52d36.
I will handle the benchmarks. I think I'm more irked about the time it takes (which is probably unrelated to this PR). Thanks again, @sjfleming; sorry for the panicked response.
…h5py error) (#2162) Co-authored-by: Stephen Fleming <[email protected]>
Awesome, thanks @ilan-gold! Yeah, it looks like the benchmarks should probably be updated to include dense and sparse.
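For context, a hedged sketch of what a benchmark covering both layouts might look like, following asv's `time_`/`peakmem_` naming conventions. The class name, shapes, and parameters here are invented, and the real suite in the repo's `benchmarks/` directory differs (e.g., it should also exercise backed mode, which this sketch omits):

```python
import numpy as np


class FancyIndexingSuite:
    params = ["dense", "sparse"]
    param_names = ["layout"]

    def setup(self, layout):
        import anndata as ad
        import scipy.sparse as sp

        rng = np.random.default_rng(0)
        X = rng.random((10_000, 500))
        if layout == "sparse":
            X = sp.csr_matrix(X)
        self.adata = ad.AnnData(X)
        # Repeated, unsorted indices: the case this PR targets.
        self.idx = rng.integers(0, 10_000, size=2_000)

    def time_fancy_index(self, layout):
        self.adata[self.idx].X

    def peakmem_fancy_index(self, layout):
        self.adata[self.idx].X
```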
This is a first stab at fixing #2064 by adding `_safe_fancy_index_h5py` (and 3 related helper functions) to `anndata/_core/index.py`. The function `_safe_fancy_index_h5py` only gets called in the case where there are repeated indices being requested (this is the only case that is currently causing a bug, so in all other cases, the existing code, `d[tuple(ordered)][tuple(rev_order)]`, is what runs).
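To illustrate the approach described above: h5py rejects fancy indices that are unsorted or contain repeats, so the idea is to read each requested element once (sorted and deduplicated) and re-expand the result in memory. The sketch below is a simplified one-dimensional stand-in, not the actual helper from this PR; `safe_fancy_index_1d` and the example file are made up.

```python
import h5py
import numpy as np


def safe_fancy_index_1d(dset: h5py.Dataset, idx: np.ndarray) -> np.ndarray:
    # np.unique returns sorted, duplicate-free values plus the inverse
    # mapping that reconstructs the original request.
    unique, inverse = np.unique(idx, return_inverse=True)
    data = dset[unique]   # h5py accepts this: increasing, no repeats
    return data[inverse]  # re-introduce repeats / original order in memory


with h5py.File("example.h5", "w") as f:  # throwaway file for illustration
    d = f.create_dataset("x", data=np.arange(10) * 10)
    idx = np.array([3, 3, 1, 7])         # repeats + unsorted: d[idx] errors
    print(safe_fancy_index_1d(d, idx))   # -> [30 30 10 70]
```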