Conversation

@sjfleming (Contributor) commented Jul 30, 2025

This is a first stab at fixing #2064 by adding `_safe_fancy_index_h5py` (and three related helper functions) to `anndata/_core/index.py`. `_safe_fancy_index_h5py` is only called when the requested indices contain repeats (the only case currently triggering the bug), so in all other cases the existing code, `d[tuple(ordered)][tuple(rev_order)]`, still runs.
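For context, h5py fancy indexing requires index arrays to be sorted and duplicate-free, which is why repeated indices need special handling. A minimal sketch of the idea (hypothetical helper name, not the PR's actual implementation): read each unique row once, then re-expand in memory.

```python
import numpy as np

def safe_fancy_index(dataset, idx: np.ndarray) -> np.ndarray:
    """Index `dataset` (h5py-style) along axis 0, allowing repeated indices."""
    # `unique` is sorted and duplicate-free, so h5py accepts it;
    # `inverse` restores the originally requested order and repetitions.
    unique, inverse = np.unique(idx, return_inverse=True)
    return dataset[unique][inverse]
```

The same call pattern works on an in-memory NumPy array, which makes the equivalence easy to check against plain NumPy indexing.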

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 35.84906% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.43%. Comparing base (a1d6f17) to head (64af43e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/_core/index.py | 27.65% | 34 Missing ⚠️ |

❌ Your project check has failed because the head coverage (66.43%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (a1d6f17) and HEAD (64af43e). Click for more details.

HEAD has 2 uploads less than BASE

| Flag | BASE (a1d6f17) | HEAD (64af43e) |
|---|---|---|
|  | 5 | 3 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2066       +/-   ##
===========================================
- Coverage   85.57%   66.43%   -19.14%     
===========================================
  Files          46       46               
  Lines        7092     7118       +26     
===========================================
- Hits         6069     4729     -1340     
- Misses       1023     2389     +1366     
| Files with missing lines | Coverage | Δ |
|---|---|---|
| src/anndata/_core/merge.py | 63.48% <100.00%> | -20.93% ⬇️ |
| src/anndata/_core/sparse_dataset.py | 83.28% <100.00%> | -9.39% ⬇️ |
| src/anndata/experimental/backed/_lazy_arrays.py | 83.19% <100.00%> | -8.41% ⬇️ |
| src/anndata/_core/index.py | 60.09% <27.65%> | -32.56% ⬇️ |

... and 24 files with indirect coverage changes

@flying-sheep (Member) left a comment

Hi, this looks good, thank you!

I have a lot of little comments. Let me know if you'd prefer that I make these changes myself; that's fine with me!

@sjfleming (Contributor, Author)

I took a shot at it. Let me know what you think @flying-sheep ! Thanks

@sjfleming (Contributor, Author)

It seems possible that my first run just got very very unlucky, as unlikely as that sounds.

I may add some retry-averaging to the peak memory benchmarks if that is okay with you...

@ilan-gold (Contributor)

@sjfleming I'm not sure about the memory issue - we run these benchmarks semi-frequently and I haven't seen that much variability. Thanks a million for these benchmarks by the way!

I would say the biggest concern with benchmarks (apologies for not clarifying this) was tanking performance for what was previously a valid indexing operation (sorted, no repeats). As long as that is fine, or at least the penalty is minimized, that would make me happy :)
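The concern here is the cost of deciding whether the new path is needed at all. One hedged sketch of such a gate (hypothetical name, not the code in this PR): rule out the common sorted/unique case with a cheap O(n) scan before paying for a full uniqueness check.

```python
import numpy as np

def has_repeats(idx: np.ndarray) -> bool:
    """True when any index occurs more than once.

    A strictly increasing array is sorted with no repeats, so the common
    previously-valid case is dismissed in O(n) before the O(n log n)
    uniqueness check ever runs.
    """
    if idx.size < 2 or bool(np.all(np.diff(idx) > 0)):
        return False
    return np.unique(idx).size != idx.size
```

Only indices for which this returns `True` would need the duplicate-safe slow path; everything else keeps the original code path and its performance.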

scverse-benchmark bot commented Oct 21, 2025

@ilan-gold (Contributor) commented Oct 21, 2025

@sjfleming So the benchmarks literally crashed here, not sure why!

Since the benchmark took 174 minutes to run, could you either (a) ensure things run locally first, but not in 174 minutes, or (b) let me know that you want to use the CI for sanity checks and then ping me when you want the "real" benchmarks run on our machine? Then I'll remove the label and put it back when ready. Thanks a bunch!

UPDATE: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=53271517569 wow, ok, something is up with the machine, I think; different benchmarks are also all of a sudden taking forever.

@sjfleming (Contributor, Author)

@ilan-gold okay, so the benchmarks I messed with in my attempt to "improve stability" (9734678) are now failing to return a value, so that was ill-conceived. (It worked locally...)
Those are the ones with prefix readwrite.....track_peakmem... that now report all zeros. I will revert this change. It seems I misunderstood how that memory calculation works: when I run on my laptop, I'm presumably seeing memory effects from a bunch of other processes. When it runs on a dedicated machine in CI, it's probably fine.
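For context on why a missing return value yields zeros: asv records whatever number a `track_*` method returns as the benchmark result. A minimal sketch of the shape such a benchmark takes (hypothetical class and measurement, not the suite's actual code):

```python
import tracemalloc

class PeakMemSuite:
    """asv-style suite: asv calls setup(), then records track_* return values."""

    def setup(self):
        self.data = list(range(100_000))

    def track_peakmem_example(self):
        # Measure peak allocation during a representative workload.
        tracemalloc.start()
        _ = [x * 2 for x in self.data]
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # The value must be *returned*; a track_* method that only
        # measures but returns nothing produces empty/zero results.
        return peak / 2**20  # MiB
```

Note that process-level peak-memory measurements (as opposed to `tracemalloc`-style tracing) can pick up allocations from unrelated processes or the allocator itself, which matches the laptop-vs-CI variability described above.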

There are new benchmarks I've written in benchmarks/benchmarks/backed_hdf5.py (BackedHDF5.....timed_fancy_index...) that will fail on main without this PR, because they cover new functionality.
Maybe it's a bad idea to include things like that? Not sure what you think.

I can run the full benchmarking suite locally with asv run --python same. The benchmarks I've added take about 22 mins to run. The full run of all benchmarks (including those new ones) completes in 25 mins and everything looks okay. Things also look good compared to main if I run on main and then do asv compare.

But it looks like I've added too much to the benchmarking. I'm going to cut down on the number of new ones so as not to bog down the benchmarking suite. I think I went way overboard.

@sjfleming (Contributor, Author)

I've pared down the new benchmarks so that they run in 2 additional minutes.

@sjfleming (Contributor, Author)

I've run benchmarking on my local machine using

```shell
asv run main --steps 1
asv run 655c3ba --steps 1
asv compare main 655c3ba
```

(main was commit 2fe7b39)
and the output looks alright to me (barring my local machine's issues measuring peak memory):

All benchmarks:

| Change   | Before [2fe7b396] <main>   | After [655c3ba9] <sf-backed-hdf5-fancy-indexing>   |   Ratio | Benchmark (Parameter)                                                                               |
|----------|----------------------------|----------------------------------------------------|---------|-----------------------------------------------------------------------------------------------------|
|          | 1872800.0                  | 1873429.0                                          |    1    | anndata.GarbargeCollectionSuite.track_peakmem_garbage_collection                                    |
|          | 703M                       | 706M                                               |    1.01 | backed_hdf5.BackedHDF5.peakmem_fancy_index_no_dupes                                                 |
| -        | 703M                       | 592M                                               |    0.84 | backed_hdf5.BackedHDF5.peakmem_index_with_dupes_obs                                                 |
|          | 705M                       | 669M                                               |    0.95 | backed_hdf5.BackedHDF5.peakmem_slice_obs                                                            |
|          | 679M                       | 661M                                               |    0.97 | backed_hdf5.BackedHDF5.peakmem_to_memory_subset                                                     |
|          | 273±2μs                    | 272±5μs                                            |    1    | backed_hdf5.BackedHDF5.time_fancy_index_no_dupes                                                    |
|          | 9.83±0.06ms                | 10.3±0.2ms                                         |    1.05 | backed_hdf5.BackedHDF5.time_fancy_index_no_dupes_to_memory                                          |
|          | 239±2μs                    | 244±7μs                                            |    1.02 | backed_hdf5.BackedHDF5.time_index_with_dupes_obs                                                    |
|          | 337±10μs                   | 327±8μs                                            |    0.97 | backed_hdf5.BackedHDF5.time_slice_obs                                                               |
|          | 4.72±0.07ms                | 4.90±0.07ms                                        |    1.04 | backed_hdf5.BackedHDF5.time_slice_obs_to_memory                                                     |
|          | 3.98±0.07ms                | 4.38±0.5ms                                         |    1.1  | backed_hdf5.BackedHDF5.time_to_memory_subset                                                        |
|          | 233M                       | 233M                                               |    1    | dataset2d.Dataset2D.peakmem_full_to_memory(<function Dataset2D.<lambda>> (0), (-1,))                |
|          | 233M                       | 228M                                               |    0.98 | dataset2d.Dataset2D.peakmem_full_to_memory(<function Dataset2D.<lambda>> (0), None)                 |
|          | 244M                       | 247M                                               |    1.01 | dataset2d.Dataset2D.peakmem_full_to_memory(<function Dataset2D.<lambda>> (1), (-1,))                |
|          | 246M                       | 246M                                               |    1    | dataset2d.Dataset2D.peakmem_full_to_memory(<function Dataset2D.<lambda>> (1), None)                 |
|          | 231M                       | 234M                                               |    1.01 | dataset2d.Dataset2D.peakmem_getitem_bool_mask(<function Dataset2D.<lambda>> (0), (-1,))             |
|          | 260M                       | 252M                                               |    0.97 | dataset2d.Dataset2D.peakmem_getitem_bool_mask(<function Dataset2D.<lambda>> (0), None)              |
|          | 248M                       | 243M                                               |    0.98 | dataset2d.Dataset2D.peakmem_getitem_bool_mask(<function Dataset2D.<lambda>> (1), (-1,))             |
|          | 243M                       | 243M                                               |    1    | dataset2d.Dataset2D.peakmem_getitem_bool_mask(<function Dataset2D.<lambda>> (1), None)              |
|          | 229M                       | 233M                                               |    1.02 | dataset2d.Dataset2D.peakmem_getitem_slice(<function Dataset2D.<lambda>> (0), (-1,))                 |
|          | 228M                       | 234M                                               |    1.03 | dataset2d.Dataset2D.peakmem_getitem_slice(<function Dataset2D.<lambda>> (0), None)                  |
|          | 227M                       | 248M                                               |    1.09 | dataset2d.Dataset2D.peakmem_getitem_slice(<function Dataset2D.<lambda>> (1), (-1,))                 |
|          | 231M                       | 229M                                               |    0.99 | dataset2d.Dataset2D.peakmem_getitem_slice(<function Dataset2D.<lambda>> (1), None)                  |
|          | 5.33±0.3ms                 | 5.36±0.08ms                                        |    1    | dataset2d.Dataset2D.time_full_to_memory(<function Dataset2D.<lambda>> (0), (-1,))                   |
|          | 28.5±0.3ms                 | 28.6±0.5ms                                         |    1    | dataset2d.Dataset2D.time_full_to_memory(<function Dataset2D.<lambda>> (0), None)                    |
|          | 8.51±0.2ms                 | 8.38±0.4ms                                         |    0.98 | dataset2d.Dataset2D.time_full_to_memory(<function Dataset2D.<lambda>> (1), (-1,))                   |
|          | 8.65±0.2ms                 | 8.79±0.06ms                                        |    1.02 | dataset2d.Dataset2D.time_full_to_memory(<function Dataset2D.<lambda>> (1), None)                    |
|          | 50.2±0.8ms                 | 49.6±0.2ms                                         |    0.99 | dataset2d.Dataset2D.time_getitem_bool_mask(<function Dataset2D.<lambda>> (0), (-1,))                |
|          | 542±20ms                   | 528±20ms                                           |    0.97 | dataset2d.Dataset2D.time_getitem_bool_mask(<function Dataset2D.<lambda>> (0), None)                 |
|          | 20.6±0.2ms                 | 22.3±2ms                                           |    1.08 | dataset2d.Dataset2D.time_getitem_bool_mask(<function Dataset2D.<lambda>> (1), (-1,))                |
|          | 19.3±0.3ms                 | 20.5±0.5ms                                         |    1.06 | dataset2d.Dataset2D.time_getitem_bool_mask(<function Dataset2D.<lambda>> (1), None)                 |
|          | 3.70±0.2ms                 | 3.81±0.1ms                                         |    1.03 | dataset2d.Dataset2D.time_getitem_slice(<function Dataset2D.<lambda>> (0), (-1,))                    |
|          | 20.4±0.7ms                 | 20.9±1ms                                           |    1.03 | dataset2d.Dataset2D.time_getitem_slice(<function Dataset2D.<lambda>> (0), None)                     |
|          | 6.23±0.2ms                 | 6.13±0.2ms                                         |    0.98 | dataset2d.Dataset2D.time_getitem_slice(<function Dataset2D.<lambda>> (1), (-1,))                    |
| +        | 6.33±0.09ms                | 7.12±0.3ms                                         |    1.12 | dataset2d.Dataset2D.time_getitem_slice(<function Dataset2D.<lambda>> (1), None)                     |
|          | 222M                       | 223M                                               |    1.01 | readwrite.H5ADBackedWriteSuite.peakmem_write_compressed('pbmc3k')                                   |
|          | 221M                       | 215M                                               |    0.97 | readwrite.H5ADBackedWriteSuite.peakmem_write_full('pbmc3k')                                         |
|          | 371±20ms                   | 344±2ms                                            |    0.93 | readwrite.H5ADBackedWriteSuite.time_write_compressed('pbmc3k')                                      |
|          | 95.3±2ms                   | 93.5±2ms                                           |    0.98 | readwrite.H5ADBackedWriteSuite.time_write_full('pbmc3k')                                            |
| -        | 25.328125                  | 19.40625                                           |    0.77 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_compressed('pbmc3k')                             |
| -        | 26.046875                  | 15.359375                                          |    0.59 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_full('pbmc3k')                                   |
|          | 113563819                  | 113570669                                          |    1    | readwrite.H5ADInMemorySizeSuite.track_actual_in_memory_size('pbmc3k')                               |
|          | 24109702                   | 24109702                                           |    1    | readwrite.H5ADInMemorySizeSuite.track_in_memory_size('pbmc3k')                                      |
|          | 24.1M                      | 24.1M                                              |    1    | readwrite.H5ADReadSuite.mem_readfull_object('pbmc3k')                                               |
|          | 196M                       | 193M                                               |    0.99 | readwrite.H5ADReadSuite.peakmem_read_backed('pbmc3k')                                               |
|          | 215M                       | 218M                                               |    1.01 | readwrite.H5ADReadSuite.peakmem_read_full('pbmc3k')                                                 |
|          | 69.4±2ms                   | 67.9±1ms                                           |    0.98 | readwrite.H5ADReadSuite.time_read_full('pbmc3k')                                                    |
| +        | 1.0                        | 1.6961942257217848                                 |    1.7  | readwrite.H5ADReadSuite.track_read_full_memratio('pbmc3k')                                          |
|          | 226M                       | 237M                                               |    1.05 | readwrite.H5ADWriteSuite.peakmem_write_compressed('pbmc3k')                                         |
|          | 231M                       | 229M                                               |    0.99 | readwrite.H5ADWriteSuite.peakmem_write_full('pbmc3k')                                               |
|          | 306±9ms                    | 294±3ms                                            |    0.96 | readwrite.H5ADWriteSuite.time_write_compressed('pbmc3k')                                            |
|          | 42.1±2ms                   | 40.1±0.9ms                                         |    0.95 | readwrite.H5ADWriteSuite.time_write_full('pbmc3k')                                                  |
| +        | 7.1875                     | 9.328125                                           |    1.3  | readwrite.H5ADWriteSuite.track_peakmem_write_compressed('pbmc3k')                                   |
| -        | 12.71875                   | 8.4375                                             |    0.66 | readwrite.H5ADWriteSuite.track_peakmem_write_full('pbmc3k')                                         |
|          | 243M                       | 250M                                               |    1.03 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '0:1000', False)            |
|          | 269M                       | 268M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '0:1000', True)             |
|          | 258M                       | 263M                                               |    1.02 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '0:9000', False)            |
|          | 287M                       | 285M                                               |    0.99 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '0:9000', True)             |
|          | 250M                       | 253M                                               |    1.01 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), ':9000:-1', False)          |
|          | 264M                       | 264M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), ':9000:-1', True)           |
|          | 277M                       | 284M                                               |    1.03 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '::-2', False)              |
|          | 281M                       | 272M                                               |    0.97 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), '::-2', True)               |
|          | 291M                       | 283M                                               |    0.97 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'alternating', False)       |
|          | 293M                       | 278M                                               |    0.95 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'alternating', True)        |
|          | 254M                       | 246M                                               |    0.97 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'arange', False)            |
|          | 250M                       | 259M                                               |    1.04 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'arange', True)             |
|          | 248M                       | 246M                                               |    0.99 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'array', False)             |
|          | 261M                       | 262M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'array', True)              |
|          | 249M                       | 249M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'first', False)             |
|          | 268M                       | 263M                                               |    0.98 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), 'first', True)              |
|          | 241M                       | 251M                                               |    1.04 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '0:1000', False)      |
|          | 250M                       | 244M                                               |    0.98 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '0:1000', True)       |
|          | 247M                       | 253M                                               |    1.02 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '0:9000', False)      |
|          | 251M                       | 246M                                               |    0.98 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '0:9000', True)       |
|          | 251M                       | 251M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), ':9000:-1', False)    |
|          | 251M                       | 250M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), ':9000:-1', True)     |
| +        | 230M                       | 256M                                               |    1.11 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '::-2', False)        |
|          | 249M                       | 243M                                               |    0.98 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), '::-2', True)         |
| +        | 212M                       | 249M                                               |    1.18 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'alternating', False) |
|          | 250M                       | 250M                                               |    1    | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'alternating', True)  |
|          | 248M                       | 243M                                               |    0.98 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'arange', False)      |
|          | 254M                       | 251M                                               |    0.99 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'arange', True)       |
|          | 252M                       | 232M                                               |    0.92 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'array', False)       |
|          | 253M                       | 244M                                               |    0.96 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'array', True)        |
|          | 250M                       | 251M                                               |    1.01 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'first', False)       |
| -        | 248M                       | 223M                                               |    0.9  | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), 'first', True)        |
|          | 2.23±0.2ms                 | 2.17±0.1ms                                         |    0.97 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '0:1000', False)               |
|          | 4.24±0.1ms                 | 4.34±0.08ms                                        |    1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '0:1000', True)                |
|          | 5.93±0.07ms                | 6.08±0.06ms                                        |    1.03 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '0:9000', False)               |
|          | 16.4±0.6ms                 | 17.2±0.5ms                                         |    1.05 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '0:9000', True)                |
|          | 6.84±0.2ms                 | 7.16±0.2ms                                         |    1.05 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), ':9000:-1', False)             |
|          | 4.46±0.2ms                 | 4.37±0.05ms                                        |    0.98 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), ':9000:-1', True)              |
|          | 24.7±0.9ms                 | 25.0±1ms                                           |    1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '::-2', False)                 |
|          | 17.2±0.6ms                 | 17.2±0.2ms                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), '::-2', True)                  |
|          | 30.5±1ms                   | 30.4±0.8ms                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'alternating', False)          |
|          | 22.2±0.6ms                 | 22.2±0.3ms                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'alternating', True)           |
|          | 6.16±0.07ms                | 6.22±0.2ms                                         |    1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'arange', False)               |
|          | 4.61±0.2ms                 | 4.70±0.1ms                                         |    1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'arange', True)                |
|          | 2.51±0.04ms                | 2.54±0.1ms                                         |    1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'array', False)                |
|          | 6.86±0.08ms                | 7.03±0.2ms                                         |    1.03 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'array', True)                 |
|          | 2.00±0.03ms                | 1.99±0.04ms                                        |    0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'first', False)                |
|          | 4.10±0.06ms                | 4.16±0.07ms                                        |    1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), 'first', True)                 |
|          | 30.7±0.2μs                 | 30.6±0.4μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '0:1000', False)         |
|          | 31.1±0.6μs                 | 30.9±0.2μs                                         |    0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '0:1000', True)          |
|          | 30.7±0.2μs                 | 30.8±0.2μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '0:9000', False)         |
|          | 30.8±0.2μs                 | 30.5±0.2μs                                         |    0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '0:9000', True)          |
|          | 31.4±0.8μs                 | 30.8±0.2μs                                         |    0.98 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), ':9000:-1', False)       |
|          | 30.6±0.6μs                 | 31.0±0.5μs                                         |    1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), ':9000:-1', True)        |
|          | 30.6±0.1μs                 | 30.6±0.3μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '::-2', False)           |
|          | 32.0±1μs                   | 30.8±0.3μs                                         |    0.96 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), '::-2', True)            |
|          | 87.0±1μs                   | 87.7±1μs                                           |    1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'alternating', False)    |
|          | 88.5±3μs                   | 90.0±1μs                                           |    1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'alternating', True)     |
|          | 45.3±0.5μs                 | 45.4±0.3μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'arange', False)         |
|          | 45.5±0.4μs                 | 46.4±1μs                                           |    1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'arange', True)          |
|          | 41.3±0.6μs                 | 41.3±0.6μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'array', False)          |
|          | 41.1±0.2μs                 | 41.3±0.7μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'array', True)           |
|          | 31.6±0.3μs                 | 31.6±0.5μs                                         |    1    | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'first', False)          |
|          | 31.6±0.1μs                 | 31.2±0.3μs                                         |    0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), 'first', True)           |

@ilan-gold I think it's ready to be run on your CI.

@ilan-gold (Contributor)

> that will fail on main without this PR, because they cover new functionality. Maybe it's a bad idea to include things like that?

Right, so my concern is really mostly about what we have on main right now, i.e., you pass in a string array and now pay some performance penalty for, at the least, finding out whether it's sorted/unique. And from what you have, that looks reasonable. The benchmark machine returned a job for me in 45 minutes yesterday, as opposed to 3 hours, so hopefully whatever was happening is resolved. I'll have yours running before the afternoon here; if I forget, feel free to ping.

@ilan-gold ilan-gold merged commit 7c52d36 into scverse:main Oct 22, 2025
22 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Oct 22, 2025
@ilan-gold (Contributor)

Thanks for sticking with this @sjfleming !

@ilan-gold (Contributor) commented Oct 22, 2025

@sjfleming I actually just noticed something: https://github.com/scverse/anndata/pull/2066/files#diff-4e5148021dbb054cace3f9749e52227b27fa61acf1137d979f90d148b82555c3R15-R21 only contains sparse benchmarking, but this indexing is for dense, no?

If that is true, I would revert this PR (#2163) and then we would benchmark against main again with the fixed benchmarks. But maybe I'm missing something.

Alternatively, we could do this manually locally, but I'd kind of rather have the "clean" run on the benchmark machine. Let me know what you think. I think we need a separate PR with dense benchmarks.

@ilan-gold (Contributor)

I will handle the benchmarks. I think I'm more irked about the time it takes (which is probably unrelated to this PR). Thanks again, @sjfleming, and sorry for the panicked response.

ilan-gold pushed a commit that referenced this pull request Oct 22, 2025
@sjfleming (Contributor, Author)

Awesome, thanks @ilan-gold! Yeah, it looks like the benchmarks should probably be updated to include both dense and sparse.


Successfully merging this pull request may close these issues.

duplicated indices when slicing dense backed view lead to .to_memory() TypeError