
Conversation


@ilan-gold ilan-gold marked this pull request as draft September 16, 2025 14:59

codecov bot commented Sep 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.44%. Comparing base (52344db) to head (39cb3c3).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2121      +/-   ##
==========================================
- Coverage   84.58%   84.44%   -0.14%     
==========================================
  Files          46       46              
  Lines        7105     7106       +1     
==========================================
- Hits         6010     6001       -9     
- Misses       1095     1105      +10     
| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_core/merge.py | 85.11% <100.00%> (+0.02%) ⬆️ |

... and 2 files with indirect coverage changes

@ilan-gold ilan-gold added this to the 0.12.2 milestone Sep 16, 2025

scverse-benchmark bot commented Sep 16, 2025

Benchmark changes

| Change | Before [52344db] | After [39cb3c3] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| - | 647±2ms | 239±3ms | 0.37 | dataset2d.Dataset2D.time_concat('h5ad', (-1,), 'cat') |
| - | 651±3ms | 209±0.7ms | 0.32 | dataset2d.Dataset2D.time_concat('h5ad', None, 'cat') |
| - | 988±100ms | 499±10ms | 0.51 | dataset2d.Dataset2D.time_concat('zarr', (-1,), 'cat') |
| - | 1.11±0.01s | 566±20ms | 0.51 | dataset2d.Dataset2D.time_concat('zarr', None, 'cat') |
| - | 2.21±0.1ms | 1.96±0.01ms | 0.89 | dataset2d.Dataset2D.time_full_to_memory('h5ad', (-1,), 'cat') |
| - | 4.46±0.04ms | 3.43±0.2ms | 0.77 | dataset2d.Dataset2D.time_full_to_memory('h5ad', (-1,), 'numeric') |
| + | 5.47±0.09ms | 6.56±0.04ms | 1.2 | dataset2d.Dataset2D.time_getitem_bool_mask('h5ad', (-1,), 'string-array') |
| - | 16.4±0.3ms | 14.2±0.5ms | 0.87 | dataset2d.Dataset2D.time_getitem_slice('h5ad', None, 'numeric') |
| - | 15.1±0.1ms | 13.4±0.07ms | 0.89 | dataset2d.Dataset2D.time_read_lazy_default('h5ad', (-1,), 'numeric') |
| - | 6.0 | 5.0 | 0.83 | readwrite.H5ADWriteSuite.track_peakmem_write_compressed('pbmc3k') |

Comparison: https://github.com/scverse/anndata/compare/52344dbb40037704f15d79bdd9329f31ed75074d..39cb3c3c7c76876342ff3c206f771f96e79a9987
Last changed: Tue, 28 Oct 2025 16:44:52 +0000

More details: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=53884619945

@ilan-gold ilan-gold marked this pull request as ready for review October 1, 2025 15:53
@ilan-gold ilan-gold requested a review from flying-sheep October 2, 2025 14:00
@ilan-gold ilan-gold force-pushed the ig/accelerate_map_blocks branch from 017b829 to c29d7f1 Compare October 19, 2025 10:21
@ilan-gold ilan-gold force-pushed the ig/accelerate_map_blocks branch from c29d7f1 to 3bc4ee2 Compare October 19, 2025 10:27
@ilan-gold ilan-gold modified the milestones: 0.12.4, 0.12.5 Oct 27, 2025
res.compute()


class SparseCSRDask:
@ilan-gold ilan-gold commented Oct 28, 2025

I considered using `name` for the sparse block mapping, but it had no appreciable effect: https://github.com/scverse/anndata/runs/51718340360 shows the result, and

da_mtx = da.map_blocks(
    make_chunk,
    dtype=dtype,
    chunks=chunk_layout,
    meta=memory_format((0, 0), dtype=dtype),
    name=f"{uuid.uuid4()}/{path_or_sparse_dataset}/{elem_name}-{dtype}",
)
shows that the commit the benchmark was run on did contain the `name` parameter. I've left the benchmark in anyway.
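
For anyone skimming: a minimal sketch (not the anndata implementation; `load_block` and the shapes are made up) of what the `name` parameter does in `da.map_blocks`. When `name` is supplied, dask uses it directly as the output key prefix instead of deriving one by tokenizing the function and its arguments, which is also why the snippet above mixes in `uuid.uuid4()`: an explicit name has to be unique per graph.

```python
import uuid

import dask.array as da


def load_block(block):
    # stand-in for a real chunk loader; here it just passes the data through
    return block


x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))

# Default: dask tokenizes `load_block` and its arguments to derive the layer name.
mapped_default = da.map_blocks(load_block, x, dtype=x.dtype)

# Explicit name: the key prefix is given up front, so no tokenization is needed to
# build it. The uuid keeps the name unique, mirroring the snippet above.
mapped_named = da.map_blocks(
    load_block,
    x,
    dtype=x.dtype,
    name=f"load_block-{uuid.uuid4()}",
)
```

Whether this helps depends on how expensive tokenizing the inputs actually is; per the run linked above, for the sparse block mapping it made no measurable difference.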

@ilan-gold ilan-gold requested review from flying-sheep and removed request for flying-sheep October 28, 2025 17:02

ilan-gold commented Oct 28, 2025

Ok @flying-sheep I know you reviewed, but the PR is grossly simplified now and hopefully the becnhmark results make some sense i.e., the big change is concat
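
For reviewers without the benchmark suite handy, the scenario those `time_concat` numbers cover is roughly this (a sketch, not the benchmark code; the store paths are hypothetical and it assumes `anndata.experimental.read_lazy` from the 0.12 series):

```python
import anndata as ad

# Hypothetical on-disk stores; the benchmark generates synthetic h5ad/zarr data.
stores = ["part0.zarr", "part1.zarr"]
adatas = [ad.experimental.read_lazy(store) for store in stores]

# Concatenating lazily backed objects is where tokenization cost used to dominate;
# the table above shows this path getting roughly 2-3x faster.
combined = ad.concat(adatas)
```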

@ilan-gold ilan-gold merged commit 41bc3b5 into main Oct 31, 2025
25 checks passed
@ilan-gold ilan-gold deleted the ig/accelerate_map_blocks branch October 31, 2025 15:30
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Oct 31, 2025
ilan-gold added a commit that referenced this pull request Nov 2, 2025
…to bypass tokenization) (#2191)

Co-authored-by: Ilan Gold <[email protected]>


Development

Successfully merging this pull request may close these issues.

ad.concat is slow on lazy data on account of tokenize
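
On the linked issue itself, a rough illustration (not anndata code) of why tokenization matters: `dask.base.tokenize` hashes any concrete data it is handed, so building a graph that repeatedly tokenizes large in-memory pieces scales with their size. The objects tokenized during a lazy `ad.concat` are different, but the cost profile is the same kind of thing, which is what passing explicit names avoids.

```python
import numpy as np
from dask.base import tokenize

small = np.ones(1_000)
large = np.ones(10_000_000)

# tokenize hashes the underlying buffers, so the second call does ~10,000x the work.
token_small = tokenize(small)
token_large = tokenize(large)
```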
