perf: use name in map_blocks to bypass tokenization
#2121
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             main    #2121      +/-  ##
==========================================
- Coverage   84.58%   84.44%   -0.14%
==========================================
  Files          46       46
  Lines        7105     7106       +1
==========================================
- Hits         6010     6001       -9
- Misses       1095     1105      +10
```
Benchmark changes

Comparison: https://github.com/scverse/anndata/compare/52344dbb40037704f15d79bdd9329f31ed75074d..39cb3c3c7c76876342ff3c206f771f96e79a9987
More details: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=53884619945
Review comment on the benchmark code, near `res.compute()` and `class SparseCSRDask:`:
I considered using `name` for the sparse block mapping as well, but it had no appreciable effect: https://github.com/scverse/anndata/runs/51718340360 is the result of adding the `name` parameter here:

anndata/src/anndata/_io/specs/lazy_methods.py, lines 172 to 178 in 59041d4:

```python
da_mtx = da.map_blocks(
    make_chunk,
    dtype=dtype,
    chunks=chunk_layout,
    meta=memory_format((0, 0), dtype=dtype),
    name=f"{uuid.uuid4()}/{path_or_sparse_dataset}/{elem_name}-{dtype}",
)
```

I leave the benchmark in anyway.
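The mechanism the PR relies on can be sketched in isolation. When `da.map_blocks` is given an explicit `name`, dask uses it verbatim as the output key and skips tokenizing the wrapped function and its arguments, which is the expensive step when the function closes over large or file-backed objects. The `double_block` function below is a hypothetical stand-in for anndata's `make_chunk`, not the actual implementation:

```python
# Sketch (not the anndata implementation): an explicit `name` passed to
# da.map_blocks bypasses dask's tokenization of the mapped function.
import uuid

import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))


def double_block(block):
    # Hypothetical per-block function standing in for make_chunk.
    return block * 2


# Without `name`, dask hashes `double_block` and its arguments to derive
# a deterministic key; with `name`, that hashing is skipped entirely.
res = da.map_blocks(
    double_block,
    x,
    dtype=x.dtype,
    name=f"double-{uuid.uuid4()}",  # unique key, no tokenization
)
```

Note that a caller-supplied `name` must be unique (hence the `uuid.uuid4()`), since dask no longer derives it deterministically from the inputs.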
Ok @flying-sheep, I know you reviewed, but the PR is grossly simplified now and hopefully the benchmark results make some sense, i.e., the big change is
Follow-up commit (…to bypass tokenization) (#2191) Co-authored-by: Ilan Gold <[email protected]>
Linked issue: `ad.concat` is slow on lazy data on account of `tokenize` (#1989)
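The slowdown in the linked issue comes from `dask.base.tokenize` hashing the full contents of its arguments to build a deterministic graph key, which scales with the data size, while a precomputed `name` string is constant-time. A minimal sketch of the difference (array size and name format are illustrative, not anndata's):

```python
# Sketch: tokenize hashes all bytes of the input; a supplied name
# string is a constant-time label by comparison.
import uuid

import numpy as np
from dask.base import tokenize

big = np.zeros(1_000_000)

key_from_tokenize = tokenize(big)          # hashes ~8 MB of data
key_from_name = f"my-array/{uuid.uuid4()}"  # constant-time label
```

This is why the PR threads a unique `name` through `map_blocks` instead of letting dask tokenize closures over file-backed stores.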