Right now, rsc.pp.scrublet doesn't support Dask arrays, and there's a relatively straightforward path to implementing that support (at least, from what I know).
Background:
Scrublet only really needs to run within a single sample or batch. The grouping is provided to the function as a 'batch_key'.
These groups are typically on the order of < 100,000 cells for batches, or < 10,000 cells for samples, meaning that each one can fit within a typical GPU's memory.
Implementation concept:
Check whether the AnnData object holds a Dask array. If so, require that a batch_key be provided.
Rechunk the Dask array by batch_key, so that there is one Dask array for each batch_key value
Run Scrublet in memory on each GPU, one batch at a time (.compute_chunk_sizes())
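A rough sketch of the steps above, to make the idea concrete. It is not the rapids-singlecell implementation: `run_per_batch` and `score_fn` are hypothetical names, `score_fn` stands in for the in-memory Scrublet call, and the Dask check is a simple duck-typed test for a `.compute()` method so the sketch itself only needs NumPy.

```python
import numpy as np


def run_per_batch(X, batch_labels, score_fn):
    """Apply score_fn to each batch in memory; return scores in original row order.

    X            : NumPy array or Dask-like array (anything with row indexing)
    batch_labels : one label per row (e.g. adata.obs[batch_key]), or None
    score_fn     : hypothetical stand-in for the in-memory Scrublet call;
                   takes a 2D block and returns one score per row
    """
    # Assumption: treat anything exposing .compute() as a lazy (Dask) array.
    is_lazy = hasattr(X, "compute")

    if batch_labels is None:
        if is_lazy:
            raise ValueError("batch_key is required when X is a Dask array")
        return np.asarray(score_fn(X), dtype=float)

    batch_labels = np.asarray(batch_labels)
    # Use len(batch_labels) rather than X.shape[0], which may be unknown for Dask.
    scores = np.empty(len(batch_labels), dtype=float)

    for batch in np.unique(batch_labels):
        rows = np.flatnonzero(batch_labels == batch)
        block = X[rows]  # select only this batch's rows
        if is_lazy:
            block = block.compute()  # materialize one batch at a time
        scores[rows] = score_fn(block)  # scatter back to original positions

    return scores
```

In a multi-GPU setup each batch could be dispatched to a different worker instead of being processed in a loop; the loop is kept here only to show the group, materialize, and scatter-back pattern.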