All tests on a single Visium section (~5 200 spots × 36 500 genes, sparse matrix ~4 % fill).
| Operation | Median | 95th pct | Peak RAM | Notes |
|---|---|---|---|---|
load_visium() |
2.1 s | 2.4 s | 180 MB | includes Space Ranger JSON parse |
neighbors(method='knn') |
180 ms | 210 ms | 50 MB | sklearn KDTree, expression PCA‑50 |
neighbors(method='radius', r=100) |
420 ms | 480 ms | 55 MB | radius search |
| Spatial Leiden (res=0.8) | 320 ms | 380 ms | 80 MB | igraph partition |
| Moran's I (single gene, 999 perm) | 350 ms | 400 ms | 10 MB | permutation loop |
| Local Moran (single gene, 99 perm) | 200 ms | 240 ms | 10 MB | |
| Stereoscope (5 k spots × 20 types, 400 epoch) | 6.2 min | — | 450 MB (CPU) | 90 s on CUDA A100 |
| Tangram (same) | 1.4 min | — | 380 MB | CUDA 45 s |
spatial_scatter() (matplotlib) |
120 ms | 140 ms | 20 MB | figure render only |
| Spots | Graph (knn) | Moran (999 perm) | Stereoscope (CPU) |
|---|---|---|---|
| 5 k | 0.18 s | 0.35 s | 6 min |
| 20 k | 0.75 s | 1.4 s | 25 min |
| 100 k | 4.2 s | 7.8 s | 2 h + |
Memory scales linearly with N_spots for dense graphs (~N² for radius method).
- Graph sparsification — reduce
n_neighborsto 10 for very large N; speeds clustering ×1.5 with minor accuracy loss. - Permutation count — for discovery use
n_permutations=99; for publication use999. Analytic Moran's I (no permutation) is available but less accurate for small N. - Deconvolution epochs — early stopping monitors ELBO; typical convergence plateau at 300–400 epochs. Reduce to 200 for prototyping.
- Device — CUDA accelerates Stereoscope×8; Tangram×4 (memory bound).
- Loader I/O — raw matrix HDF5 read dominates initial load; place data on NVMe.
- Neighbour graph — radius method O(N²); prefer KNN for N > 20 k.
- Deconvolution — GPU memory can OOM with >100 cell types; lower
batch_size. - Permutation tests — auto‑parallelise across genes via
joblibif you have many cores:morans_i(..., n_jobs=8).
python -m cProfile -o spatial.prof myscript.py
snakeviz spatial.profLook for accumulated time in:
spatial/io/loader.py— file parsingspatial/analysis/neighbors.py— scikit‑learn tree buildspatial/analysis/autocorrelation.py— permutation loop
- Radius graph with N > 50 k spots may exceed RAM; use
method='knn'. - CUDA support requires
pytorch+faiss‑gpu; CPU fallback works everywhere.