Drop `rank*.pickle` files into the page — everything parses, computes and renders
locally. No backend, no upload, no waiting on someone else's server.
100% vibecoded. Not a single line written by a human. Prompts all the way down.
PyTorch's torch.cuda.memory._dump_snapshot() gives you the data; the
default viewer makes it hard to see. memviz/neo is a rebuild around:
- Multi-rank in parallel — every rank parses in its own worker; the dashboard paints the moment any worker reports back.
- WebGL2 instanced strips — 50 k+ allocs pan/zoom at 120 fps.
- Better lenses — address-reuse-aware selection, cross-linked Memory + Segment timelines, bytes × lifetime flame graph, per-rank peak bars.
| | pytorch/memory_viz | desktop_memory_viz | memviz/neo |
|---|---|---|---|
| Runs in | browser, static site | native Rust desktop app | browser, static site |
| Install | — (URL) | cargo build + Python | — (URL) |
| Pickle path | JS unpickler in-browser | Python pre-extract → JSON, then Rust | Rust → WASM with frame interning |
| Renderer | SVG via D3 | eframe / egui (wgpu) | WebGL2 instanced |
| Multi-rank view | pickle dropdown, one at a time | single file | whole run in parallel on a worker pool |
| Views | Active / Segment / State / Settings | Active Memory Timeline only | Multi-rank · Memory · Segment · Flame · Anomalies |
| Call-stack flame (bytes × lifetime) | — | — | ✓ |
| Cross-view selection linking | — | — | timeline ↔ segment ↔ detail panel |
| Address-reuse-aware selection | — | — | keyed on (addr, alloc_us) |
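The last row deserves a one-line illustration. CUDA's caching allocator reuses device addresses freely, so keying selection on `addr` alone conflates two unrelated allocations that happen to land at the same pointer. A minimal sketch of the composite-key idea (types and field names hypothetical, not the project's actual API):

```typescript
// Sketch (hypothetical types): why selection is keyed on (addr, alloc_us),
// not addr alone -- the caching allocator reuses addresses freely.
interface Alloc {
  addr: number;    // device pointer
  allocUs: number; // allocation timestamp, microseconds
  bytes: number;
}

// Composite key: two allocs at the same address but different times stay distinct.
const selectionKey = (a: Alloc): string => `${a.addr}:${a.allocUs}`;

const first: Alloc  = { addr: 0x7f00, allocUs: 10, bytes: 4096 };
const reused: Alloc = { addr: 0x7f00, allocUs: 90, bytes: 1024 }; // same addr, later life

console.log(selectionKey(first) === selectionKey(reused)); // false: no conflation
```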
12.1 MiB pickle (50 k events, 90 segments, ~18 k allocs). Framework laptop, Intel iGPU, 120 Hz.
| | pytorch | desktop_memory_viz | memviz/neo |
|---|---|---|---|
| Parse → interactive | 163 ms | 1588 ms | 1040 ms |
| Layout + first paint | — | — | ~2100 ms |
| JS heap after load | — | — | ~420 MiB |
| Pan/zoom @ ~18 k allocs | — | — | 120 fps · p95 8.3 ms |
| 8-rank wall-clock | ~1440 ms (seq.) | — | ~1000 ms (parallel) |
pytorch's parse is cheap because it defers layout until the user opens a view; memviz/neo front-loads interning, pairing, top-N and IR emit, so every view switch after parse is free.
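One of those front-loaded steps, frame interning, can be sketched in a few lines (names hypothetical; the real parser does this in Rust/WASM): every repeated frame string collapses to a dense integer id, so each event carries one small number instead of a cloned frame list.

```typescript
// Sketch of frame interning (hypothetical API): each unique frame string
// gets a dense integer id; events reference stacks by id instead of
// carrying duplicated frame lists.
class Interner {
  private ids = new Map<string, number>();
  readonly table: string[] = []; // id -> frame string

  intern(frame: string): number {
    let id = this.ids.get(frame);
    if (id === undefined) {
      id = this.table.length;
      this.ids.set(frame, id);
      this.table.push(frame);
    }
    return id;
  }
}

const interner = new Interner();
const stack = [
  "torch/nn/module.py:forward",
  "model.py:train_step",
  "torch/nn/module.py:forward", // repeat -> same id, no new storage
];
const ids = stack.map((f) => interner.intern(f));
console.log(ids);                   // [0, 1, 0]
console.log(interner.table.length); // 2 unique frames
```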
Reproduce: node bench/{memviz,pytorch,desktop,render}.mjs — see
bench/README.md.
- Multi-Rank Overview — one bar per rank, heights scale on peak, click to switch focus.
- Memory Timeline — WebGL2 instanced strips for every alloc.
  `WASD` pan/zoom X, `Shift+WASD` for Y, drag a box to zoom both, `R`/`T` + drag for rulers. X-axis toggles wall-clock μs ↔ event ordinal so dense phases stop collapsing into a smear.
- Segment Timeline — one row per caching-allocator segment, allocs at their in-segment offset. Pan/zoom locks to Memory Timeline; selecting an alloc expands its row 30 → 120 px.
- Anomalies — pending-free stalls + leak suspects, each cross-linked back to the timeline.
- Memory Flame Graph — call-stack rolled up by bytes × lifetime. Drill-in breadcrumb, hover tooltip.
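The bytes × lifetime roll-up can be sketched as a prefix trie over call stacks (shapes hypothetical, not the project's internal IR): each node accumulates the sum of `bytes * lifetime` for every allocation whose stack passes through it, so large long-lived allocations dominate the flame graph rather than merely frequent ones.

```typescript
// Sketch (hypothetical shapes): roll call stacks into a prefix trie where
// each node's weight is the sum of bytes * lifetime for allocs passing
// through it.
interface TrieNode {
  weight: number;
  children: Map<string, TrieNode>;
}

const newNode = (): TrieNode => ({ weight: 0, children: new Map() });

function addStack(root: TrieNode, stack: string[], bytes: number, lifetimeUs: number): void {
  const w = bytes * lifetimeUs;
  let node = root;
  node.weight += w;
  for (const frame of stack) { // root -> leaf order
    let child = node.children.get(frame);
    if (!child) {
      child = newNode();
      node.children.set(frame, child);
    }
    child.weight += w;
    node = child;
  }
}

const root = newNode();
addStack(root, ["main", "train", "forward"], 1024, 500);  // 512000
addStack(root, ["main", "train", "backward"], 2048, 100); // 204800
console.log(root.weight); // 716800: shared prefix accumulates both
```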
Open the site (or run locally), point Open Directory at a folder of
rank*.pickle files, pick worker count + detail level (3k/10k/20k/all).
Firefox/Safari fall back to a multi-select file picker.
Nothing leaves your machine — WASM parser in a Worker, WebGL2 on your
GPU, zero fetch() beyond the bundle.
```
rank*.pickle ──► Parse worker ──► Rust/WASM pickle parser ──► interned frames / stacks
                                          │
                                          ├─► timeline strips (Float32Array, event + time variants)
                                          ├─► segment rows (per-segment alloc buckets)
                                          ├─► flame graph (stack-weighted prefix trie)
                                          └─► anomalies (leak + pending-free flags)
main thread ──► Zustand stores ──► React views ──► WebGL2 instanced draw
```
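The "first rank back drives the dashboard" scheduling can be modeled with promises (the real implementation uses Web Workers; `parseRank`, `Snapshot` and the fake delays here are stand-ins, not the project's API): all ranks parse concurrently, and each paints on arrival instead of waiting for the slowest.

```typescript
// Scheduling sketch (promises stand in for Web Workers): all ranks parse
// in parallel; whichever finishes first drives the initial paint, and the
// dashboard updates incrementally as the rest report back.
type Snapshot = { rank: number; peakBytes: number };

async function parseRank(rank: number, delayMs: number): Promise<Snapshot> {
  await new Promise((r) => setTimeout(r, delayMs)); // fake parse cost
  return { rank, peakBytes: (rank + 1) * 1_000_000 };
}

async function loadRun(
  ranks: number[],
  onReady: (s: Snapshot) => void,
): Promise<Snapshot[]> {
  const done: Snapshot[] = [];
  // Fire all parses at once; paint as each resolves, not when all do.
  await Promise.all(
    ranks.map(async (r) => {
      const snap = await parseRank(r, Math.random() * 20);
      done.push(snap); // arrival order, not rank order
      onReady(snap);   // first callback paints the dashboard
    }),
  );
  return done;
}
```

Usage would look like `await loadRun([0, 1, 2, 3], paintDashboard)`: the UI never blocks on rank 0 if rank 3's pickle happens to parse first.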
- Rust + `wasm-bindgen` pickle parser (`wasm/`). Hand-rolled streaming parser; `Rc`-shared values handle `MEMOIZE`/`BINGET` reuse without cloning frame lists.
- Frame / stack interning — 3.5 M frame entries collapse to ~1400 unique frames, one `u32` per event.
- Worker pool — parse + layout per rank in parallel, first rank back drives the dashboard.
- Pre-packed GPU buffers — event-mode and time-mode variants precomputed, X-axis toggle is one `bufferData` call.
- Address-reuse-aware selection — keys off `(addr, alloc_us)`, not `addr`.
- React 19 + Zustand + AntD (dark), Vite + `vite-plugin-wasm`.
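The pre-packed buffer trick above can be sketched as follows (shapes hypothetical): both X-axis layouts are computed once at parse time as `Float32Array`s, so toggling wall-clock ↔ event-ordinal is a single buffer upload with no relayout pass.

```typescript
// Sketch (hypothetical shapes): precompute both X-axis variants at parse
// time; the toggle then swaps buffers instead of recomputing layout.
interface Ev {
  timeUs: number;
}

function packVariants(events: Ev[]): { time: Float32Array; ordinal: Float32Array } {
  const time = new Float32Array(events.length);
  const ordinal = new Float32Array(events.length);
  events.forEach((e, i) => {
    time[i] = e.timeUs; // wall-clock X: faithful to real timing
    ordinal[i] = i;     // event-ordinal X: dense phases spread out evenly
  });
  return { time, ordinal };
}

// On toggle, upload the precomputed variant (WebGL2 call elided):
//   gl.bufferData(gl.ARRAY_BUFFER, useOrdinal ? b.ordinal : b.time, gl.STATIC_DRAW);

const b = packVariants([{ timeUs: 0 }, { timeUs: 5 }, { timeUs: 5 }, { timeUs: 900 }]);
console.log(Array.from(b.ordinal)); // [0, 1, 2, 3] -- no smear at timeUs = 5
```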
Prereqs: `rustup target add wasm32-unknown-unknown`, `wasm-pack`, `pnpm`,
Node 22.
```sh
cd web
pnpm install
pnpm dev          # auto-builds wasm if pkg/ is missing
```

Other commands:

```sh
pnpm build:wasm   # wasm-pack build --release
pnpm build        # typecheck + vite build (runs build:wasm first)
```

Synthetic snapshots for perf work:

```sh
python scripts/gen_test_data.py --ranks 8 --events 20000 --out test_data/
```

Tag-driven. Push `main` → `ci.yml` runs
typecheck + build, no deploy. Push a `v*` tag → `release.yml`
builds with `VITE_APP_VERSION=${tag}`, deploys to GitHub Pages, and
creates a GitHub Release with auto-generated notes.
```sh
git tag v0.1.0 && git push origin v0.1.0
```

First-time setup: Settings → Pages → Source: GitHub Actions.
Tracked in #5.
- Multi-rank diff + insights — side-by-side comparison across ranks of the same run, auto-surfaced observations (peak skew, allocator stalls, leak suspects concentrated on one rank).
- Agent-friendly interface — headless CLI + exposed parser/analysis primitives so code agents can consume snapshots via scripted calls and skill invocations, not just the browser UI.
- Better anomaly detection — beyond pending-free + large-long-lived: fragmentation patterns, cross-rank outliers, allocator misconfiguration heuristics.
- pytorch/memory_viz — official viewer; defined the pickle schema.
- C-J-Cundy/desktop_memory_viz — desktop rework, seeded several interactions here.
0BSD — take it, ship it, no attribution required.


