# memviz/neo — high-performance, browser-native, multi-rank PyTorch GPU memory snapshot viewer

Drop `rank*.pickle` files into the page — everything parses, computes, and renders locally. No backend, no upload, no waiting on someone else's server.

License: 0BSD · Stack: rust · wasm · webgl2

*(screenshot: welcome page)*

100% vibecoded. Not a single line written by a human. Prompts all the way down.


## Why

PyTorch's torch.cuda.memory._dump_snapshot() gives you the data; the default viewer makes it hard to see. memviz/neo is a rebuild around:

  1. Multi-rank in parallel — every rank parses in its own worker; the dashboard paints the moment any worker reports back.
  2. WebGL2 instanced strips — 50 k+ allocs pan/zoom at 120 fps.
  3. Better lenses — address-reuse-aware selection, cross-linked Memory + Segment timelines, bytes × lifetime flame graph, per-rank peak bars.

*(screenshot: dashboard overview)*

## How it compares

| | pytorch/memory_viz | desktop_memory_viz | memviz/neo |
| --- | --- | --- | --- |
| Runs in | browser, static site | native Rust desktop app | browser, static site |
| Install | — (URL) | cargo build + Python | — (URL) |
| Pickle path | JS unpickler in-browser | Python pre-extract → JSON, then Rust | Rust → WASM with frame interning |
| Renderer | SVG via D3 | eframe / egui (wgpu) | WebGL2 instanced |
| Multi-rank view | pickle dropdown, one at a time | single file | whole run in parallel on a worker pool |
| Views | Active / Segment / State / Settings | Active Memory Timeline only | Multi-rank · Memory · Segment · Flame · Anomalies |
| Call-stack flame (bytes × lifetime) | — | — | ✓ |
| Cross-view selection linking (timeline ↔ segment ↔ detail panel) | — | — | ✓ |
| Address-reuse-aware selection keyed on (addr, alloc_us) | — | — | ✓ |

## Benchmarks

12.1 MiB pickle (50 k events, 90 segments, ~18 k allocs). Framework laptop, Intel iGPU, 120 Hz.

| | pytorch | desktop_memory_viz | memviz/neo |
| --- | --- | --- | --- |
| Parse → interactive | 163 ms | 1588 ms | 1040 ms |
| Layout + first paint | ~2100 ms | — | — |
| JS heap after load | ~420 MiB | — | — |
| Pan/zoom @ ~18 k allocs | — | — | 120 fps · p95 8.3 ms |
| 8-rank wall-clock | ~1440 ms (seq.) | — | ~1000 ms (parallel) |

pytorch's parse is cheap because it defers layout until the user opens a view; we front-load interning, pairing, top-N and IR emit so every view switch after parse is free.

Reproduce: `node bench/{memviz,pytorch,desktop,render}.mjs` — see bench/README.md.
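The front-loaded frame interning is the biggest lever in that parse time. A minimal sketch of the idea (names illustrative, not the actual wasm/ API): repeated stack frames collapse to small integer ids the first time they are seen, so each trace event stores one integer per frame instead of a string tuple.

```python
# Minimal sketch of frame interning: each (filename, line, name) frame maps
# to a small integer id on first sight; later occurrences reuse the id.
def make_interner():
    table = {}    # frame tuple -> compact id
    frames = []   # id -> frame tuple (for display lookups)

    def intern(frame):
        idx = table.get(frame)
        if idx is None:
            idx = len(frames)
            table[frame] = idx
            frames.append(frame)
        return idx

    return intern, frames

intern, frames = make_interner()
stack = [("model.py", 10, "forward"), ("linear.py", 5, "matmul")]
ids_a = [intern(f) for f in stack]
ids_b = [intern(f) for f in stack]   # a repeated stack reuses the same ids
assert ids_a == ids_b == [0, 1]
assert len(frames) == 2              # millions of entries, few unique frames
```

The same principle is how 3.5 M frame entries in a real snapshot collapse to ~1400 unique frames.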

## Views

  • Multi-Rank Overview — one bar per rank, heights scale on peak, click to switch focus.
  • Memory Timeline — WebGL2 instanced strips for every alloc. WASD pan/zoom X, Shift+WASD for Y, drag a box to zoom both, R/T+drag for rulers. X-axis toggles wall-clock μs ↔ event ordinal so dense phases stop collapsing into a smear.
  • Segment Timeline — one row per caching-allocator segment, allocs at their in-segment offset. Pan/zoom locks to Memory Timeline; selecting an alloc expands its row 30 → 120 px.
  • Anomalies — pending-free stalls + leak suspects, each cross-linked back to the timeline.
  • Memory Flame Graph — call-stack rolled up by bytes × lifetime. Drill-in breadcrumb, hover tooltip.
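The flame graph's weighting can be sketched as a prefix trie over call stacks, where each allocation contributes size × lifetime to every prefix of its stack. The field names below are illustrative, not the app's actual IR:

```python
# Hedged sketch of the bytes × lifetime rollup: each allocation adds
# size * (free_us - alloc_us) to every prefix of its call stack,
# producing the flame-graph trie.
def build_flame(allocs):
    root = {"value": 0, "children": {}}
    for a in allocs:
        weight = a["size"] * (a["free_us"] - a["alloc_us"])
        root["value"] += weight
        node = root
        for frame in a["stack"]:   # outermost caller first
            node = node["children"].setdefault(frame, {"value": 0, "children": {}})
            node["value"] += weight
    return root

allocs = [
    {"size": 4, "alloc_us": 0, "free_us": 10, "stack": ["main", "forward"]},
    {"size": 2, "alloc_us": 0, "free_us": 5,  "stack": ["main", "backward"]},
]
tree = build_flame(allocs)
assert tree["value"] == 50                                          # 4*10 + 2*5
assert tree["children"]["main"]["children"]["forward"]["value"] == 40
```

Weighting by bytes × lifetime (rather than bytes alone) makes small-but-long-lived allocations, which often dominate fragmentation, stand out.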

*(screenshot: flame graph)*

## Usage

Open the site (or run locally), point Open Directory at a folder of rank*.pickle files, and pick a worker count + detail level (3k/10k/20k/all). Firefox/Safari fall back to a multi-select file picker.

Nothing leaves your machine — WASM parser in a Worker, WebGL2 on your GPU, zero fetch() beyond the bundle.
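The "first worker back drives the dashboard" behaviour is just completion-order consumption of parse results. A Python illustration (standing in for the JS worker pool; the real app uses Web Workers, not threads):

```python
# Illustration of completion-order rendering: results are consumed in the
# order workers finish, not the order ranks were submitted, so the fastest
# rank paints the dashboard first.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def parse_rank(rank, cost_s):
    time.sleep(cost_s)   # stand-in for pickle parse + layout work
    return rank

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(parse_rank, r, c)
               for r, c in [(0, 0.2), (1, 0.05), (2, 0.1)]]
    order = [f.result() for f in as_completed(futures)]

assert order == [1, 2, 0]   # rank 1 finished first and would paint first
```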

## Architecture

```
rank*.pickle ──► Parse worker ──► Rust/WASM pickle parser ──► interned frames / stacks
                                                            │
                                                            ├─► timeline strips (Float32Array, event + time variants)
                                                            ├─► segment rows (per-segment alloc buckets)
                                                            ├─► flame graph (stack-weighted prefix trie)
                                                            └─► anomalies (leak + pending-free flags)

main thread ──► Zustand stores ──► React views ──► WebGL2 instanced draw
```
  • Rust + wasm-bindgen pickle parser (wasm/). Hand-rolled streaming parser; Rc-shared values handle MEMOIZE/BINGET reuse without cloning frame lists.
  • Frame / stack interning — 3.5 M frame entries collapse to ~1400 unique frames, one u32 per event.
  • Worker pool — parse + layout per rank in parallel, first rank back drives the dashboard.
  • Pre-packed GPU buffers — event-mode and time-mode variants precomputed, X-axis toggle is one bufferData call.
  • Address-reuse-aware selection — keys off (addr, alloc_us), not addr.
  • React 19 + Zustand + AntD (dark), Vite + vite-plugin-wasm.
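The address-reuse point deserves a concrete example. Because the caching allocator reuses freed addresses, keying a selection on `addr` alone can silently merge two distinct allocations; the composite key keeps a selection pinned to one lifetime (event fields below are illustrative):

```python
# Why selection keys on (addr, alloc_us) rather than addr alone: the caching
# allocator hands the same address back out after a free, so two different
# allocations can share an address across time.
events = [
    {"addr": 0x7F00, "alloc_us": 100, "free_us": 200},
    {"addr": 0x7F00, "alloc_us": 300, "free_us": 400},   # same addr, reused
]

by_addr = {e["addr"]: e for e in events}
by_key  = {(e["addr"], e["alloc_us"]): e for e in events}

assert len(by_addr) == 1   # addr-only keying collapses the two allocations
assert len(by_key) == 2    # composite key keeps both lifetimes distinct
```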

## Develop

Prereqs: `rustup target add wasm32-unknown-unknown`, `wasm-pack`, `pnpm`, Node 22.

```sh
cd web
pnpm install
pnpm dev          # auto-builds wasm if pkg/ is missing
```

Other commands:

```sh
pnpm build:wasm   # wasm-pack build --release
pnpm build        # typecheck + vite build (runs build:wasm first)
```

Synthetic snapshots for perf work:

```sh
python scripts/gen_test_data.py --ranks 8 --events 20000 --out test_data/
```
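If you want to hand-roll a snapshot instead, the pickle is roughly a dict of segments plus per-device trace lists. A minimal synthetic example (abbreviated and approximate; torch's `_memory_viz` tooling defines the authoritative schema, and real snapshots carry more fields):

```python
# Approximate shape of a torch.cuda.memory._dump_snapshot() pickle:
# a dict with "segments" and per-device "device_traces" event lists.
# Field set abbreviated; treat this as a sketch, not the full schema.
import pickle

frame = {"filename": "model.py", "line": 42, "name": "forward"}
snapshot = {
    "segments": [],
    "device_traces": [[   # one inner list per device
        {"action": "alloc",          "addr": 0x7F00, "size": 1024,
         "stream": 0, "time_us": 100, "frames": [frame]},
        {"action": "free_completed", "addr": 0x7F00, "size": 1024,
         "stream": 0, "time_us": 900, "frames": [frame]},
    ]],
}

with open("rank0.pickle", "wb") as f:
    pickle.dump(snapshot, f)
```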

## Release

Tag-driven. Push to main → ci.yml runs typecheck + build, no deploy. Push a v* tag → release.yml builds with VITE_APP_VERSION=${tag}, deploys to GitHub Pages, and creates a GitHub Release with auto-generated notes.

```sh
git tag v0.1.0 && git push origin v0.1.0
```

First-time setup: Settings → Pages → Source: GitHub Actions.

## Roadmap

Tracked in #5.

  • Multi-rank diff + insights — side-by-side comparison across ranks of the same run, auto-surfaced observations (peak skew, allocator stalls, leak suspects concentrated on one rank).
  • Agent-friendly interface — headless CLI + exposed parser/analysis primitives so code agents can consume snapshots via scripted calls and skill invocations, not just the browser UI.
  • Better anomaly detection — beyond pending-free + large-long-lived: fragmentation patterns, cross-rank outliers, allocator misconfiguration heuristics.

## Acknowledgements

## License

0BSD — take it, ship it, no attribution required.
