# memviz/neo — high-performance, browser-native, multi-rank PyTorch GPU memory snapshot viewer

Drop `rank*.pickle` files into the page — everything parses, computes, and renders locally. No backend, no upload, no waiting on someone else's server.

License: 0BSD · Stack: rust · wasm · webgl2

*(screenshot: welcome page)*

100% vibecoded. Not a single line written by a human. Prompts all the way down.


## Why

PyTorch's torch.cuda.memory._dump_snapshot() gives you the data; the default viewer makes it hard to see. memviz/neo is a rebuild around:

  1. Multi-rank in parallel — every rank parses in its own worker; the dashboard paints the moment any worker reports back.
  2. WebGL2 instanced strips — 50 k+ allocs pan/zoom at 120 fps.
  3. Better lenses — address-reuse-aware selection, cross-linked Memory + Segment timelines, bytes × lifetime flame graph, per-rank peak bars.

*(screenshot: dashboard overview)*

## How it compares

| | pytorch/memory_viz | desktop_memory_viz | memviz/neo |
| --- | --- | --- | --- |
| Runs in | browser, static site | native Rust desktop app | browser, static site |
| Install | — (URL) | cargo build + Python | — (URL) |
| Pickle path | JS unpickler in-browser | Python pre-extract → JSON, then Rust | Rust → WASM with frame interning |
| Renderer | SVG via D3 | eframe / egui (wgpu) | WebGL2 instanced |
| Multi-rank view | pickle dropdown, one at a time | single file | whole run in parallel on a worker pool |
| Views | Active / Segment / State / Settings | Active Memory Timeline only | Multi-rank · Memory · Segment · Flame · Anomalies |
| Call-stack flame (bytes × lifetime) | — | — | ✓ |
| Cross-view selection linking (timeline ↔ segment ↔ detail panel) | — | — | ✓ |
| Address-reuse-aware selection keyed on (addr, alloc_us) | — | — | ✓ |

## Benchmarks

12.1 MiB pickle (50 k events, 90 segments, ~18 k allocs). Framework laptop, Intel iGPU, 120 Hz.

| | pytorch | desktop_memory_viz | memviz/neo |
| --- | --- | --- | --- |
| Parse → interactive | 163 ms | 1588 ms | 1040 ms |
| Layout + first paint | ~2100 ms | — | — |
| JS heap after load | ~420 MiB | — | — |
| Pan/zoom @ ~18 k allocs | — | — | 120 fps · p95 8.3 ms |
| 8-rank wall-clock | ~1440 ms (seq.) | — | ~1000 ms (parallel) |

pytorch's parse is cheap because it defers layout until the user opens a view; we front-load interning, pairing, top-N and IR emit so every view switch after parse is free.

Reproduce: `node bench/{memviz,pytorch,desktop,render}.mjs` — see bench/README.md.
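The front-loaded frame interning is the biggest lever in that parse time. A minimal sketch of the idea (names illustrative, not the actual wasm/ API): repeated stack frames collapse to small integer ids the first time they are seen, so each trace event stores one integer per frame instead of a string tuple.

```python
# Minimal sketch of frame interning: each (filename, line, name) frame maps
# to a small integer id on first sight; later occurrences reuse the id.
def make_interner():
    table = {}    # frame tuple -> compact id
    frames = []   # id -> frame tuple (for display lookups)

    def intern(frame):
        idx = table.get(frame)
        if idx is None:
            idx = len(frames)
            table[frame] = idx
            frames.append(frame)
        return idx

    return intern, frames

intern, frames = make_interner()
stack = [("model.py", 10, "forward"), ("linear.py", 5, "matmul")]
ids_a = [intern(f) for f in stack]
ids_b = [intern(f) for f in stack]   # a repeated stack reuses the same ids
assert ids_a == ids_b == [0, 1]
assert len(frames) == 2              # millions of entries, few unique frames
```

The same principle is how 3.5 M frame entries in a real snapshot collapse to ~1400 unique frames.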

## Views

  • Multi-Rank Overview — one bar per rank, heights scale on peak, click to switch focus.
  • Memory Timeline — WebGL2 instanced strips for every alloc. WASD pan/zoom X, Shift+WASD for Y, drag a box to zoom both, R/T+drag for rulers. X-axis toggles wall-clock μs ↔ event ordinal so dense phases stop collapsing into a smear.
  • Segment Timeline — one row per caching-allocator segment, allocs at their in-segment offset. Pan/zoom locks to Memory Timeline; selecting an alloc expands its row 30 → 120 px.
  • Anomalies — pending-free stalls + leak suspects, each cross-linked back to the timeline.
  • Memory Flame Graph — call-stack rolled up by bytes × lifetime. Drill-in breadcrumb, hover tooltip.
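The flame graph's weighting can be sketched as a prefix trie over call stacks, where each allocation contributes size × lifetime to every prefix of its stack. The field names below are illustrative, not the app's actual IR:

```python
# Hedged sketch of the bytes × lifetime rollup: each allocation adds
# size * (free_us - alloc_us) to every prefix of its call stack,
# producing the flame-graph trie.
def build_flame(allocs):
    root = {"value": 0, "children": {}}
    for a in allocs:
        weight = a["size"] * (a["free_us"] - a["alloc_us"])
        root["value"] += weight
        node = root
        for frame in a["stack"]:   # outermost caller first
            node = node["children"].setdefault(frame, {"value": 0, "children": {}})
            node["value"] += weight
    return root

allocs = [
    {"size": 4, "alloc_us": 0, "free_us": 10, "stack": ["main", "forward"]},
    {"size": 2, "alloc_us": 0, "free_us": 5,  "stack": ["main", "backward"]},
]
tree = build_flame(allocs)
assert tree["value"] == 50                                          # 4*10 + 2*5
assert tree["children"]["main"]["children"]["forward"]["value"] == 40
```

Weighting by bytes × lifetime (rather than bytes alone) makes small-but-long-lived allocations, which often dominate fragmentation, stand out.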

*(screenshot: flame graph)*

## Usage

Open the site (or run locally), point Open Directory at a folder of rank*.pickle files, and pick a worker count + detail level (3k/10k/20k/all). Firefox/Safari fall back to a multi-select file picker.

Nothing leaves your machine — WASM parser in a Worker, WebGL2 on your GPU, zero fetch() beyond the bundle.
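The "first worker back drives the dashboard" behaviour is just completion-order consumption of parse results. A Python illustration (standing in for the JS worker pool; the real app uses Web Workers, not threads):

```python
# Illustration of completion-order rendering: results are consumed in the
# order workers finish, not the order ranks were submitted, so the fastest
# rank paints the dashboard first.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def parse_rank(rank, cost_s):
    time.sleep(cost_s)   # stand-in for pickle parse + layout work
    return rank

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(parse_rank, r, c)
               for r, c in [(0, 0.2), (1, 0.05), (2, 0.1)]]
    order = [f.result() for f in as_completed(futures)]

assert order == [1, 2, 0]   # rank 1 finished first and would paint first
```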

## Architecture

```
rank*.pickle ──► Parse worker ──► Rust/WASM pickle parser ──► interned frames / stacks
                                                            │
                                                            ├─► timeline strips (Float32Array, event + time variants)
                                                            ├─► segment rows (per-segment alloc buckets)
                                                            ├─► flame graph (stack-weighted prefix trie)
                                                            └─► anomalies (leak + pending-free flags)

main thread ──► Zustand stores ──► React views ──► WebGL2 instanced draw
```
  • Rust + wasm-bindgen pickle parser (wasm/). Hand-rolled streaming parser; Rc-shared values handle MEMOIZE/BINGET reuse without cloning frame lists.
  • Frame / stack interning — 3.5 M frame entries collapse to ~1400 unique frames, one u32 per event.
  • Worker pool — parse + layout per rank in parallel, first rank back drives the dashboard.
  • Pre-packed GPU buffers — event-mode and time-mode variants precomputed, X-axis toggle is one bufferData call.
  • Address-reuse-aware selection — keys off (addr, alloc_us), not addr.
  • React 19 + Zustand + AntD (dark), Vite + vite-plugin-wasm.
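The address-reuse point deserves a concrete example. Because the caching allocator reuses freed addresses, keying a selection on `addr` alone can silently merge two distinct allocations; the composite key keeps a selection pinned to one lifetime (event fields below are illustrative):

```python
# Why selection keys on (addr, alloc_us) rather than addr alone: the caching
# allocator hands the same address back out after a free, so two different
# allocations can share an address across time.
events = [
    {"addr": 0x7F00, "alloc_us": 100, "free_us": 200},
    {"addr": 0x7F00, "alloc_us": 300, "free_us": 400},   # same addr, reused
]

by_addr = {e["addr"]: e for e in events}
by_key  = {(e["addr"], e["alloc_us"]): e for e in events}

assert len(by_addr) == 1   # addr-only keying collapses the two allocations
assert len(by_key) == 2    # composite key keeps both lifetimes distinct
```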

## Develop

Prereqs: `rustup target add wasm32-unknown-unknown`, `wasm-pack`, `pnpm`, Node 22.

```sh
cd web
pnpm install
pnpm dev          # auto-builds wasm if pkg/ is missing
```

Other commands:

```sh
pnpm build:wasm   # wasm-pack build --release
pnpm build        # typecheck + vite build (runs build:wasm first)
```

Synthetic snapshots for perf work:

```sh
python scripts/gen_test_data.py --ranks 8 --events 20000 --out test_data/
```
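If you want to hand-roll a snapshot instead, the pickle is roughly a dict of segments plus per-device trace lists. A minimal synthetic example (abbreviated and approximate; torch's `_memory_viz` tooling defines the authoritative schema, and real snapshots carry more fields):

```python
# Approximate shape of a torch.cuda.memory._dump_snapshot() pickle:
# a dict with "segments" and per-device "device_traces" event lists.
# Field set abbreviated; treat this as a sketch, not the full schema.
import pickle

frame = {"filename": "model.py", "line": 42, "name": "forward"}
snapshot = {
    "segments": [],
    "device_traces": [[   # one inner list per device
        {"action": "alloc",          "addr": 0x7F00, "size": 1024,
         "stream": 0, "time_us": 100, "frames": [frame]},
        {"action": "free_completed", "addr": 0x7F00, "size": 1024,
         "stream": 0, "time_us": 900, "frames": [frame]},
    ]],
}

with open("rank0.pickle", "wb") as f:
    pickle.dump(snapshot, f)
```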

## Release

Tag-driven. Push to main → ci.yml runs typecheck + build, no deploy. Push a v* tag → release.yml builds with VITE_APP_VERSION=${tag}, deploys to GitHub Pages, and creates a GitHub Release with auto-generated notes.

```sh
git tag v0.1.0 && git push origin v0.1.0
```

First-time setup: Settings → Pages → Source: GitHub Actions.

## Roadmap

Tracked in #5.

  • Multi-rank diff + insights — side-by-side comparison across ranks of the same run, auto-surfaced observations (peak skew, allocator stalls, leak suspects concentrated on one rank).
  • Agent-friendly interface — headless CLI + exposed parser/analysis primitives so code agents can consume snapshots via scripted calls and skill invocations, not just the browser UI.
  • Better anomaly detection — beyond pending-free + large-long-lived: fragmentation patterns, cross-rank outliers, allocator misconfiguration heuristics.

## Acknowledgements

## License

0BSD — take it, ship it, no attribution required.
