feat(format): dimension-independent-kernels-v1 + distributed-training-v1 4-gate PARTIAL discharge #1399

Closed

noahgift wants to merge 2 commits into main from feat/dim-dist-001-004-partial-discharge

Conversation

@noahgift (Contributor) commented May 2, 2026

Summary

Bundles two unrelated 2-gate sister contracts:

  • dimension-independent-kernels-v1 (FALSIFY-DIM-001..002): output equivalence vs the specialized kernel, no per-launch recompile
  • distributed-training-v1 (FALSIFY-DIST-001..002): gradient sync, loss equivalence

20 unit tests, including a 7-bucket loss-delta sweep on DIST-002.
Algorithm-level coverage advances by 4 gates; runtime ship % unchanged.

Gates bound

| Gate ID  | Rule                                                     |
| -------- | -------------------------------------------------------- |
| DIM-001  | dim-independent vs specialized within 1e-5               |
| DIM-002  | kernel loaded once (`load_count == 1`), `launch_count > 0` |
| DIST-001 | every rank's params bit-equal to rank 0                  |
| DIST-002 | distributed loss vs single-worker within 1e-4            |
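Read as code, the two tolerance rules above are one predicate each. A minimal Rust sketch, assuming outputs and losses are plain `f32` data; the function names here are illustrative, not aprender-core's actual API (the bit-exact and counter gates are sketched under Five Whys in the commit message below):

```rust
// Illustrative predicates for the two tolerance gates; names are
// hypothetical, not aprender-core's API.

/// DIM-001: every element of the dim-independent kernel's output must
/// match the specialized kernel's output within 1e-5.
fn dim_001_within_tolerance(dim_independent: &[f32], specialized: &[f32]) -> bool {
    dim_independent.len() == specialized.len()
        && dim_independent
            .iter()
            .zip(specialized)
            .all(|(a, b)| (a - b).abs() <= 1e-5)
}

/// DIST-002: the distributed run's final loss must land within 1e-4 of
/// the single-worker run's loss.
fn dist_002_within_tolerance(distributed_loss: f32, single_worker_loss: f32) -> bool {
    (distributed_loss - single_worker_loss).abs() <= 1e-4
}
```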

Five Whys

See the commit message below: it captures the bit-exact check for DIST-001, the looser tolerance for DIST-002 relative to DIM-001, and the fail-on-zero-launches guard for DIM-002.

Test plan

  • cargo test -p aprender-core --lib dim_dist — 20 passed
  • PMAT pre-commit gates green
  • CI green

🤖 Generated with Claude Code

…-v1 4-gate PARTIAL discharge

Bundles two unrelated 2-gate sister contracts:

dimension-independent-kernels-v1 (FALSIFY-DIMENSION_INDEPENDENT_KERNELS_V1_001..002):
- DIM-001: dim-independent output ≈ specialized output within 1e-5
- DIM-002: kernel binary loaded once, M/K/N passed at launch

distributed-training-v1 (FALSIFY-DISTRIBUTED_TRAINING_V1_001..002):
- DIST-001: every rank's params bit-equal to rank 0 after sync
- DIST-002: distributed loss ≈ single-worker loss within 1e-4

## Five Whys

1. Why bundle these two contracts? Both peripheral, span the
   GPU-kernel-parameterization + distributed-training coverage band;
   one verdict module captures both without provenance pin overhead.
2. Why does this block ship? Coverage % cannot move while these
   peripheral contracts are unbound at PARTIAL_ALGORITHM_LEVEL.
3. Why bit-exact (`to_bits()`) for DIST-001 (not f32-tolerant)? The
   contract says "params identical across workers." Distributed
   training all-reduce is a deterministic operation; any drift
   between ranks means a sync bug, not float rounding. ULP-strict
   comparison catches the regression class "all-reduce silently
   dropped a gradient on one rank" (see the sketch after this list).
4. Why looser 1e-4 for DIST-002 vs 1e-5 for DIM-001? DIM-001
   compares two pure GEMM outputs (one path, one numeric ordering).
   DIST-002 compares full training loss across worker counts —
   different reduction orders, different batch boundaries. The
   wider tolerance absorbs reduction-order drift while still
   catching real divergence.
5. Why fail-on-zero-launches for DIM-002? A vacuous pass when
   `launch_count == 0` would mask "the kernel was never dispatched
   at all"; that is a different bug than "kernel was recompiled
   per launch" but equally a regression in the dispatch path.
   Fail-on-zero forces the call site to actually exercise the
   kernel before claiming the no-recompile gate is satisfied
   (also covered in the sketch after this list).
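A minimal Rust sketch of the two strict checks from points 3 and 5, assuming a harness that exposes each rank's parameters as raw `f32` values and per-kernel load/launch counters; all names are hypothetical, not aprender-core's actual API:

```rust
// Hypothetical sketch; the harness shapes and function names are invented
// for illustration and assume at least one rank is present.

/// DIST-001: every rank's params bit-equal to rank 0. Comparing via
/// `to_bits()` is ULP-strict: it distinguishes values an epsilon check
/// would conflate (e.g. -0.0 vs 0.0), so any post-sync drift fails.
fn ranks_bit_equal(rank_params: &[Vec<f32>]) -> bool {
    let rank0 = &rank_params[0];
    rank_params.iter().all(|p| {
        p.len() == rank0.len()
            && p.iter().zip(rank0).all(|(a, b)| a.to_bits() == b.to_bits())
    })
}

/// DIM-002: the kernel binary was loaded exactly once AND actually
/// dispatched. Requiring `launch_count > 0` blocks the vacuous pass
/// where the kernel was never exercised at all.
fn no_recompile_gate(load_count: u32, launch_count: u32) -> bool {
    load_count == 1 && launch_count > 0
}
```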

Adds 20 unit tests including a 7-bucket loss-delta sweep on
DIST-002. Realistic-healthy walks the canonical 4-rank training
state; pre-fix walks 4 simultaneous regressions.
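As an illustration of how a 7-bucket sweep could exercise the DIST-002 boundary, a sketch with assumed bucket values (the actual buckets are not shown in this PR; the helper name is hypothetical):

```rust
/// DIST-002 predicate: loss delta within the 1e-4 tolerance (assumed helper).
fn dist_002_passes(delta: f32) -> bool {
    delta.abs() <= 1e-4
}

#[test]
fn dist_002_loss_delta_sweep() {
    // Seven buckets straddling the 1e-4 tolerance boundary; values are
    // illustrative assumptions, not the PR's actual test data.
    let buckets: [(f32, bool); 7] = [
        (0.0, true),   // exact agreement
        (1e-6, true),  // deep inside tolerance
        (5e-5, true),  // inside tolerance
        (1e-4, true),  // exactly at the (inclusive) boundary
        (2e-4, false), // just past the boundary
        (1e-3, false), // clearly divergent
        (1.0, false),  // catastrophic divergence
    ];
    for (delta, expected) in buckets {
        assert_eq!(
            dist_002_passes(delta),
            expected,
            "DIST-002 verdict wrong for loss delta {delta}"
        );
    }
}
```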

No runtime % shift; algorithm-level coverage advances by 4 gates.
@noahgift noahgift enabled auto-merge (squash) May 11, 2026 15:15
@noahgift noahgift force-pushed the feat/dim-dist-001-004-partial-discharge branch from dedcc35 to abfd9d0 on May 11, 2026 15:15
@noahgift (Contributor, Author) commented:

Superseded by #1637 (135-PR squash). The commit content is included verbatim in that PR's diff. Closing now to release runner slots; this PR would auto-close anyway when #1637 merges.

@noahgift noahgift closed this May 12, 2026
auto-merge was automatically disabled May 12, 2026 09:20

Pull request was closed
