feat(format): cpu-work-stealing-v1 + encoder-forward-v1 8-gate PARTIAL discharge#1397
Open
noahgift wants to merge 2 commits into
…L discharge

Bundles two sister contracts in one verdict module:

cpu-work-stealing-v1 (FALSIFY-WS-001..004):
- WS-001: dispatch overhead < 1ms per forward pass
- WS-002: L1 cache miss rate < 5% during inner loop
- WS-003: matvec parity vs Rayon within 1e-6
- WS-004: 4-thread throughput ≥ 3.5× single-thread

encoder-forward-v1 (FALSIFY-ENC-001..004):
- ENC-001: 12 encoder layers preserve (n, 768) shape
- ENC-002: every output element finite for inputs in [-10, 10]
- ENC-003: aprender output matches HF reference within 1e-4
- ENC-004: CLS pooling extracts encoder_output[0] bit-exactly

## Five Whys

1. Why bundle these two contracts? Both peripheral, span the parallel-runtime + encoder-forward coverage band; one verdict module captures both without duplicate provenance pin.
2. Why does this block ship? Coverage % cannot move while these peripheral contracts are unbound at PARTIAL_ALGORITHM_LEVEL.
3. Why strict `<` for WS-001 (not `<=`)? The contract says "< 1ms per forward pass." Equality at exactly 1ms would mean the dispatcher is consuming the entire budget, leaving no room for the actual matmul. Strict `<` catches the regression class "atomic contention saturated the overhead window."
4. Why bit-exact (`to_bits()`) for ENC-004 (CLS pooling)? The spec calls it "extract row 0": pooling is a pure index operation, no float arithmetic. Any drift between `encoder_output[0]` and `cls_embedding` indicates the pooler is averaging or selecting a different row, not just precision loss. Strict bit-equal catches the regression class.
5. Why a separate dimension check for ENC-001 (`AC_ENC_HIDDEN_DIM` AND `AC_ENC_LAYER_COUNT` AND seq-len preservation)? The contract bundles three invariants: count of layers, hidden dim 768, sequence length preserved through layers. A single shape-equal check would let "layer dropped a dim AND added another to compensate" pass. Modeling the three invariants separately catches every mutation class independently.

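As an illustration of Why 5, here is a minimal sketch of an ENC-001 check that evaluates the three invariants independently. The constant names `AC_ENC_HIDDEN_DIM` and `AC_ENC_LAYER_COUNT` and their values (768, 12) come from the text above; the `Verdict` enum, the function name `verdict_enc_001`, and the `(seq_len, hidden_dim)` shape representation are assumptions made for this example and may not match the actual module.

```rust
/// Values quoted from the contract text: 12 encoder layers, hidden dim 768.
pub const AC_ENC_LAYER_COUNT: usize = 12;
pub const AC_ENC_HIDDEN_DIM: usize = 768;

/// Hypothetical verdict type; the real module may use a different one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// ENC-001 sketch (hypothetical signature).
///
/// `layer_shapes[i]` is the `(seq_len, hidden_dim)` output shape of layer `i`.
/// Each invariant is computed on its own so that breaking one of them
/// cannot be hidden by a compensating change in another.
pub fn verdict_enc_001(input_seq_len: usize, layer_shapes: &[(usize, usize)]) -> Verdict {
    // Invariant 1: exactly the expected number of layers produced output.
    let layer_count_ok = layer_shapes.len() == AC_ENC_LAYER_COUNT;
    // Invariant 2: every layer keeps the hidden dimension at 768.
    let hidden_ok = layer_shapes.iter().all(|&(_, h)| h == AC_ENC_HIDDEN_DIM);
    // Invariant 3: every layer preserves the input sequence length.
    let seq_ok = layer_shapes.iter().all(|&(n, _)| n == input_seq_len);

    if layer_count_ok && hidden_ok && seq_ok {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

With this shape, a run that drops one layer, a run that shrinks the hidden dimension in one layer, and a run that truncates the sequence each fail for their own reason.
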
Adds 28 unit tests including 6-bucket scaling sweep + 5-bucket layer-count sweep. Realistic-healthy walks the canonical 4-thread RTX-4090 + BERT-base scenario; pre-fix walks 8 simultaneous regressions across both contracts. No runtime % shift; algorithm-level coverage advances by 8 gates.
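The two threshold-style WS gates can be sketched as below. This is only an illustration under stated assumptions: the thresholds (1 ms, 3.5×) are quoted from the contract text, while the function names, the constant names, and the extra rejection of non-finite or negative inputs are hypothetical choices for this example and are not confirmed by the PR.

```rust
/// Thresholds quoted from the contract text.
pub const WS_001_MAX_OVERHEAD_MS: f64 = 1.0;
pub const WS_004_MIN_SPEEDUP: f64 = 3.5;

/// WS-001 sketch (hypothetical name): dispatch overhead must be strictly
/// below 1 ms. A measurement of exactly 1.0 ms fails, as do NaN, infinite,
/// and negative inputs (the input validation is an assumption).
pub fn ws_001_passes(dispatch_overhead_ms: f64) -> bool {
    dispatch_overhead_ms.is_finite()
        && dispatch_overhead_ms >= 0.0
        && dispatch_overhead_ms < WS_001_MAX_OVERHEAD_MS
}

/// WS-004 sketch (hypothetical name): 4-thread throughput must be at least
/// 3.5x the single-thread throughput. Here the bound is inclusive (`>=`),
/// matching the contract's "≥".
pub fn ws_004_passes(single_thread_throughput: f64, four_thread_throughput: f64) -> bool {
    // Reject inputs for which a speedup ratio is not meaningful.
    if !single_thread_throughput.is_finite()
        || !four_thread_throughput.is_finite()
        || single_thread_throughput <= 0.0
    {
        return false;
    }
    four_thread_throughput >= WS_004_MIN_SPEEDUP * single_thread_throughput
}
```

A scaling sweep in this style would call `ws_004_passes` over a handful of speedup buckets on either side of 3.5×; the actual bucket values used by the PR's tests are not shown on this page.
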
3965ff9 to
21044ae
Summary
Bundles two sister contracts in one verdict module:
- cpu-work-stealing-v1 (FALSIFY-WS-001..004): dispatch overhead, L1 miss rate, Rayon parity, scaling efficiency
- encoder-forward-v1 (FALSIFY-ENC-001..004): shape preservation, finite output, HF reference, CLS pooling

28 unit tests including 6-bucket scaling sweep + 5-bucket layer-count sweep.
Algorithm-level coverage advances by 8 gates; runtime ship % unchanged.
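For the CLS pooling gate (ENC-004), a bit-exact comparison could look roughly like the sketch below. It assumes a row-major `f32` buffer for the encoder output; the function name and signature are hypothetical and not taken from the PR. Only `f32::to_bits` is a real standard-library call.

```rust
/// ENC-004 sketch (hypothetical signature): `cls_embedding` must equal row 0
/// of `encoder_output` bit-for-bit.
///
/// `encoder_output` is assumed to be row-major with `hidden_dim` columns.
pub fn cls_pooling_is_bit_exact(
    encoder_output: &[f32],
    hidden_dim: usize,
    cls_embedding: &[f32],
) -> bool {
    // Shape guard: row 0 must exist and the embedding must be one full row.
    if hidden_dim == 0 || encoder_output.len() < hidden_dim || cls_embedding.len() != hidden_dim {
        return false;
    }
    // Compare raw bit patterns instead of float values: this distinguishes
    // 0.0 from -0.0 and treats identical NaN payloads as equal, which `==`
    // on floats does not.
    encoder_output[..hidden_dim]
        .iter()
        .zip(cls_embedding)
        .all(|(a, b)| a.to_bits() == b.to_bits())
}
```

Because pooling is described as a pure index operation, any bit-level difference here points at a wrong row or an unintended arithmetic step and not at rounding.
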
Gates bound
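| Gate | Bound |
| --- | --- |
| WS-001 | dispatch overhead `< 1ms` per forward pass |
| WS-002 | L1 cache miss rate `< 5%` during inner loop |
| WS-003 | matvec parity vs Rayon within `1e-6` |
| WS-004 | 4-thread throughput `≥ 3.5 × single-thread` |
| ENC-001 | 12 encoder layers preserve (n, 768) shape |
| ENC-002 | every output element finite for inputs in [-10, 10] |
| ENC-003 | aprender output matches HF reference within `1e-4` |
| ENC-004 | CLS pooling extracts `encoder_output[0]` bit-exactly |

Five Whys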
See commit message: it captures strict `<` for WS-001 dispatch budget, bit-exact for ENC-004 CLS pooling, and why ENC-001 models three invariants independently.

Test plan
- `cargo test -p aprender-core --lib ws_enc`: 28 passed

🤖 Generated with Claude Code