
🎉 docs(spec): SHIP-TWO-001 §75 — MODEL-1 SHIP % = 100% (SHIP-007 LIVE-DISCHARGED) #1652

Merged
noahgift merged 16 commits into main from docs/section-75-model-1-100-percent
May 14, 2026
@noahgift
Contributor

🎉 MODEL-1 SHIP % = 100%

All 10 AC-SHIP1-* LIVE-DISCHARGED.

PR-E (#1651) shipped the F32 GEMV PTX layout fix that closes SHIP-007 (the last PARTIAL). §75 records the discharge.

10/10 LIVE-discharge table

| AC | Discharge section | Path |
|---|---|---|
| SHIP-001 | §72 | `apr run <safetensors>` exit 0 |
| SHIP-002 | §61 | `apr run "def fib(n):"` valid Python (#1609) |
| SHIP-003 | §72 | `apr diff` 20 tensors at cos_sim=1.000000 |
| SHIP-004 | §72 | `llama-cli` exit 0 |
| SHIP-005 | §71 | HumanEval pass@1 = 86.59% (gx10 164-run) |
| SHIP-006 | §61.8 | `apr qa` 12-gate aggregate PASS (#1615) |
| SHIP-007 | §75 | PARITY-GATE PASS + 124.6 tok/s @ 128-tok decode |
| SHIP-008 | §61 | `apr run` SHIP-008 USER → 256-token ChatML (#1614) |
| SHIP-009 | §72 | `apr inspect` license/provenance |
| SHIP-010 | §72 | sha256 match 0a854098… |
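As a sanity check on the SHIP-005 row, the arithmetic behind a pass@1 percentage can be sketched as below. This is an illustration only, not the project's harness: it assumes "164-run" means the 164 HumanEval problems with one sample each, and the helper name `pass_at_1` is hypothetical. Under that assumption, 142 is the only pass count that rounds to 86.59%.

```python
def pass_at_1(passed: int, total: int) -> float:
    """Fraction of problems solved when each problem gets exactly one sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


TOTAL_PROBLEMS = 164  # HumanEval problem count; one sample per problem is an assumption

# Which integer pass counts are consistent with the reported 86.59%?
consistent_counts = [
    k for k in range(TOTAL_PROBLEMS + 1)
    if f"{pass_at_1(k, TOTAL_PROBLEMS):.2%}" == "86.59%"
]

formatted = f"{pass_at_1(142, TOTAL_PROBLEMS):.2%}"
```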

Cascade arc

| § | Date | Discovery |
|---|---|---|
| 63 | 2026-05-11 | SHIP-007 framed as 3-layer cascade |
| 73 | 2026-05-12 | Re-measurement: only parity blocks |
| 74 | 2026-05-13 | Bug LOCALIZED to F32 GEMV |
| 75 | 2026-05-13 | PR-E fix → MODEL-1 100% |

§73 estimated "3-5 PR / 3-5 days". Actual: 4 PRs (#1648/#1649/#1650/#1651) in 2 days.

Methodology lesson #22 NEW

Symptom analysis → bug class localization in O(1). Sign-flipped top-K divergences + CPU/GPU mean mismatch + sane intermediates → exactly one bug class (transposed matmul). Lessons compose; each makes the next cheaper.
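To make the symptom pattern concrete, here is a minimal sketch of how a transposed read of a weight buffer behaves in a GEMV. It is not the PTX kernel from PR-E; the function names and the tiny 2×2 values are hypothetical. The buggy path still produces values of plausible magnitude, but with flipped signs and a different mean, while a symmetric matrix hides the bug completely.

```python
def gemv_row_major(w, x, n):
    """y = W @ x, reading the flat buffer w as an n x n row-major matrix."""
    return [sum(w[r * n + c] * x[c] for c in range(n)) for r in range(n)]


def gemv_transposed(w, x, n):
    """Same buffer, but indexed as if it were stored transposed (hypothetical bug)."""
    return [sum(w[c * n + r] * x[c] for c in range(n)) for r in range(n)]


weights = [1.0, -3.0, 2.0, 1.0]  # illustrative values only
x = [1.0, 2.0]

reference = gemv_row_major(weights, x, 2)  # correct layout
buggy = gemv_transposed(weights, x, 2)     # wrong layout, same data

# A symmetric matrix equals its own transpose, so both reads agree.
symmetric = [1.0, 2.0, 2.0, 5.0]
sym_ref = gemv_row_major(symmetric, x, 2)
sym_bug = gemv_transposed(symmetric, x, 2)
```

Here `reference` is `[-5.0, 4.0]` and `buggy` is `[5.0, -1.0]`: every intermediate looks sane in isolation, which is why the mismatch points at layout rather than at arithmetic.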

Ship-% movement

  • MODEL-1: 99% → 100% 🎉
  • MODEL-2: unchanged at 57% (independent track)

Test plan

  • Empirical discharge: apr bench 5-iter 128-tok = 124.6 tok/s on default path
  • PARITY-GATE PASS (no error)
  • All AC-SHIP1-* paths captured in evidence dirs
  • Spec v3.19.0 → v3.21.0

Refs

🤖 Generated with Claude Code

…P-TWO-SECTION-75)

PR-E (#1651) shipped the single-file F32 GEMV PTX layout fix. SHIP-007
LIVE-DISCHARGED. All 10 AC-SHIP1-* now LIVE on canonical 7B Qwen2.5-
Coder-Instruct Q4_K_M teacher.

10/10 LIVE-discharge table:
  SHIP-001  §72  apr run <safetensors> exit 0
  SHIP-002  §61  apr run "def fib(n):" valid Python (#1609)
  SHIP-003  §72  apr diff 20 tensors at cos_sim=1.000000
  SHIP-004  §72  llama-cli exit 0, 133.1 gen tok/s
  SHIP-005  §71  HumanEval pass@1 = 86.59% (gx10 164-run)
  SHIP-006  §61.8 apr qa 12-gate aggregate PASS (#1615)
  SHIP-007  §75  PARITY-GATE PASS + 124.6 tok/s @ 128-tok (this section)
  SHIP-008  §61  apr run SHIP-008 USER → 256-token ChatML (#1614)
  SHIP-009  §72  apr inspect license/provenance fields
  SHIP-010  §72  sha256 match 0a854098…
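For readers unfamiliar with the cos_sim figure in the SHIP-003 row, a cosine-similarity check between two tensors can be sketched as follows. This is a generic illustration, not the implementation behind `apr diff`; the function name and sample values are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Identical tensors score 1 up to floating-point rounding.
same = cos_sim([0.5, -1.25, 3.0], [0.5, -1.25, 3.0])
same_formatted = f"{same:.6f}"

# Orthogonal tensors score 0.
orthogonal = cos_sim([1.0, 0.0], [0.0, 1.0])
```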

Empirical discharge proof for SHIP-007:
  apr bench <canonical 7B APR> --iterations 5 --max-tokens 128
  → tokens_per_second: 124.6
  → AC-SHIP1-007 floor: 30 → headroom 4.15×
  → PARITY-GATE: PASS (no error)
  → Default path (CUDA graphed), no SKIP_PARITY_GATE, no APR_SKIP_FP8_WARMUP
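The headroom figure above is the measured throughput divided by the acceptance floor. A minimal sketch of that check, using the two numbers from this section (the helper name `headroom` is hypothetical):

```python
def headroom(measured_tok_s: float, floor_tok_s: float) -> float:
    """Ratio of measured decode throughput to the acceptance floor."""
    if floor_tok_s <= 0:
        raise ValueError("floor must be positive")
    return measured_tok_s / floor_tok_s


MEASURED_TOK_S = 124.6  # apr bench result reported above
FLOOR_TOK_S = 30.0      # AC-SHIP1-007 floor reported above

ratio = headroom(MEASURED_TOK_S, FLOOR_TOK_S)
passes_floor = MEASURED_TOK_S >= FLOOR_TOK_S
ratio_rounded = round(ratio, 2)
```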

Cascade arc closeout:
  §63 2026-05-11 → SHIP-007 framed as 3-layer cascade
  §73 2026-05-12 → re-measurement: only parity layer blocks
  §74 2026-05-13 → bug LOCALIZED to F32 GEMV via PR-B stage bisection
  §75 2026-05-13 → PR-E layout fix → MODEL-1 100%

§73 estimated '3-5 PR / 3-5 day'. Actual: 4 PRs (#1648 contract scaffold,
PR-B #1649 stage dump, §74 #1650 localization, PR-E #1651 layout fix)
in 2 days.

Methodology lesson #22 NEW: symptom analysis (sign-flipped top-K
divergences + CPU/GPU mean mismatch + sane intermediates) →
bug class localization in O(1). Methodology lessons compose;
each makes the next cheaper.

Ship-% movement:
  MODEL-1 ship %: 99% → 100% 🎉
  MODEL-2 ship %: unchanged at 57% (independent track,
    gated on step 5g.3 val_loss < 9.38).

Spec version: 3.19.0 → 3.21.0 (post-§72/73 stack at 3.18.0;
§74 at 3.20.0; §75 here at 3.21.0).

Out of scope (future work):
- MODEL-2 ship % path (independent track, separate cascade)
- Publish-readiness gates (GATE-SHIP-001/002/003 still need green CI +
  post-publish QA per feedback_post_publish_qa_required.md)
- HumanEval/MBPP benchmark improvements beyond §71's 86.59%

Refs:
- §74 SHIP-007 localization (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- PR #1648 (contract scaffold), #1649 (PR-B stage dump)
- PR #1651 (PR-E F32 GEMV layout fix)
- AC-SHIP1-007 (spec §5)
- evidence/section-75-ship-007-discharged-2026-05-13/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the docs/section-75-model-1-100-percent branch from b3b7835 to 598b323 May 13, 2026 09:11
noahgift added a commit that referenced this pull request May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP)

🎉 v0.33.0 marks **MODEL-1 SHIP % = 100%** for SHIP-TWO-001.

All 10 AC-SHIP1-* falsifiers are LIVE-discharged on the canonical
7B Qwen2.5-Coder-Instruct Q4_K_M teacher (lambda-vector RTX 4090,
--features cuda).

This release prep PR ships:
1. CHANGELOG.md [0.33.0] entry with §69-§75 highlights:
   - 🎉 MODEL-1 SHIP % = 100% (all 10 AC-SHIP1-* LIVE)
   - Fixed: SHIP-007 F32 GEMV PTX layout (PR #1651, §75) — 124.6 tok/s
   - Fixed: SHIP-005 HumanEval RC3 (PR #1635, §70/§71) — pass@1 86.59%
   - Added: APR_EVAL_DEBUG=1 diagnostic surface (PR #1634)
   - Added: APR_GPU_STAGE_DUMP=<dir> diagnostic surface (PR #1649)
   - Added: MBPP harness H4 fix (PR #1645)
   - Added: 2 new falsifiable contracts (apr-eval-humaneval-harness-
     invariant v1.1.0, apr-ship-007-gpu-stage-bisection v1.0.0)
   - Methodology lessons #16-22 captured in MEMORY.md
   - Spec: v3.13.0 → v3.21.0 across §67-§75

2. Workspace version bump:
   - [workspace.package].version: 0.32.0 → 0.33.0
   - Root [package].version (aprender facade crate): 0.32.0 → 0.33.0
   - 28 sub-crate version literals: 0.32.0 → 0.33.0

3. `cargo check -p aprender` → clean (workspace builds at 0.33.0).

Out of scope for this PR (separate steps after #1651/1652 land + this
PR lands):
- Tag release `v0.33.0` on main
- Cascade publish to crates.io (per memory project_ship_two_001_v0_32_0_release.md
  — 15 user-facing crates + 7 internal-tier in topological dependency
  order; uses `make publish CRATE=<name>`)
- Post-publish QA per `feedback_post_publish_qa_required.md` —
  `cargo install aprender --force` + `/dogfood` GO verdict required
  before declaring release done (v0.31.1 was yanked for skipping this)
- GitHub Release with §75 narrative
- HF artifact verification (paiml/qwen2.5-coder-7b-apache-q4k-v1 sha256
  already verified by §72 SHIP-010 LIVE evidence; double-check before
  release announcement)
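The sha256 double-check mentioned in the last item could be done along these lines. This is a sketch under assumptions: the helper names are hypothetical, and only the digest prefix `0a854098` appears in this PR, so a real check should compare against the full digest recorded in the §72 evidence rather than a prefix.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(hex_digest, expected_prefix):
    """Case-insensitive prefix comparison; weaker than a full-digest match."""
    return hex_digest.lower().startswith(expected_prefix.lower())


# Self-contained demo on a throwaway file (not the model artifact).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of_file(tmp.name)
os.remove(tmp.name)
```

For the real artifact the call would look like `matches_prefix(sha256_of_file(local_path), "0a854098")`, where `local_path` is wherever the downloaded file lives.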

This PR ships ONLY the version-bump + CHANGELOG. Publishing is the
next step after merge.
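Publishing in topological dependency order means each crate goes out only after everything it depends on. A minimal sketch using Python's standard `graphlib` follows; the crate names and the dependency graph are hypothetical placeholders, not the workspace's real layout.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each key depends on the crates in its set.
deps = {
    "crate-core": set(),
    "crate-kernels": {"crate-core"},
    "crate-cli": {"crate-core", "crate-kernels"},
}

# static_order() yields dependencies before their dependents and
# raises graphlib.CycleError if the graph contains a cycle.
publish_order = list(TopologicalSorter(deps).static_order())

# Each entry would then be passed to the publish step named above,
# e.g. `make publish CRATE=<name>`, one crate at a time.
```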

Refs:
- §75 MODEL-1 100% (PR #1652)
- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- §72 5-AC LIVE cascade (PR #1646)
- §71 SHIP-005 LIVE-DISCHARGED (PR #1642)
- §70 RC3 fix (PR #1636)
- §69 Q4K hypothesis falsified (PR #1633)
- PR #1635 RC3 prepend
- PR #1634 diagnostic surface + contract
- PR #1648 SHIP-007 contract scaffold
- PR #1649 SHIP-007 PR-B stage dump
- PR #1651 SHIP-007 PR-E F32 GEMV layout fix

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP) (#1653)

@noahgift noahgift merged commit 4209a39 into main May 14, 2026
10 checks passed
@noahgift noahgift deleted the docs/section-75-model-1-100-percent branch May 14, 2026 02:49