contracts(ccpa): v1.28.0 → v1.30.0 — register CCPA-018 + soft-deprecate CCPA-008 by noahgift · Pull Request #1735 · paiml/aprender

noahgift · 2026-05-17T06:56:57Z

Summary

Three changes are bundled into v1.30.0. v1.29.0 is SKIPPED because aprender#1705 (the original v1.29.0 PR) was auto-CLOSED when its base #1684 (v1.28.0) was squash-merged and its feature branch deleted. This PR realigns aprender main with companion's contract content (which has been at v1.29.0 since companion-repo M208) AND adds the M224/M230 deltas that the operator-dispatched Phase 5 Arena bench produced.

What changed

(1) FALSIFY-CCPA-018 (arena_recovery_rate_bound) added at status: PROPOSED. Gate count: 17 → 18. Asserts `recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3` on the M182 5-fixture project-scale corpus via the live multi-turn Arena harness. Measures agent quality (recovery), distinct from CCPA-016/017's functional outcome.

(2) FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated to status: ADVISORY in its summary. Gate STILL enforces `>= 0.95` on 30 AUTHORED canonical fixtures — only the interpretation flipped. Reframed from system-level parity validation → meter validation. The system-level interpretation was empirically FALSIFIED by the M224 Arena bench.

(3) status_history entry records the M224 verdict. Operator dispatched `phase-5-arena-bench.sh` three times against M182 corpus. All three runs: oracle_passed_rate = 0.0000 for BOTH `claude` AND `apr code`. Verdict: `evaluate_static_vs_arena(1.0, 0.0, ...)` → `FalsifierOutcome::StaticFalsified` per design-audit.md §5.
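The two numeric bounds in (1) and (2) can be sketched as plain predicates. This is a minimal illustration, not the contract's implementation: the function and constant names are hypothetical, the thresholds are the tentative values quoted in this PR, and the CCPA-008 "aggregate" is assumed here to be the arithmetic mean of per-fixture scores (the PR does not say how it is computed).

```rust
// Tentative CCPA-018 floors quoted in this PR (subject to recalibration).
const RECOVERY_RATE_FLOOR: f64 = 0.5;
const ORACLE_PASSED_RATE_FLOOR: f64 = 0.3;

// CCPA-008 floors quoted in this PR.
const PARITY_AGGREGATE_FLOOR: f64 = 0.95;
const PARITY_PER_FIXTURE_FLOOR: f64 = 0.80;

/// CCPA-018 sketch (hypothetical name): both floors must hold at once,
/// so a run that passes every oracle but never recovers still fails.
fn arena_recovery_gate_passes(recovery_rate: f64, oracle_passed_rate: f64) -> bool {
    recovery_rate >= RECOVERY_RATE_FLOOR && oracle_passed_rate >= ORACLE_PASSED_RATE_FLOOR
}

/// CCPA-008 sketch (hypothetical name).
/// ASSUMPTION: "aggregate" is the arithmetic mean of the per-fixture scores.
fn parity_gate_passes(fixture_scores: &[f64]) -> bool {
    if fixture_scores.is_empty() {
        return false; // nothing measured, so nothing is validated
    }
    let aggregate = fixture_scores.iter().sum::<f64>() / fixture_scores.len() as f64;
    let every_fixture_ok = fixture_scores
        .iter()
        .all(|&score| score >= PARITY_PER_FIXTURE_FLOOR);
    aggregate >= PARITY_AGGREGATE_FLOOR && every_fixture_ok
}
```

Under these assumptions, `arena_recovery_gate_passes(0.0, 1.0)` is false, which mirrors the give-up-fast synthetic fixture (100% pass but zero recovery), and `arena_recovery_gate_passes(0.0, 0.0)` is false for the M224 runs.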

Important nuance

0/5 for BOTH systems means neither solves these specific tasks under this harness — that's an Axis 2 closure CEILING, not a teacher-vs-student gap. The Popperian comparator is deterministic; if cleaner re-runs (post aprender#1712 fix) lift oracle/recovery scores, the verdict revises automatically.
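The verdict logic and the ceiling-versus-gap reading can be illustrated with a small deterministic comparator. This is a hedged sketch only: the PR elides the real arguments of `evaluate_static_vs_arena`, so `sketch_static_vs_arena`, its two floor parameters, the `NotFalsified` variant, and `is_shared_ceiling` are all hypothetical stand-ins. Only the `StaticFalsified` outcome name comes from this PR.

```rust
/// Outcome of comparing the static-fixture score with the Arena result.
/// Only `StaticFalsified` is named in this PR; `NotFalsified` is hypothetical.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SketchOutcome {
    StaticFalsified,
    NotFalsified,
}

/// Hypothetical stand-in for `evaluate_static_vs_arena`.
/// The static claim is falsified when the static score clears its floor
/// but the Arena oracle pass rate does not clear its own floor.
fn sketch_static_vs_arena(
    static_parity: f64,
    arena_oracle_passed_rate: f64,
    static_floor: f64, // assumed parameter, e.g. the CCPA-008 bound
    arena_floor: f64,  // assumed parameter, e.g. the tentative oracle floor
) -> SketchOutcome {
    let static_claims_parity = static_parity >= static_floor;
    let arena_confirms = arena_oracle_passed_rate >= arena_floor;
    if static_claims_parity && !arena_confirms {
        SketchOutcome::StaticFalsified
    } else {
        SketchOutcome::NotFalsified
    }
}

/// Hypothetical helper: when teacher and student both score zero, the result
/// shows a shared ceiling under this harness, not a teacher-vs-student gap.
fn is_shared_ceiling(teacher_rate: f64, student_rate: f64) -> bool {
    teacher_rate == 0.0 && student_rate == 0.0
}
```

Because the function is pure, the same inputs always give the same verdict. In this sketch, a later run that lifts the Arena rate above the floor changes the outcome without any code change, which is the sense in which the verdict "revises automatically".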

Cross-references

companion-repo M194-M210 (PRs #181 → #197) = Phase 5 P5.1-P5.5 + coverage closure
companion-repo M208 (PR #195) = now-obsolete v1.29.0 mirror
companion-repo M224 (PR #211) = evidence + headline revision
companion-repo M226 (PR #213) = aprender#1712 + pkill workaround
companion-repo M230 (PR #215) = soft-deprecation spec rewrite + new `docs/specifications/static-fixture-deprecation.md`
aprender#1712 = apr serve subprocess leak (root cause of remaining student driver_errors)

Verification

`pv validate contracts/claude-code-parity-apr-v1.yaml` → 0 errors, 0 warnings
CI `ci/gate` GREEN
CI `workspace-test` GREEN
Companion-repo M22 5-step ritual: refresh pin.lock after this merges to aprender main

Test plan

Pure additive bump (CCPA-018) + interpretation amendment (CCPA-008 summary text) + history record (M224). No schema change. No existing gate behavior touched.

🤖 Generated with Claude Code

…te CCPA-008

THREE changes bundled. v1.29.0 is SKIPPED: aprender#1705 (the original v1.29.0 PR) auto-CLOSED when its base #1684 (v1.28.0) squash-merged and deleted its feature branch. Companion-repo has been at the v1.29.0 contract YAML since M208 (pin.lock pointed at #1705's feature-branch HEAD); this v1.30.0 upstream-flip realigns aprender main with companion's contract content AND adds the M224/M230 deltas the operator-dispatched Phase 5 Arena bench produced.

CHANGE (1): FALSIFY-CCPA-018 (arena_recovery_rate_bound) added to gate registry at status: PROPOSED. Gate count: 17 → 18. Asserts recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3 on the M182 5-fixture project-scale corpus driven via the live multi-turn Arena harness (crates/ccpa-arena/, companion-repo M196-M210). Measures AGENT QUALITY (does the agent recover when bash fails?), distinct from CCPA-016/017 which measure FUNCTIONAL OUTCOME. The asymmetric give-up-fast synthetic fixture (100% pass BUT zero recovery → FAILS recovery floor) is the canonical R3 distinguishing test.

CHANGE (2): FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated to status: ADVISORY in its summary. Gate STILL enforces aggregate >= 0.95, per-fixture >= 0.80 on the 30 AUTHORED canonical fixtures; only the INTERPRETATION flipped. Reframed from SYSTEM-LEVEL parity validation ("apr code matches claude on real engineering tasks") → METER VALIDATION ("the differ + scorer + per-tool equivalence rules correctly recognize equivalent traces"). The system-level interpretation was empirically FALSIFIED by the M224 Arena bench (see CHANGE (3)). Foreground parity claims move to CCPA-016 (function-scale) + CCPA-017 (project-scale, PROPOSED) + CCPA-018 (Arena recovery-rate, PROPOSED).

CHANGE (3): Records M224 first-operator-dispatched Phase 5 Arena bench result in status_history. Operator ran scripts/phase-5-arena-bench.sh against the M182 5-fixture project-scale corpus three times:
- Run 1 (180s/turn) was noisy (6/10 timeout-killed)
- Run 2 (600s/turn, 2400s wall): clean teacher, 4/5 student apr-serve errors
- Run 3 (post-aprender#1712 workaround M228): 2/5 student cleanly completed 20 turns

All three: oracle_passed_rate = 0.0000 for BOTH systems. recovery_rate = 0 for both. Verdict: evaluate_static_vs_arena(1.0, 0.0, ...) → FalsifierOutcome::StaticFalsified.

Important nuance preserved in the status_history reason field: 0/5 for BOTH systems means neither solves these specific tasks under this harness. That is an Axis 2 closure CEILING, not a teacher-vs-student gap. The Popperian comparator is deterministic; if a cleaner re-run (post aprender#1712 fix) lifts recovery_rate or oracle_passed_rate, the verdict revises automatically.

Cross-references in this PR:
- companion-repo M194-M210 = Phase 5 P5.1-P5.5 + coverage closure
- companion-repo M208 (PR #195) = the now-obsolete v1.29.0 mirror
- companion-repo M224 (PR #211) = evidence + headline revision
- companion-repo M226 (PR #213) = aprender#1712 + pkill workaround
- companion-repo M230 (PR #215) = soft-deprecation spec rewrite + new docs/specifications/static-fixture-deprecation.md (~140 lines)
- aprender#1712 = apr serve subprocess leak (root cause of the 3 remaining student driver_errors in Run 3)

Tentative threshold values (CCPA-017: 0.3/0.3; CCPA-018: 0.5/0.3) WILL be recalibrated after a cleaner re-run post-aprender#1712 upstream fix.

`pv validate contracts/claude-code-parity-apr-v1.yaml` → 0 errors, 0 warnings. Pure additive bump (CCPA-018) + interpretation amendment (CCPA-008) + history record (M224). No schema change, no existing gate behavior touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 17, 2026 07:07

noahgift merged commit cf5dfda into main May 17, 2026
11 checks passed

noahgift deleted the m232-ccpa-v1.30.0 branch May 17, 2026 07:19

noahgift merged 1 commit into main from m232-ccpa-v1.30.0
