contracts(ccpa): v1.28.0 → v1.30.0 — register CCPA-018 + soft-deprecate CCPA-008#1735

Merged
noahgift merged 1 commit into
mainfrom
m232-ccpa-v1.30.0
May 17, 2026

Conversation

@noahgift
Contributor

Summary

Three changes are bundled into v1.30.0. v1.29.0 is SKIPPED because aprender#1705 (the original v1.29.0 PR) was auto-CLOSED when its base, #1684, was squash-merged and its feature branch was deleted. This PR realigns aprender main with the companion repo's contract content (which has been at v1.29.0 since companion-repo M208) AND adds the M224/M230 deltas produced by the operator-dispatched Phase 5 Arena bench.

What changed

(1) FALSIFY-CCPA-018 (arena_recovery_rate_bound) added at status: PROPOSED. Gate count: 17 → 18. Asserts `recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3` on the M182 5-fixture project-scale corpus via the live multi-turn Arena harness. Measures agent quality (recovery), distinct from CCPA-016/017's functional outcome.
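As a rough illustration of the CCPA-018 predicate, the sketch below encodes the two floors as a conjunction. The `ArenaMetrics` struct, the constant names, and `ccpa_018_passes` are hypothetical names chosen for this example; they are not taken from `crates/ccpa-arena/`. Only the thresholds (0.5 and 0.3) come from this PR.

```rust
/// Aggregate Arena metrics for one system over a corpus.
/// Hypothetical type for illustration; not the real ccpa-arena API.
#[derive(Debug, Clone, Copy)]
pub struct ArenaMetrics {
    pub recovery_rate: f64,
    pub oracle_passed_rate: f64,
}

/// Floors stated in this PR for CCPA-018 (tentative, subject to recalibration).
pub const RECOVERY_FLOOR: f64 = 0.5;
pub const ORACLE_FLOOR: f64 = 0.3;

/// Both floors must hold; a NaN metric fails either comparison and so fails the gate.
pub fn ccpa_018_passes(m: ArenaMetrics) -> bool {
    m.recovery_rate >= RECOVERY_FLOOR && m.oracle_passed_rate >= ORACLE_FLOOR
}
```

Under this sketch, the give-up-fast synthetic fixture described in the commit message (100% oracle pass, zero recovery) fails the gate because the recovery floor is not met, which is what separates CCPA-018 from a pure functional-outcome check.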

(2) FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated to status: ADVISORY in its summary. Gate STILL enforces `>= 0.95` on 30 AUTHORED canonical fixtures — only the interpretation flipped. Reframed from system-level parity validation → meter validation. The system-level interpretation was empirically FALSIFIED by the M224 Arena bench.
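A minimal sketch of the check CCPA-008 continues to enforce, using the floors from the commit message (aggregate >= 0.95, per-fixture >= 0.80). It assumes the aggregate is the arithmetic mean of per-fixture scores, which this PR does not state; `ccpa_008_passes` and the constant names are hypothetical.

```rust
/// Floors quoted in the commit message for CCPA-008.
pub const AGGREGATE_FLOOR: f64 = 0.95;
pub const PER_FIXTURE_FLOOR: f64 = 0.80;

/// Returns true when the mean score meets the aggregate floor AND
/// every individual fixture meets the per-fixture floor.
/// ASSUMPTION: "aggregate" is the arithmetic mean; the real scorer may differ.
pub fn ccpa_008_passes(fixture_scores: &[f64]) -> bool {
    if fixture_scores.is_empty() {
        // An empty corpus gives no evidence, so it is treated as a failure here.
        return false;
    }
    let mean = fixture_scores.iter().sum::<f64>() / fixture_scores.len() as f64;
    mean >= AGGREGATE_FLOOR && fixture_scores.iter().all(|&s| s >= PER_FIXTURE_FLOOR)
}
```

The per-fixture floor matters independently of the mean: with 30 fixtures, 29 perfect scores and a single 0.5 still average above 0.95, yet the gate fails in this sketch.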

(3) status_history entry records the M224 verdict. The operator dispatched `phase-5-arena-bench.sh` three times against the M182 corpus. All three runs: oracle_passed_rate = 0.0000 for BOTH `claude` AND `apr code`. Verdict: `evaluate_static_vs_arena(1.0, 0.0, ...)` → `FalsifierOutcome::StaticFalsified` per design-audit.md §5.

Important nuance

0/5 for BOTH systems means neither solves these specific tasks under this harness — that's an Axis 2 closure CEILING, not a teacher-vs-student gap. The Popperian comparator is deterministic; if cleaner re-runs (post aprender#1712 fix) lift oracle/recovery scores, the verdict revises automatically.
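To illustrate why the verdict revises automatically, here is a sketch of a deterministic comparator in the spirit of the one named above. The real `evaluate_static_vs_arena` signature is elided in this PR, so the function below uses a different name (`sketch_static_vs_arena`) and takes explicit floor parameters that are assumptions. Only the `StaticFalsified` variant name comes from the source; `NotFalsified` is a hypothetical placeholder.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FalsifierOutcome {
    /// Variant name taken from this PR.
    StaticFalsified,
    /// Hypothetical placeholder for every other case.
    NotFalsified,
}

/// Pure function: the same inputs always yield the same verdict.
/// ASSUMPTION: the static claim is falsified when the static score clears
/// its floor while the live Arena score does not clear its own.
pub fn sketch_static_vs_arena(
    static_score: f64,
    arena_score: f64,
    static_floor: f64,
    arena_floor: f64,
) -> FalsifierOutcome {
    if static_score >= static_floor && arena_score < arena_floor {
        FalsifierOutcome::StaticFalsified
    } else {
        FalsifierOutcome::NotFalsified
    }
}
```

With illustrative floors of 0.95 and 0.3, the inputs (1.0, 0.0) give `StaticFalsified`. Because the function holds no state, a later run that lifts the Arena score above its floor changes the outcome without any change to the comparator.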

Cross-references

Verification

  • `pv validate contracts/claude-code-parity-apr-v1.yaml` → 0 errors, 0 warnings
  • CI `ci/gate` GREEN
  • CI `workspace-test` GREEN
  • Pending: companion-repo M22 5-step ritual `pin.lock` refresh once this merges to aprender main

Test plan

Pure additive bump (CCPA-018) + interpretation amendment (CCPA-008 summary text) + history record (M224). No schema change. No existing gate behavior touched.

🤖 Generated with Claude Code

…te CCPA-008

THREE changes bundled. v1.29.0 is SKIPPED — aprender#1705 (the original
v1.29.0 PR) auto-CLOSED when its base #1684 (v1.28.0) squash-merged
and deleted its feature branch. Companion-repo has been at the v1.29.0
contract YAML since M208 (pin.lock pointed at #1705's feature-branch
HEAD); this v1.30.0 upstream-flip realigns aprender main with companion's
contract content AND adds the M224/M230 deltas the operator-dispatched
Phase 5 Arena bench produced.

CHANGE (1): FALSIFY-CCPA-018 (arena_recovery_rate_bound) added to gate
registry at status: PROPOSED. Gate count: 17 → 18. Asserts
recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3 on the M182 5-fixture
project-scale corpus driven via the live multi-turn Arena harness
(crates/ccpa-arena/, companion-repo M196-M210). Measures AGENT QUALITY
(does the agent recover when bash fails?), distinct from CCPA-016/017
which measure FUNCTIONAL OUTCOME. The asymmetric give-up-fast synthetic
fixture (100% pass BUT zero recovery → FAILS recovery floor) is the
canonical R3 distinguishing test.

CHANGE (2): FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated to
status: ADVISORY in its summary. Gate STILL enforces aggregate >= 0.95,
per-fixture >= 0.80 on the 30 AUTHORED canonical fixtures — only the
INTERPRETATION flipped. Reframed from SYSTEM-LEVEL parity validation
("apr code matches claude on real engineering tasks") → METER VALIDATION
("the differ + scorer + per-tool equivalence rules correctly recognize
equivalent traces"). The system-level interpretation was empirically
FALSIFIED by the M224 Arena bench (see CHANGE (3)). Foreground parity
claims move to CCPA-016 (function-scale) + CCPA-017 (project-scale,
PROPOSED) + CCPA-018 (Arena recovery-rate, PROPOSED).

CHANGE (3): Records M224 first-operator-dispatched Phase 5 Arena bench
result in status_history. Operator ran scripts/phase-5-arena-bench.sh
against the M182 5-fixture project-scale corpus three times:
  - Run 1 (180s/turn) was noisy (6/10 timeout-killed)
  - Run 2 (600s/turn, 2400s wall) — clean teacher, 4/5 student
    apr-serve errors
  - Run 3 (post-aprender#1712 workaround M228) — 2/5 student cleanly
    completed 20 turns
All three: oracle_passed_rate = 0.0000 for BOTH systems. recovery_rate
= 0 for both. Verdict: evaluate_static_vs_arena(1.0, 0.0, ...) →
FalsifierOutcome::StaticFalsified.

Important nuance preserved in the status_history reason field: 0/5 for
BOTH systems means neither solves these specific tasks under this
harness — Axis 2 closure CEILING, not teacher-vs-student gap. The
Popperian comparator is deterministic; if a cleaner re-run (post
aprender#1712 fix) lifts recovery_rate or oracle_passed_rate, the
verdict revises automatically.

Cross-references in this PR:
- companion-repo M194-M210 = Phase 5 P5.1-P5.5 + coverage closure
- companion-repo M208 (PR #195) = the now-obsolete v1.29.0 mirror
- companion-repo M224 (PR #211) = evidence + headline revision
- companion-repo M226 (PR #213) = aprender#1712 + pkill workaround
- companion-repo M230 (PR #215) = soft-deprecation spec rewrite +
  new docs/specifications/static-fixture-deprecation.md (~140 lines)
- aprender#1712 = apr serve subprocess leak (root cause of the 3
  remaining student driver_errors in Run 3)

Tentative threshold values (CCPA-017: 0.3/0.3; CCPA-018: 0.5/0.3) WILL
be recalibrated after a cleaner re-run post-aprender#1712 upstream fix.

`pv validate contracts/claude-code-parity-apr-v1.yaml` → 0 errors, 0
warnings. Pure additive bump (CCPA-018) + interpretation amendment
(CCPA-008) + history record (M224). No schema change, no existing gate
behavior touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 17, 2026 07:07
@noahgift noahgift merged commit cf5dfda into main May 17, 2026
11 checks passed
@noahgift noahgift deleted the m232-ccpa-v1.30.0 branch May 17, 2026 07:19