diff --git a/contracts/claude-code-parity-apr-v1.yaml b/contracts/claude-code-parity-apr-v1.yaml
index 4ff4d0765..7ec7243e3 100644
--- a/contracts/claude-code-parity-apr-v1.yaml
+++ b/contracts/claude-code-parity-apr-v1.yaml
@@ -63,8 +63,8 @@ metadata:
     - crates/aprender-orchestrate/contracts/batuta/apr-code-v1.yaml
 
 name: claude-code-parity-apr
-version: "1.28.0"
-status: ACTIVE_RUNTIME   # 17/17 gates registered; 4 with status: ACTIVE_RUNTIME (CCPA-013/014/015/016 — the runtime-evidence + outcome-parity track) + 1 with status: PROPOSED (CCPA-017 — project-scale parity, awaiting first operator-dispatched bench to flip ACTIVE_RUNTIME at v1.29.0), rest at PLANNED_M*/IN_REVIEW/HARD_BLOCKING_M16 per their lifecycle phase. No OPEN residue. v1.28.0 (companion-repo M180-M188 Phase 4 sequence, 2026-05-15) — adds FALSIFY-CCPA-017 (project_scale_parity_bound) to the gate registry. Phase 4 operationalizes the M159 ProgramBench prior-art (arXiv:2605.03546, 0%/200 SOTA baseline) into companion-tier project-scale parity testing: the M182 corpus draws 5 fixtures from real open GitHub issues across paiml/decy + paiml/bashrs + paiml/depyler with pinned pre-fix commit SHAs; the M184 runner (scripts/phase-4-bench.sh, 288 lines bash) clones at the pinned SHA, dispatches each system with timeout APR_TIMEOUT_S (default 900s), snapshots diff vs SHA, runs the per-fixture oracle_cmd; the M186 scorer (crates/ccpa-differ/src/project_scale_diff.rs, ~310 lines Rust) lifts the runner JSON into ProjectScaleParityReport with 5 derived metrics (per-fixture: approach_match + lines_edited_ratio; corpus-level: partial_agreement + files_jaccard_corpus + approach_match_rate); the M188 gate test (crates/ccpa-differ/tests/falsify_ccpa_017_project_scale_parity.rs, ~260 lines, 7 active + 1 #[ignore]'d) asserts partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3 with bidirectional sensitivity verified on synthetic identity (passes) and synthetic regression (fails) fixtures. CCPA-017 enters at status: PROPOSED because no operator-dispatched measurement has produced evidence/phase-4/project-scale-scores.json yet; the live-evidence test is #[ignore]'d until that exists. Threshold values (0.3/0.3) are tentative POC-tier floors — they WILL be recalibrated after first operator dispatch. 
Phase 4 is the SIGNAL regime, not the SATURATION regime: a CCPA-016-style "agreement = 1.0" result is implausible at project-scale per ProgramBench evidence; the goal is "do both systems make matching partial progress?" not "do both systems fully succeed?". v1.27.0 (companion-repo M167, 2026-05-14) — flips FALSIFY-CCPA-013 (first_recorded_parity_score) from `status: OPEN` → `status: ACTIVE_RUNTIME`. The gate's assertion has been satisfied since v1.1.0 (3 measured_parity blocks dating 2026-04-27 against `fixtures/canonical/` with aggregate_score = 1.0000), but the gate-level status field was never flipped — stale prose that this revision corrects. Also extends the assertion's `fixture_corpus_path` constraint to accept EITHER `fixtures/canonical/` (AUTHORED, since v1.2.0) OR `evidence/phase-3/captures/` (REAL-BINARY bilateral bench, companion-repo M150 — claude 2.1.139 + apr 0.32.0 + Qwen2.5-Coder-1.5B-Instruct-Q4_K_M, agreement = 1.0000 on MultiPL-E-Rust HumanEval/0..4). Adds a 4th measured_parity block under CCPA-013 recording M150's real-binary evidence as the strongest empirical discharge anchor. **CCPA-013 was the last gate stuck at `status: OPEN`** — its flip closes the OPEN residue. v1.26.0 (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) — adds FALSIFY-CCPA-015 (ccpa_trace_subproc_output_purity) AND FALSIFY-CCPA-016 (outcome_parity_bound) to the gate registry. CCPA-015 was authored at M147 via provable-contract design (falsifying test FIRST, fix via Stdio::null()) for the ccpa-trace-subproc capture binary; PROPOSED in v1.25.0, promoted ACTIVE_RUNTIME here. CCPA-016 is the Phase 3 P3.4 outcome-parity gate authored at M152 — asserts aggregate agreement >= 0.5 on a MultiPL-E-Rust-class corpus with bidirectional sensitivity (synthetic regression fixture fails threshold; synthetic identity passes). 
CCPA-016 was empirically validated at M150 (real bilateral bench produced agreement = 1.0000 on 5/5 HumanEval/0..4 with real claude 2.1.139 + real apr code 0.32.0 via Qwen2.5-Coder-1.5B-Instruct-Q4_K_M). The companion-repo M162 row records that aprender#1638 MERGED upstream at squash b61b76b4 (2026-05-13), un-gating apr code from `--features code` so `cargo install apr-cli` ships it by default — the Axis 3 LlmDriver-adapter discharge is FULLY confirmed. v1.25.0 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (os_event_parity_bound) to the gate registry, completing the axis-2 closure-plan idea (2) CLI subprocess instrumentation track. New gate consumes ccpa_subproc::OsEvent records (M136) via ccpa_differ::os_event_parity (M137) and asserts canonical-corpus score >= 0.95 + bidirectional sensitivity on regression corpus (M139). v1.24.0 (companion-repo M128-M131 sequence, 2026-05-10) — bumped from v1.23.0 to integrate the M109 cosine-vs-HF-FP16 LIVE-DISCHARGE (cos_sim 0.995384 ≥ 0.99 on lambda-vector RTX 4090, 2026-05-09; aprender PR #1597 squash 3fb04ef86 flipped `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). Discharges the v1.23.0 status-prose claim "Cosine vs HF FP16 remains operator-confirm pending ~60 GB HF download" — the FP16 weights had been on lambda-vector at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB / 16 safetensors shards) for ~7 days; the "60 GB download" blocker was stale by 62 days. v1.23.0 (M35 M32d discharge audit-trail bump) records the 4-bug stack landed on aprender main as commit 5235aaeb9 (#1228) plus diagnostic surface PRs #1222 (Step 2), #1226 (Step 2.5), #1401 (Step 2 JSON wire). M32d gibberish output ("%%%%%%%%") converted to coherent English answers across math/geography/translation/code domains. 
M34 FAST PATH 5-whys plan delivered at lucky-case bound (5 substantive PRs vs 4-6 estimated, ~6 hours wall vs 2-3 days). Component priors verified empirically: rank-3 Q/K RMSNorm (15%) + rank-4 rope_theta (10%) + chat template both correct. Cosine vs HF FP16 formal flip **DISCHARGED 2026-05-09 at companion-repo M109** (apr_argmax = hf_argmax = 3555 " What"; 555ms apr-forward; HF FP16 fixture generated in 52s).
+version: "1.30.0"
+status: ACTIVE_RUNTIME   # 18/18 gates registered; 4 with status: ACTIVE_RUNTIME (CCPA-013/014/015/016 — the runtime-evidence + outcome-parity track) + 2 with status: PROPOSED (CCPA-017 project-scale parity + CCPA-018 Arena recovery-rate, both awaiting first operator-dispatched bench to lift oracle/recovery scores above the gate thresholds) + 1 with status: ADVISORY (CCPA-008 — soft-deprecated at companion-repo M230 / 2026-05-16, reframed from system-level parity validation to METER validation per the M224 design-audit.md §5 StaticFalsified Popperian verdict; gate STILL enforces ≥0.95 on the 30 AUTHORED canonical fixtures, but the 1.0000 result is interpreted as "the differ correctly recognizes equivalent traces" NOT "apr code matches claude on real engineering tasks" — see companion-repo docs/specifications/static-fixture-deprecation.md for full audit trail), rest at PLANNED_M*/IN_REVIEW/HARD_BLOCKING_M16 per their lifecycle phase. No OPEN residue. v1.30.0 (companion-repo M224-M230 Phase 5 Popperian-verdict sequence, 2026-05-16) — three changes: (1) FALSIFY-CCPA-018 (arena_recovery_rate_bound) added to the gate registry at status: PROPOSED (was supposed to ship as v1.29.0 via aprender#1705 but that PR auto-closed when its base aprender#1684/v1.28.0 squash-merged-and-deleted its feature branch; v1.29.0 is therefore SKIPPED — v1.28.0 → v1.30.0 directly). The v1.29.0-narrative content (Phase 5 P5.1-P5.5 companion-repo work M194-M206) is preserved verbatim in this status comment below for audit-trail continuity. (2) FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated — annotated with status: ADVISORY in its summary + new semantic_change_log entry citing the M224 evidence + M230 reframe; gate's threshold (≥0.95 aggregate, ≥0.80 per-fixture) unchanged. 
(3) New status_history entry recording the M224 first-operator-dispatched-Arena-bench result: 0/5 oracle_passed_rate for BOTH claude AND apr code on the M182 project-scale corpus; design-audit.md §5 Popperian verdict StaticFalsified; aprender#1712 filed for apr-serve subprocess leak; M226 + M228 + M230 sequence on companion. v1.29.0 (SKIPPED — see v1.30.0 rationale above; companion-repo M194-M206 Phase 5 sequence, 2026-05-15) — adds FALSIFY-CCPA-018 (arena_recovery_rate_bound) to the gate registry. Phase 5 operationalizes design-audit.md (M192 operator-authored) R2 + R3 recommendations: a live multi-turn execution harness (crates/ccpa-arena/) where the agent gets bash/test feedback per turn and must recover from failures. The M196 P5.1 scaffolding shipped the ArenaSession + ArenaDriver + OracleCmd + TurnRecord types; M200 P5.2 shipped the real multi-turn loop body (crates/ccpa-arena/src/dispatch.rs with Bash/Read/Write/Edit dispatch via std::process::Command + std::fs); M202 P5.3 shipped SubprocessDriver + bin/ccpa-arena-bench (clap CLI) + scripts/phase-5-arena-bench.sh (operator-dispatch wrapper analogous to phase-4-bench.sh); M204 P5.4 shipped CCPA-018 gate test (asserts recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3 with bidirectional sensitivity verified via synthetic identity/regression/give-up-fast fixtures — the asymmetric give-up-fast test is the canonical R3 distinguishing case: 100% pass rate BUT zero recovery FAILS the gate); M206 P5.5 shipped the falsifier-of-falsifier comparator (crates/ccpa-arena/src/falsifier.rs + evaluate_static_vs_arena() returning FalsifierVerdict with StaticFalsified/StaticValidated/Inconclusive outcomes per design-audit.md §5's Popperian test). CCPA-018 enters at status: PROPOSED because no operator-dispatched Arena bench has produced evidence/phase-5/arena-scores.json yet; the live-evidence test is #[ignore]'d until that exists. 
Threshold values (0.5 recovery / 0.3 oracle) are tentative POC-tier floors — they WILL be recalibrated after first operator dispatch. Phase 5 is distinct from Phase 4 (CCPA-017): CCPA-017 measures FUNCTIONAL OUTCOME (does the code work?); CCPA-018 measures AGENT QUALITY (does the agent recover when bash fails?). v1.28.0 (companion-repo M180-M188 Phase 4 sequence, 2026-05-15) — adds FALSIFY-CCPA-017 (project_scale_parity_bound) to the gate registry. Phase 4 operationalizes the M159 ProgramBench prior-art (arXiv:2605.03546, 0%/200 SOTA baseline) into companion-tier project-scale parity testing: the M182 corpus draws 5 fixtures from real open GitHub issues across paiml/decy + paiml/bashrs + paiml/depyler with pinned pre-fix commit SHAs; the M184 runner (scripts/phase-4-bench.sh, 288 lines bash) clones at the pinned SHA, dispatches each system with timeout APR_TIMEOUT_S (default 900s), snapshots diff vs SHA, runs the per-fixture oracle_cmd; the M186 scorer (crates/ccpa-differ/src/project_scale_diff.rs, ~310 lines Rust) lifts the runner JSON into ProjectScaleParityReport with 5 derived metrics (per-fixture: approach_match + lines_edited_ratio; corpus-level: partial_agreement + files_jaccard_corpus + approach_match_rate); the M188 gate test (crates/ccpa-differ/tests/falsify_ccpa_017_project_scale_parity.rs, ~260 lines, 7 active + 1 #[ignore]'d) asserts partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3 with bidirectional sensitivity verified on synthetic identity (passes) and synthetic regression (fails) fixtures. CCPA-017 enters at status: PROPOSED because no operator-dispatched measurement has produced evidence/phase-4/project-scale-scores.json yet; the live-evidence test is #[ignore]'d until that exists. Threshold values (0.3/0.3) are tentative POC-tier floors — they WILL be recalibrated after first operator dispatch. 
Phase 4 is the SIGNAL regime, not the SATURATION regime: a CCPA-016-style "agreement = 1.0" result is implausible at project-scale per ProgramBench evidence; the goal is "do both systems make matching partial progress?" not "do both systems fully succeed?". v1.27.0 (companion-repo M167, 2026-05-14) — flips FALSIFY-CCPA-013 (first_recorded_parity_score) from `status: OPEN` → `status: ACTIVE_RUNTIME`. The gate's assertion has been satisfied since v1.1.0 (3 measured_parity blocks dating 2026-04-27 against `fixtures/canonical/` with aggregate_score = 1.0000), but the gate-level status field was never flipped — stale prose that this revision corrects. Also extends the assertion's `fixture_corpus_path` constraint to accept EITHER `fixtures/canonical/` (AUTHORED, since v1.2.0) OR `evidence/phase-3/captures/` (REAL-BINARY bilateral bench, companion-repo M150 — claude 2.1.139 + apr 0.32.0 + Qwen2.5-Coder-1.5B-Instruct-Q4_K_M, agreement = 1.0000 on MultiPL-E-Rust HumanEval/0..4). Adds a 4th measured_parity block under CCPA-013 recording M150's real-binary evidence as the strongest empirical discharge anchor. **CCPA-013 was the last gate stuck at `status: OPEN`** — its flip closes the OPEN residue. v1.26.0 (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) — adds FALSIFY-CCPA-015 (ccpa_trace_subproc_output_purity) AND FALSIFY-CCPA-016 (outcome_parity_bound) to the gate registry. CCPA-015 was authored at M147 via provable-contract design (falsifying test FIRST, fix via Stdio::null()) for the ccpa-trace-subproc capture binary; PROPOSED in v1.25.0, promoted ACTIVE_RUNTIME here. CCPA-016 is the Phase 3 P3.4 outcome-parity gate authored at M152 — asserts aggregate agreement >= 0.5 on a MultiPL-E-Rust-class corpus with bidirectional sensitivity (synthetic regression fixture fails threshold; synthetic identity passes). 
CCPA-016 was empirically validated at M150 (real bilateral bench produced agreement = 1.0000 on 5/5 HumanEval/0..4 with real claude 2.1.139 + real apr code 0.32.0 via Qwen2.5-Coder-1.5B-Instruct-Q4_K_M). The companion-repo M162 row records that aprender#1638 MERGED upstream at squash b61b76b4 (2026-05-13), un-gating apr code from `--features code` so `cargo install apr-cli` ships it by default — the Axis 3 LlmDriver-adapter discharge is FULLY confirmed. v1.25.0 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (os_event_parity_bound) to the gate registry, completing the axis-2 closure-plan idea (2) CLI subprocess instrumentation track. New gate consumes ccpa_subproc::OsEvent records (M136) via ccpa_differ::os_event_parity (M137) and asserts canonical-corpus score >= 0.95 + bidirectional sensitivity on regression corpus (M139). v1.24.0 (companion-repo M128-M131 sequence, 2026-05-10) — bumped from v1.23.0 to integrate the M109 cosine-vs-HF-FP16 LIVE-DISCHARGE (cos_sim 0.995384 ≥ 0.99 on lambda-vector RTX 4090, 2026-05-09; aprender PR #1597 squash 3fb04ef86 flipped `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). Discharges the v1.23.0 status-prose claim "Cosine vs HF FP16 remains operator-confirm pending ~60 GB HF download" — the FP16 weights had been on lambda-vector at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB / 16 safetensors shards) for ~7 days; the "60 GB download" blocker was stale by 62 days. v1.23.0 (M35 M32d discharge audit-trail bump) records the 4-bug stack landed on aprender main as commit 5235aaeb9 (#1228) plus diagnostic surface PRs #1222 (Step 2), #1226 (Step 2.5), #1401 (Step 2 JSON wire). M32d gibberish output ("%%%%%%%%") converted to coherent English answers across math/geography/translation/code domains. 
M34 FAST PATH 5-whys plan delivered at lucky-case bound (5 substantive PRs vs 4-6 estimated, ~6 hours wall vs 2-3 days). Component priors verified empirically: rank-3 Q/K RMSNorm (15%) + rank-4 rope_theta (10%) + chat template both correct. Cosine vs HF FP16 formal flip **DISCHARGED 2026-05-09 at companion-repo M109** (apr_argmax = hf_argmax = 3555 " What"; 555ms apr-forward; HF FP16 fixture generated in 52s).
 
 # ─────────────────────────────────────────────────────────────────────────────
 # Top-level invariants — the 12 falsifiable gates this contract asserts.
@@ -85,12 +85,13 @@ invariants:
   - { id: FALSIFY-CCPA-005, name: file_mutation_equivalence,  summary: 'CWD diff after replay matches CWD diff after teacher run' }
   - { id: FALSIFY-CCPA-006, name: sovereignty_on_replay,      summary: 'no outbound api.anthropic.com sockets during replay' }
   - { id: FALSIFY-CCPA-007, name: corpus_coverage,            summary: '>=1 fixture per non-MISSING row of apr-code-parity-v1.yaml' }
-  - { id: FALSIFY-CCPA-008, name: parity_score_bound,         summary: 'aggregate parity_score >= 0.95, per-fixture >= 0.80' }
+  - { id: FALSIFY-CCPA-008, name: parity_score_bound,         summary: 'aggregate parity_score >= 0.95, per-fixture >= 0.80. STATUS: ADVISORY (soft-deprecated at companion-repo M230 / 2026-05-16). Gate still enforces the score threshold on the 30 AUTHORED canonical fixtures, but the 1.0000 result is now interpreted as METER VALIDATION (the differ + scorer + per-tool equivalence rules correctly recognize equivalent traces) NOT SYSTEM-LEVEL parity (apr code matches claude on real engineering tasks). The system-level parity claim was empirically falsified by the M224 first-operator-dispatched Phase 5 Arena bench (oracle_passed_rate = 0.0000 on 5/5 M182 project-scale fixtures for BOTH systems). Foreground user-facing parity claims move to CCPA-016 (function-scale outcome) + CCPA-017 (project-scale partial-progress, PROPOSED) + CCPA-018 (Arena recovery-rate, PROPOSED). See companion-repo docs/specifications/static-fixture-deprecation.md for full audit trail M0 → M230.' }
   - { id: FALSIFY-CCPA-013, name: first_recorded_parity_score, summary: 'AT LEAST ONE real Claude Code ↔ apr code corpus run produced a measured parity_score recorded in status_history. Flips ACTIVE_ALGORITHM_LEVEL → ACTIVE_RUNTIME.' }
   - { id: FALSIFY-CCPA-014, name: os_event_parity_bound,       summary: 'OS-level event parity (axis-2-closure-plan M115.4): macro-averaged Jaccard >= 0.95 per fixture in fixtures/os-canonical/; bidirectional-sensitivity gate on fixtures/os-regression/ (every fixture < 0.95 + non-empty drift records).' }
   - { id: FALSIFY-CCPA-015, name: ccpa_trace_subproc_output_purity, summary: 'Every line emitted to stdout by ccpa-trace-subproc MUST decode as a ccpa_subproc::OsEvent JSON object. Subprocess stdout MUST NOT interleave with the capture stream (use Stdio::null() not Stdio::inherit()).' }
   - { id: FALSIFY-CCPA-016, name: outcome_parity_bound,        summary: 'Outcome parity (Phase 3 P3.4): aggregate agreement on a MultiPL-E-Rust-class corpus >= 0.5 (POC-tier); bidirectional-sensitivity via synthetic regression (< 0.5 → fail) + synthetic identity (1.0 → pass) fixtures.' }
   - { id: FALSIFY-CCPA-017, name: project_scale_parity_bound,  summary: 'Project-scale parity (Phase 4 P4.4): aggregate partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3 on a multi-file Cargo-workspace task corpus drawn from real GitHub issues (companion-repo M182). Bidirectional-sensitivity via synthetic identity (passes) + synthetic regression (fails) fixtures. PROPOSED at v1.28.0; ACTIVE_RUNTIME pending first operator-dispatched measurement.' }
+  - { id: FALSIFY-CCPA-018, name: arena_recovery_rate_bound,   summary: 'Arena recovery-rate (Phase 5 P5.4): aggregate recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3 on a multi-turn live Arena bench (companion-repo M196-M206). Measures AGENT QUALITY (does the agent recover from failed bash/test runs?) distinct from CCPA-016/017 functional-outcome metrics. Bidirectional-sensitivity via synthetic identity (passes) + regression (fails) + give-up-fast (asymmetric: 100% pass but zero recovery FAILS recovery floor — the canonical R3 distinguishing test). PROPOSED at v1.30.0 (drafted as v1.29.0, which was SKIPPED); ACTIVE_RUNTIME pending an operator-dispatched Arena bench that clears both floors.' }
 
 scope: >
   every recorded fixture under <ccpa-repo>/fixtures/, every replay run the
@@ -927,6 +928,132 @@ falsification_conditions:
       - { date: '2026-05-15', version_before: '1.27.0', version_after: '1.28.0',
           change: "Added FALSIFY-CCPA-017 to gate registry at status: PROPOSED. Companion-repo M188 ships the gate test scaffold (7 synthetic-fixture tests + 1 #[ignore]'d live-evidence test); thresholds (partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3) are tentative POC-tier floors awaiting first operator-dispatched measurement to calibrate. Phase 4 P4.5 contract bump." }
 
+  - id: FALSIFY-CCPA-018
+    name: arena_recovery_rate_bound
+    status: PROPOSED
+    assertion: |
+      Arena recovery-rate (Phase 5 P5.4). On a multi-turn live Arena
+      bench against the M182 project-scale corpus (companion-repo
+      fixtures/project-scale/) where each task is driven through an
+      ArenaSession with up to max_turns=20 multi-turn dialog turns and
+      bash/test execution feedback per turn, the aggregate Arena scores
+      MUST satisfy BOTH:
+        - aggregate `recovery_rate` >= 0.5
+        - aggregate `oracle_passed_rate` >= 0.3
+
+      Where derived metrics are:
+        recovery_rate = (teacher_recovered + student_recovered) /
+                        (corpus_size * 2)
+        oracle_passed_rate = (teacher_passed + student_passed) /
+                             (corpus_size * 2)
+        recovery_observed = OraclePassed AND any_bash_failure_in_history
+                            (per side per fixture)
+
+      Plus consistency invariants:
+        - `corpus_size >= 3` (minimum sample size for statistical meaning)
+        - `corpus_size == per_fixture.len()` (record-count match)
+
+      Bidirectional sensitivity (mandatory):
+        - A synthetic identity fixture (all pass + all recovered) MUST
+          pass.
+        - A synthetic regression fixture (no pass, no recovery) MUST fail.
+        - A synthetic give-up-fast fixture (100% pass BUT zero recovery)
+          MUST fail on the recovery floor — this is the canonical R3
+          distinguishing test: a system that solves easy tasks zero-shot
+          but never recovers from a hard task's first failure is NOT
+          accepted by CCPA-018.
+        - An empty-corpus report MUST fail (prevents "no-data" from
+          being claimed as success).
+
+      Source of truth: `evidence/phase-5/arena-scores.json` produced
+      by `scripts/phase-5-arena-bench.sh` on the companion repo.
+
+      CCPA-018 measures AGENT QUALITY (does the agent recover?),
+      distinct from CCPA-016/017 which measure FUNCTIONAL OUTCOME
+      (does the code work?). Direct empirical answer to
+      design-audit.md §6 R3 "self-correction over zero-shot
+      determinism".
+    test_harness: |
+      `cargo test -p ccpa-arena --test falsify_ccpa_018_arena_recovery_rate`
+      runs 7 active assertions + 1 `#[ignore]`'d live-evidence assertion:
+        - synthetic_identity_corpus_passes_gate
+        - synthetic_regression_corpus_fails_gate
+        - synthetic_give_up_fast_fails_on_recovery_floor (THE canonical
+          R3 distinguishing test)
+        - empty_corpus_vacuously_fails_threshold
+        - exactly_at_thresholds_passes (verifies >= not >)
+        - just_below_recovery_threshold_fails (single-gate sensitivity)
+        - threshold_constants_match_plan (sentinel)
+        - live_evidence_meets_arena_recovery_threshold (#[ignore]'d
+          until operator dispatches `bash scripts/phase-5-arena-bench.sh`)
+
+      Plus the falsifier-of-falsifier comparator at
+      `cargo test -p ccpa-arena --test falsify_static_vs_arena`
+      (companion-repo M206 P5.5): 4 active synthetic tests + 1
+      `#[ignore]`'d live-evidence test that loads BOTH evidence files
+      (CCPA-016 + CCPA-018) and emits a `FalsifierVerdict` per
+      design-audit.md §5's Popperian test.
+
+      All 7 + 4 active GREEN on the companion-repo M206 scaffold
+      (synthetic fixtures constructed in-test, no on-disk corpus
+      dependency).
+    rationale: |
+      The M192 design audit (operator-authored, integrated into
+      companion spec at companion-repo M192) identified three tactical
+      recommendations for faster project-scale convergence:
+      (R1) soft-deprecate FALSIFY-CCPA-014; (R2) pivot to a live Arena
+      runner; (R3) prioritize error recovery over zero-shot determinism.
+      CCPA-018 operationalizes R3 explicitly: the recovery_rate metric
+      counts fixtures where the agent's earlier turn produced a
+      non-zero bash exit BUT the session continued and the oracle
+      eventually passed — the canonical "self-correction" signal.
+
+      The DUAL-threshold design (recovery_rate >= 0.5 AND
+      oracle_passed_rate >= 0.3) is intentional: recovery_rate alone
+      passes an "always fail with retry-on-error" agent (degenerate);
+      oracle_passed_rate alone passes an "always succeed zero-shot"
+      agent (fails the R3 framing). Both together require: agent makes
+      progress (oracle passes) AND agent recovers (oracle passes AFTER
+      bash failure). The asymmetric give-up-fast synthetic fixture
+      distinguishes CCPA-018 from CCPA-017: a system passing CCPA-017
+      (functional outcome) but failing CCPA-018 (zero recovery) is
+      empirically detected by the dual-floor predicate.
+
+      Threshold values (0.5/0.3) are tentative POC-tier floors. They
+      WILL be recalibrated after first operator-dispatched Arena bench
+      against the M182 5-fixture corpus.
+
+      Status PROPOSED (not ACTIVE_RUNTIME) because no operator-dispatched
+      Arena bench has produced evidence/phase-5/arena-scores.json yet.
+      The live-evidence test is `#[ignore]`'d until that file exists.
+      Once the operator runs `bash scripts/phase-5-arena-bench.sh` and
+      the gate passes against real data, a later version bump will flip
+      PROPOSED → ACTIVE_RUNTIME.
+
+      Companion-repo Phase 5 sequence (M192-M206):
+        M192 (PR #179 squash d9ae48a) — design-audit.md integrated.
+        M194 (PR #181 squash 4011bea) — phase-5-arena-runner-plan.md
+          authored. P5.1-P5.5 sub-deliverables defined. Operationalizes
+          design-audit.md R2 + R3.
+        M196 (PR #183 squash 6a7fe39) — P5.1 Arena harness scaffolding
+          (crates/ccpa-arena/, 4 modules, 19 tests).
+        M200 (PR #187 squash 75ef8e6) — P5.2 multi-turn loop body
+          (crates/ccpa-arena/src/dispatch.rs with Bash/Read/Write/Edit
+          dispatch + run_oracle + render_history + 29 new tests).
+        M202 (PR #189 squash e381d05) — P5.3 Arena bench runner
+          (SubprocessDriver + bin/ccpa-arena-bench clap CLI +
+          scripts/phase-5-arena-bench.sh wrapper).
+        M204 (PR #191 squash aa58ed6) — P5.4 CCPA-018 gate test
+          scaffold (~230 LOC, 7 active synthetic tests + 1 ignored
+          live-evidence). Tentative thresholds 0.5/0.3.
+        M206 (PR #193 squash b95be66) — P5.5 falsifier-of-falsifier
+          (crates/ccpa-arena/src/falsifier.rs comparator +
+          evidence/phase-5/static-fixture-falsification.md template).
+          Phase 5 arc COMPLETE at substantive level.
+    semantic_change_log:
+      - { date: '2026-05-15', version_before: '1.28.0', version_after: '1.29.0',
+          change: "Added FALSIFY-CCPA-018 to gate registry at status: PROPOSED. Companion-repo M204 ships the gate test scaffold (7 synthetic-fixture tests + 1 #[ignore]'d live-evidence test); thresholds (recovery_rate >= 0.5 AND oracle_passed_rate >= 0.3) are tentative POC-tier floors awaiting first operator-dispatched Arena bench to calibrate. Phase 5 P5.5+ contract bump (P5.5 falsifier-of-falsifier shipped at M206)." }
+
   - id: FALSIFY-CCPA-008
     name: parity_score_bound
     status: PLANNED_M6
@@ -1085,6 +1212,278 @@ milestones:
 # ─────────────────────────────────────────────────────────────────────────────
 
 status_history:
+  - date: '2026-05-16'
+    from: 'ACTIVE_RUNTIME v1.28.0'
+    to: 'ACTIVE_RUNTIME v1.30.0'
+    note: 'companion-repo M224-M230 Phase 5 Popperian-verdict sequence — three changes: (1) FALSIFY-CCPA-018 (arena_recovery_rate_bound) added to gate registry at status: PROPOSED (v1.29.0 SKIPPED — see reason below); (2) FALSIFY-CCPA-008 (parity_score_bound) soft-deprecated to status: ADVISORY in its summary, threshold unchanged; (3) records M224 first-operator-dispatched Arena bench result (0/5 oracle_passed_rate for both systems on M182 project-scale corpus) + design-audit.md §5 Popperian verdict: StaticFalsified.'
+    reason: |
+      Three changes bundled because they collectively answer the
+      operator-authored design-audit.md §5 Popperian test and triggered
+      the spec-level reframe that the M224 evidence justified.
+
+      ─────────────────────────────────────────────────────────────────
+      (1) FALSIFY-CCPA-018 added at status: PROPOSED.
+      ─────────────────────────────────────────────────────────────────
+
+      Gate count: 17 → 18. This is the work that was supposed to ship
+      as v1.29.0 via aprender#1705, but that PR auto-CLOSED when its
+      base aprender#1684 (v1.28.0) squash-merged-and-deleted its
+      feature branch. v1.29.0 is therefore SKIPPED — v1.28.0 jumps
+      directly to v1.30.0. The v1.29.0-narrative content (Phase 5
+      P5.1-P5.5 companion-repo work M194-M206) is preserved verbatim
+      in this entry's "v1.29.0 (SKIPPED ...)" sub-section below.
+
+      Companion-repo Phase 5 sequence (M194-M210):
+
+        M194 (PR #181 squash 4011bea) — phase-5-arena-runner-plan.md
+          authored. P5.1-P5.5 sub-deliverables defined.
+        M196 (PR #183 squash 6a7fe39) — P5.1 Arena harness scaffolding
+          SHIPPED. New workspace crate `crates/ccpa-arena/` (7th member).
+          Type signatures for ArenaSession + ArenaDriver + OracleCmd +
+          TurnRecord + ToolInvocation + ToolResult.
+        M200 (PR #187 squash 75ef8e6) — P5.2 multi-turn loop body
+          SHIPPED. crates/ccpa-arena/src/dispatch.rs (~470 LOC) with
+          render_history, dispatch_tool_use (Bash/Read/Write/Edit with
+          real std::process::Command + std::fs + Sha256), run_oracle.
+          R3 framing validated via run_records_bash_failure_and_continues
+          test.
+        M202 (PR #189 squash e381d05) — P5.3 Arena bench runner SHIPPED.
+          crates/ccpa-arena/src/subprocess_driver.rs + bin/ccpa-arena-bench
+          (clap CLI) + scripts/phase-5-arena-bench.sh (~210 LOC bash).
+          Aggregates per-fixture results into evidence/phase-5/arena-
+          scores.json.
+        M204 (PR #191 squash aa58ed6) — P5.4 CCPA-018 gate test
+          SHIPPED. crates/ccpa-arena/tests/falsify_ccpa_018_arena_
+          recovery_rate.rs (~230 LOC, 7 active + 1 #[ignore]'d live-
+          evidence). Includes the asymmetric give-up-fast synthetic
+          fixture (100% pass BUT zero recovery FAILS — canonical R3
+          distinguishing test).
+        M206 (PR #193 squash b95be66) — P5.5 falsifier-of-falsifier
+          comparator SHIPPED. crates/ccpa-arena/src/falsifier.rs
+          (~140 LOC) — evaluate_static_vs_arena() implementing
+          design-audit.md §5's Popperian test as a deterministic
+          pure function. 3-variant outcome:
+            FalsifierOutcome::StaticFalsified  (static>=0.95 AND arena<=0.2)
+            FalsifierOutcome::StaticValidated  (static>=0.5 AND arena>=0.5)
+            FalsifierOutcome::Inconclusive { reason }
+          Thresholds: STATIC_PARITY_THRESHOLD=0.95, ARENA_PARITY_CEILING=0.2.
+        M208 (PR #195 squash 4c251dd) — companion-repo M22 5-step
+          ritual mirror of (the now-CLOSED) aprender#1705 v1.29.0
+          content. Since this M-row, companion main has carried the
+          v1.29.0 contract YAML, with pin.lock pointing at #1705's
+          feature-branch HEAD. This v1.30.0 upstream-flip realigns
+          aprender main with companion's contract content and adds the
+          M224/M230 deltas.
+        M210 (PR #197 squash dca0de9) — ccpa-arena coverage closure.
+          Workspace line coverage 95.44% → 99.09%; function coverage
+          99.75%.
+          FALSIFY-CCPA-011 now passes on its own merits.
+
+      DUAL-threshold design (preserved): recovery_rate >= 0.5 AND
+      oracle_passed_rate >= 0.3. The asymmetric give-up-fast synthetic
+      fixture (100% pass but zero recovery → fails recovery floor) is
+      the canonical R3 distinguishing test that separates CCPA-018
+      from CCPA-017.
+
+      Tentative 0.5/0.3 POC-tier floors; recalibration awaits a cleaner
+      operator-dispatched Arena bench (current data shows 0/5 for both
+      systems; see (3) below).
+
+      ─────────────────────────────────────────────────────────────────
+      (2) FALSIFY-CCPA-008 soft-deprecated to status: ADVISORY.
+      ─────────────────────────────────────────────────────────────────
+
+      The gate is STILL enforced: the aggregate >= 0.95 and per-fixture
+      >= 0.80 thresholds are unchanged on the 30 AUTHORED canonical
+      fixtures. Its interpretation flipped from SYSTEM-LEVEL parity
+      validation (implicit "apr code matches claude on real
+      engineering tasks") to METER VALIDATION (the differ + scorer +
+      per-tool equivalence rules correctly recognize equivalent
+      traces).
+
+      The system-level interpretation was empirically FALSIFIED by
+      the M224 first-operator-dispatched Phase 5 Arena bench
+      (see (3) below): 0/5 oracle_passed_rate for BOTH claude AND
+      apr code on the M182 project-scale corpus. The static score
+      (1.0) over-predicted the live-Arena result (0.0) by the full
+      range of the metric. Per design-audit.md §5 the static-fixture
+      approach is FALSIFIED as a convergence predictor.
+
+      Foreground user-facing parity claims move to:
+        - CCPA-016 (function-scale outcome) — agreement = 1.0000 on
+          MultiPL-E-Rust HumanEval/0..4 (M150)
+        - CCPA-017 (project-scale partial-progress, PROPOSED) — awaits
+          first operator dispatch
+        - CCPA-018 (Arena recovery-rate, PROPOSED) — current M224
+          evidence: recovery_rate = 0.0 for both systems
+
+      Full audit trail M0 → M230: companion-repo
+      docs/specifications/static-fixture-deprecation.md.
+
+      ─────────────────────────────────────────────────────────────────
+      (3) M224 first-operator-dispatched Phase 5 Arena bench result.
+      ─────────────────────────────────────────────────────────────────
+
+      Records the empirical answer to design-audit.md §5's Popperian
+      test. Operator ran `bash scripts/phase-5-arena-bench.sh` against
+      the M182 5-fixture project-scale corpus (real GitHub issues
+      across paiml/decy + paiml/bashrs + paiml/depyler) three times:
+
+        Run 1 (180s/turn, 900s/fixture-system wall) — noisy.
+          6 of 10 dispatches killed by per-turn timeout.
+
+        Run 2 (600s/turn, 2400s/fixture-system wall) — clean.
+          teacher (claude 2.1.143): 5/5 ran full 20 turns within
+            wall budget. Zero timeout-kill artifacts.
+          student (apr 0.32.0 + qwen2.5-coder-1.5b): 4/5 hit
+            `apr serve` network errors mid-session, 1/5 (decy#39)
+            completed 20 turns clean.
+
+        Run 3 (post-aprender#1712 workaround, M228 — same config
+        as Run 2, plus the scripts/phase-5-arena-bench.sh § "Defensive
+        cleanup" step, which runs `pkill -f "^apr serve"` between
+        teacher and student per fixture):
+          teacher: 5/5 ran full 20 turns.
+          student: 3/5 driver_error (apr-serve intra-fixture leak),
+            2/5 (decy#39 + decy#40) completed 20 turns clean.
+
+      Result across all three runs: oracle_passed_rate = 0.0000 (0/5)
+      for BOTH teacher AND student. recovery_rate = 0 for both.
+
+      Verdict: evaluate_static_vs_arena(1.0, 0.0,
+                 "evidence/phase-3/multipl-e-rust-scores.json#.agreement",
+                 "evidence/phase-5/arena-scores.json#.oracle_passed_rate")
+               → FalsifierOutcome::StaticFalsified.
+
+      Important nuance preserved in companion-repo
+      evidence/phase-5/static-fixture-falsification.md: 0/5 for BOTH
+      systems means neither solves these specific tasks under this
+      harness — that's an Axis 2 closure CEILING, not a teacher-vs-
+      student gap. The Phase 5 Arena harness (20-turn budget + 40-min
+      wall + the M182 fixture prompts) does not provide enough
+      scaffolding for either SOTA system to converge on a passing
+      oracle for these particular real GitHub issues. Possible
+      confounds: (a) apr serve network bug (aprender#1712); (b)
+      fixture difficulty (even claude itself doesn't solve them in
+      20 turns); (c) oracle strictness (`cargo test` / `cargo clippy
+      --all-targets -- -D warnings` is binary pass/fail).
+
+      Companion-repo M224-M230 sequence:
+
+        M224 (PR #211 squash 0c6b441) —
+          evidence/phase-5/static-fixture-falsification.md flipped
+          TEMPLATE → RESOLVED; top spec headline Axis 2 score revised
+          down from ~90% to ~55%.
+        M226 (PR #213 squash 7b28e89) — aprender#1712 filed (apr serve
+          subprocess leak) + scripts/phase-5-arena-bench.sh § defensive
+          `pkill -f "^apr serve"` added (default-on, opt-out via
+          PHASE5_APR_SERVE_CLEANUP=0).
+        M228 (inline in M230) — operator-dispatched re-run with the
+          M226 workaround; produced cleaner student data on 2 of 5
+          fixtures; same verdict 0/5.
+        M230 (PR #215 squash 881e8fa) — soft-deprecation spec rewrite:
+          new docs/specifications/static-fixture-deprecation.md
+          (~140 lines) + falsification-conditions.md § CCPA-008
+          annotated + top spec TOC row added.
+
+      Tentative threshold values (0.5 recovery / 0.3 oracle for
+      CCPA-018; 0.3 partial / 0.3 jaccard for CCPA-017) WILL be
+      recalibrated after a cleaner re-run post-aprender#1712 upstream
+      fix. The Popperian comparator (evaluate_static_vs_arena) is
+      deterministic: same data in → same verdict out. If recovery_rate
+      or oracle_passed_rate moves materially in a future run, the
+      StaticFalsified verdict is re-derived from the new data without
+      further contract changes.
+
+      ─────────────────────────────────────────────────────────────────
+      v1.29.0 (SKIPPED — see (1) above) status-comment content,
+      preserved here for audit-trail continuity: v1.29.0 was authored
+      upstream as aprender#1705, which was auto-closed when its base
+      #1684 was squash-merged and its feature branch deleted.
+      Companion-repo had this content at M208 and continues to ship
+      it as-is post-M230.
+      ─────────────────────────────────────────────────────────────────
+
+  - date: '2026-05-15'
+    from: 'ACTIVE_RUNTIME v1.28.0'
+    to: 'ACTIVE_RUNTIME v1.29.0 (SKIPPED — bundled into v1.30.0 above)'
+    note: 'companion-repo M194-M206 Phase 5 sequence — FALSIFY-CCPA-018 (arena_recovery_rate_bound) added to gate registry at status: PROPOSED; awaits first operator-dispatched Arena bench to flip ACTIVE_RUNTIME'
+    reason: |
+      Adds 1 new falsification gate to the registry: CCPA-018
+      (Arena recovery-rate bound). Gate count: 17 → 18.
+
+      Phase 5 operationalizes design-audit.md (M192 operator-authored,
+      companion-repo) R2 + R3 recommendations: a live multi-turn
+      execution harness where the agent gets bash/test feedback per
+      turn and must recover from failures. CCPA-018 explicitly measures
+      AGENT QUALITY (does the agent recover when bash fails?), distinct
+      from CCPA-016/017 which measure FUNCTIONAL OUTCOME.
+
+      DUAL-threshold design: recovery_rate >= 0.5 AND
+      oracle_passed_rate >= 0.3. The asymmetric give-up-fast synthetic
+      fixture (100% pass but zero recovery → fails recovery floor) is
+      the canonical R3 distinguishing test that separates CCPA-018
+      from CCPA-017.
+
+      Tentative 0.5/0.3 POC-tier floors; recalibration awaits first
+      operator-dispatched Arena bench against M182 corpus.
+
+      Companion-repo Phase 5 sequence (M194-M206):
+
+        M194 (PR #181 squash 4011bea) — phase-5-arena-runner-plan.md
+          authored. P5.1-P5.5 sub-deliverables defined.
+
+        M196 (PR #183 squash 6a7fe39) — P5.1 Arena harness scaffolding.
+          New crate crates/ccpa-arena/ with ArenaSession + ArenaDriver
+          + OracleCmd + TurnRecord types. 19 unit tests.
+
+        M200 (PR #187 squash 75ef8e6) — P5.2 multi-turn loop body.
+          crates/ccpa-arena/src/dispatch.rs (~470 LOC) with real
+          subprocess execution: Bash via std::process::Command, Edit
+          via read+matches.count+replacen+write, Read/Write via
+          std::fs. 29 new tests. R3 recovery validated via
+          run_records_bash_failure_and_continues test.
+
+        M202 (PR #189 squash e381d05) — P5.3 Arena bench runner.
+          SubprocessDriver wraps agent CLI per turn with timeout.
+          New bin crates/ccpa-arena/src/bin/ccpa-arena-bench.rs (clap
+          CLI). New scripts/phase-5-arena-bench.sh wrapper analogous
+          to phase-4-bench.sh. recovery_observed semantic:
+          OraclePassed AND any_bash_failure_in_history.
+
+        M204 (PR #191 squash aa58ed6) — P5.4 CCPA-018 gate test.
+          crates/ccpa-arena/src/scores.rs typed shape +
+          tests/falsify_ccpa_018_arena_recovery_rate.rs (~230 LOC,
+          7 active synthetic + 1 ignored live-evidence). Tentative
+          thresholds 0.5/0.3.
+
+        M206 (PR #193 squash b95be66) — P5.5 falsifier-of-falsifier.
+          crates/ccpa-arena/src/falsifier.rs with
+          evaluate_static_vs_arena() returning FalsifierVerdict
+          (StaticFalsified / StaticValidated / Inconclusive) per
+          design-audit.md §5's Popperian test.
+          evidence/phase-5/static-fixture-falsification.md
+          operator-facing evidence template.
+
+      Gate-level statuses post-v1.29.0: 4 ACTIVE_RUNTIME
+      (CCPA-013/014/015/016) + 2 PROPOSED (CCPA-017 project-scale
+      parity + CCPA-018 Arena recovery-rate). Both PROPOSED gates await
+      their first operator-dispatched bench, after which v1.30.0 will
+      flip PROPOSED → ACTIVE_RUNTIME for whichever has converged. Rest
+      at PLANNED_M*/IN_REVIEW/HARD_BLOCKING_M16 per their lifecycle
+      phase. No OPEN residue.
+
+      Gate registry summary post-v1.29.0:
+        FALSIFY-CCPA-001..006  PLANNED_M*             (Phase 1 RECORD scope; M2.3-rescoped)
+        FALSIFY-CCPA-007       IN_REVIEW              (coverage-floor)
+        FALSIFY-CCPA-008       PLANNED_M6             (parity_score_bound)
+        FALSIFY-CCPA-009..012  ACTIVE_ALGORITHM_LEVEL (CI gates from M0)
+        FALSIFY-CCPA-013       ACTIVE_RUNTIME         (first_recorded_parity_score, at v1.27.0)
+        FALSIFY-CCPA-014       ACTIVE_RUNTIME         (os_event_parity_bound, at v1.25.0)
+        FALSIFY-CCPA-015       ACTIVE_RUNTIME         (ccpa_trace_subproc_output_purity, at v1.26.0)
+        FALSIFY-CCPA-016       ACTIVE_RUNTIME         (outcome_parity_bound, at v1.26.0)
+        FALSIFY-CCPA-017       PROPOSED               (project_scale_parity_bound, at v1.28.0)
+        FALSIFY-CCPA-018       PROPOSED               (arena_recovery_rate_bound, at v1.29.0)
+
+      Pure additive bump: new gate + new status_history entry. No
+      schema bump in aprender-contracts/src/schema/. pv validate clean.
+
   - date: '2026-05-15'
     from: 'ACTIVE_RUNTIME v1.27.0'
     to: 'ACTIVE_RUNTIME v1.28.0'