From 355a1e74df0f9213f32ea83b5bb8ccc581f0fcca Mon Sep 17 00:00:00 2001
From: Noah Gift
Date: Fri, 15 May 2026 08:14:49 +0200
Subject: [PATCH 1/5] =?UTF-8?q?contracts(ccpa):=20v1.27.0=20=E2=86=92=20v1?=
 =?UTF-8?q?.28.0=20=E2=80=94=20register=20FALSIFY-CCPA-017=20project=5Fsca?=
 =?UTF-8?q?le=5Fparity=5Fbound=20(PROPOSED)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds 1 new falsification gate to the registry: CCPA-017 (project-scale
parity bound). Gate count: 16 → 17.

Phase 4 closes the function-scale → project-scale extrapolation gap
the M159 ProgramBench prior-art (arXiv:2605.03546) flagged. 0%/200
fully-resolved across Claude Opus/Sonnet/Haiku + GPT + Gemini at
project-scale means a CCPA-016-style "both pass" assertion is
implausible. CCPA-017 inverts the question: partial-progress agreement,
not all-or-nothing.

DUAL-threshold design:
- partial_agreement >= 0.3
- files_jaccard_corpus >= 0.3

Both orthogonal channels must show agreement. Tentative 0.3/0.3
POC-tier floors; recalibration awaits first operator-dispatched
measurement.

CCPA-017 enters at status: PROPOSED because no operator-dispatched
measurement has produced evidence/phase-4/project-scale-scores.json
yet. The live-evidence test is #[ignore]'d until that file exists.
Once the operator runs `bash scripts/phase-4-bench.sh` and the gate
passes against real data, a v1.29.0 bump will flip
PROPOSED → ACTIVE_RUNTIME.

Companion-repo Phase 4 sequence:
- M180 (PR #167 squash c7107b9) — phase-4-project-scale-plan.md
- M182 (PR #169 squash b36ceb6) — P4.1 corpus (5 real GitHub issues)
- M184 (PR #171 squash 0f8c451) — P4.2 runner (phase-4-bench.sh)
- M186 (PR #173 squash c115966) — P4.3 scoring (project_scale_diff.rs)
- M188 (PR #175 squash a574655) — P4.4 gate test scaffold

Gate-level statuses post-v1.28.0: 4 ACTIVE_RUNTIME (CCPA-013/014/015/
016) + 1 PROPOSED (CCPA-017) + rest at PLANNED_M*/IN_REVIEW per their
lifecycle phase. No OPEN residue.

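
A minimal sketch of the dual-threshold check described in this message, for readers who want to sanity-check the semantics by hand. It is not the gate's implementation (that lives in the Rust scorer, project_scale_diff.rs). The helper name `passes_gate`, the three-column input format, and the 0/1 encoding of oracle passes are assumptions made for illustration. The partial_agreement formula (mean over fixtures of the minimum of the two pass flags) is the one stated in the contract assertion added by this patch. The `corpus_size >= 3` invariant is deliberately left out.

```shell
# Sketch only: passes_gate and its input format are hypothetical.
# stdin: one fixture per line as "teacher_pass student_pass files_jaccard",
# with oracle passes encoded as 0 or 1 (an assumption for this sketch).
passes_gate() {
  awk -v thr=0.3 '
    # Per fixture: min() of the two pass flags feeds partial_agreement,
    # and the per-fixture Jaccard feeds files_jaccard_corpus.
    { p += ($1 < $2 ? $1 : $2); j += $3; n++ }
    END {
      if (n == 0) exit 1    # an empty corpus must fail, never pass vacuously
      # Exit 0 only when BOTH corpus-level means reach the floor.
      exit !((p / n >= thr) && (j / n >= thr))
    }'
}
```

For example, `printf '1 1 1.0\n1 1 1.0\n1 1 1.0\n' | passes_gate` succeeds, while a corpus in which the two sides never pass the same fixture and touch disjoint files is rejected on both channels.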
Pure additive bump: new gate + new status_history entry. No schema bump in aprender-contracts/src/schema/. pv validate clean. Co-Authored-By: Claude Opus 4.7 --- contracts/claude-code-parity-apr-v1.yaml | 203 ++++++++++++++++++++++- 1 file changed, 201 insertions(+), 2 deletions(-) diff --git a/contracts/claude-code-parity-apr-v1.yaml b/contracts/claude-code-parity-apr-v1.yaml index 80a7522ce..4ff4d0765 100644 --- a/contracts/claude-code-parity-apr-v1.yaml +++ b/contracts/claude-code-parity-apr-v1.yaml @@ -63,8 +63,8 @@ metadata: - crates/aprender-orchestrate/contracts/batuta/apr-code-v1.yaml name: claude-code-parity-apr -version: "1.27.0" -status: ACTIVE_RUNTIME # 16/16 gates registered; 4 with status: ACTIVE_RUNTIME (CCPA-013/014/015/016 — the runtime-evidence + outcome-parity track), rest at PLANNED_M*/IN_REVIEW/HARD_BLOCKING_M16 per their lifecycle phase. No OPEN residue. v1.27.0 (companion-repo M167, 2026-05-14) — flips FALSIFY-CCPA-013 (first_recorded_parity_score) from `status: OPEN` → `status: ACTIVE_RUNTIME`. The gate's assertion has been satisfied since v1.1.0 (3 measured_parity blocks dating 2026-04-27 against `fixtures/canonical/` with aggregate_score = 1.0000), but the gate-level status field was never flipped — stale prose that this revision corrects. Also extends the assertion's `fixture_corpus_path` constraint to accept EITHER `fixtures/canonical/` (AUTHORED, since v1.2.0) OR `evidence/phase-3/captures/` (REAL-BINARY bilateral bench, companion-repo M150 — claude 2.1.139 + apr 0.32.0 + Qwen2.5-Coder-1.5B-Instruct-Q4_K_M, agreement = 1.0000 on MultiPL-E-Rust HumanEval/0..4). Adds a 4th measured_parity block under CCPA-013 recording M150's real-binary evidence as the strongest empirical discharge anchor. **CCPA-013 was the last gate stuck at `status: OPEN`** — its flip closes the OPEN residue. 
v1.26.0 (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) — adds FALSIFY-CCPA-015 (ccpa_trace_subproc_output_purity) AND FALSIFY-CCPA-016 (outcome_parity_bound) to the gate registry. CCPA-015 was authored at M147 via provable-contract design (falsifying test FIRST, fix via Stdio::null()) for the ccpa-trace-subproc capture binary; PROPOSED in v1.25.0, promoted ACTIVE_RUNTIME here. CCPA-016 is the Phase 3 P3.4 outcome-parity gate authored at M152 — asserts aggregate agreement >= 0.5 on a MultiPL-E-Rust-class corpus with bidirectional sensitivity (synthetic regression fixture fails threshold; synthetic identity passes). CCPA-016 was empirically validated at M150 (real bilateral bench produced agreement = 1.0000 on 5/5 HumanEval/0..4 with real claude 2.1.139 + real apr code 0.32.0 via Qwen2.5-Coder-1.5B-Instruct-Q4_K_M). The companion-repo M162 row records that aprender#1638 MERGED upstream at squash b61b76b4 (2026-05-13), un-gating apr code from `--features code` so `cargo install apr-cli` ships it by default — the Axis 3 LlmDriver-adapter discharge is FULLY confirmed. v1.25.0 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (os_event_parity_bound) to the gate registry, completing the axis-2 closure-plan idea (2) CLI subprocess instrumentation track. New gate consumes ccpa_subproc::OsEvent records (M136) via ccpa_differ::os_event_parity (M137) and asserts canonical-corpus score >= 0.95 + bidirectional sensitivity on regression corpus (M139). v1.24.0 (companion-repo M128-M131 sequence, 2026-05-10) — bumped from v1.23.0 to integrate the M109 cosine-vs-HF-FP16 LIVE-DISCHARGE (cos_sim 0.995384 ≥ 0.99 on lambda-vector RTX 4090, 2026-05-09; aprender PR #1597 squash 3fb04ef86 flipped `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). 
Discharges the v1.23.0 status-prose claim "Cosine vs HF FP16 remains operator-confirm pending ~60 GB HF download" — the FP16 weights had been on lambda-vector at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB / 16 safetensors shards) for ~7 days; the "60 GB download" blocker was stale by 62 days. v1.23.0 (M35 M32d discharge audit-trail bump) records the 4-bug stack landed on aprender main as commit 5235aaeb9 (#1228) plus diagnostic surface PRs #1222 (Step 2), #1226 (Step 2.5), #1401 (Step 2 JSON wire). M32d gibberish output ("%%%%%%%%") converted to coherent English answers across math/geography/translation/code domains. M34 FAST PATH 5-whys plan delivered at lucky-case bound (5 substantive PRs vs 4-6 estimated, ~6 hours wall vs 2-3 days). Component priors verified empirically: rank-3 Q/K RMSNorm (15%) + rank-4 rope_theta (10%) + chat template both correct. Cosine vs HF FP16 formal flip **DISCHARGED 2026-05-09 at companion-repo M109** (apr_argmax = hf_argmax = 3555 " What"; 555ms apr-forward; HF FP16 fixture generated in 52s). +version: "1.28.0" +status: ACTIVE_RUNTIME # 17/17 gates registered; 4 with status: ACTIVE_RUNTIME (CCPA-013/014/015/016 — the runtime-evidence + outcome-parity track) + 1 with status: PROPOSED (CCPA-017 — project-scale parity, awaiting first operator-dispatched bench to flip ACTIVE_RUNTIME at v1.29.0), rest at PLANNED_M*/IN_REVIEW/HARD_BLOCKING_M16 per their lifecycle phase. No OPEN residue. v1.28.0 (companion-repo M180-M188 Phase 4 sequence, 2026-05-15) — adds FALSIFY-CCPA-017 (project_scale_parity_bound) to the gate registry. 
Phase 4 operationalizes the M159 ProgramBench prior-art (arXiv:2605.03546, 0%/200 SOTA baseline) into companion-tier project-scale parity testing: the M182 corpus draws 5 fixtures from real open GitHub issues across paiml/decy + paiml/bashrs + paiml/depyler with pinned pre-fix commit SHAs; the M184 runner (scripts/phase-4-bench.sh, 288 lines bash) clones at the pinned SHA, dispatches each system with timeout APR_TIMEOUT_S (default 900s), snapshots diff vs SHA, runs the per-fixture oracle_cmd; the M186 scorer (crates/ccpa-differ/src/project_scale_diff.rs, ~310 lines Rust) lifts the runner JSON into ProjectScaleParityReport with 5 derived metrics (per-fixture: approach_match + lines_edited_ratio; corpus-level: partial_agreement + files_jaccard_corpus + approach_match_rate); the M188 gate test (crates/ccpa-differ/tests/falsify_ccpa_017_project_scale_parity.rs, ~260 lines, 7 active + 1 #[ignore]'d) asserts partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3 with bidirectional sensitivity verified on synthetic identity (passes) and synthetic regression (fails) fixtures. CCPA-017 enters at status: PROPOSED because no operator-dispatched measurement has produced evidence/phase-4/project-scale-scores.json yet; the live-evidence test is #[ignore]'d until that exists. Threshold values (0.3/0.3) are tentative POC-tier floors — they WILL be recalibrated after first operator dispatch. Phase 4 is the SIGNAL regime, not the SATURATION regime: a CCPA-016-style "agreement = 1.0" result is implausible at project-scale per ProgramBench evidence; the goal is "do both systems make matching partial progress?" not "do both systems fully succeed?". v1.27.0 (companion-repo M167, 2026-05-14) — flips FALSIFY-CCPA-013 (first_recorded_parity_score) from `status: OPEN` → `status: ACTIVE_RUNTIME`. 
The gate's assertion has been satisfied since v1.1.0 (3 measured_parity blocks dating 2026-04-27 against `fixtures/canonical/` with aggregate_score = 1.0000), but the gate-level status field was never flipped — stale prose that this revision corrects. Also extends the assertion's `fixture_corpus_path` constraint to accept EITHER `fixtures/canonical/` (AUTHORED, since v1.2.0) OR `evidence/phase-3/captures/` (REAL-BINARY bilateral bench, companion-repo M150 — claude 2.1.139 + apr 0.32.0 + Qwen2.5-Coder-1.5B-Instruct-Q4_K_M, agreement = 1.0000 on MultiPL-E-Rust HumanEval/0..4). Adds a 4th measured_parity block under CCPA-013 recording M150's real-binary evidence as the strongest empirical discharge anchor. **CCPA-013 was the last gate stuck at `status: OPEN`** — its flip closes the OPEN residue. v1.26.0 (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) (companion-repo M147+M152+M162 Phase 3 sequence, 2026-05-13) — adds FALSIFY-CCPA-015 (ccpa_trace_subproc_output_purity) AND FALSIFY-CCPA-016 (outcome_parity_bound) to the gate registry. CCPA-015 was authored at M147 via provable-contract design (falsifying test FIRST, fix via Stdio::null()) for the ccpa-trace-subproc capture binary; PROPOSED in v1.25.0, promoted ACTIVE_RUNTIME here. CCPA-016 is the Phase 3 P3.4 outcome-parity gate authored at M152 — asserts aggregate agreement >= 0.5 on a MultiPL-E-Rust-class corpus with bidirectional sensitivity (synthetic regression fixture fails threshold; synthetic identity passes). CCPA-016 was empirically validated at M150 (real bilateral bench produced agreement = 1.0000 on 5/5 HumanEval/0..4 with real claude 2.1.139 + real apr code 0.32.0 via Qwen2.5-Coder-1.5B-Instruct-Q4_K_M). The companion-repo M162 row records that aprender#1638 MERGED upstream at squash b61b76b4 (2026-05-13), un-gating apr code from `--features code` so `cargo install apr-cli` ships it by default — the Axis 3 LlmDriver-adapter discharge is FULLY confirmed. 
v1.25.0 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (companion-repo M136-M140 axis-2-closure-plan sequence, 2026-05-11) — adds FALSIFY-CCPA-014 (os_event_parity_bound) to the gate registry, completing the axis-2 closure-plan idea (2) CLI subprocess instrumentation track. New gate consumes ccpa_subproc::OsEvent records (M136) via ccpa_differ::os_event_parity (M137) and asserts canonical-corpus score >= 0.95 + bidirectional sensitivity on regression corpus (M139). v1.24.0 (companion-repo M128-M131 sequence, 2026-05-10) — bumped from v1.23.0 to integrate the M109 cosine-vs-HF-FP16 LIVE-DISCHARGE (cos_sim 0.995384 ≥ 0.99 on lambda-vector RTX 4090, 2026-05-09; aprender PR #1597 squash 3fb04ef86 flipped `qwen3-moe-forward-v1` v1.4.0 ACTIVE_ALGORITHM_LEVEL → v1.5.0 ACTIVE_RUNTIME). Discharges the v1.23.0 status-prose claim "Cosine vs HF FP16 remains operator-confirm pending ~60 GB HF download" — the FP16 weights had been on lambda-vector at /mnt/nvme-raid0/models/Qwen3-Coder-30B-A3B-Instruct/ (57 GB / 16 safetensors shards) for ~7 days; the "60 GB download" blocker was stale by 62 days. v1.23.0 (M35 M32d discharge audit-trail bump) records the 4-bug stack landed on aprender main as commit 5235aaeb9 (#1228) plus diagnostic surface PRs #1222 (Step 2), #1226 (Step 2.5), #1401 (Step 2 JSON wire). M32d gibberish output ("%%%%%%%%") converted to coherent English answers across math/geography/translation/code domains. M34 FAST PATH 5-whys plan delivered at lucky-case bound (5 substantive PRs vs 4-6 estimated, ~6 hours wall vs 2-3 days). Component priors verified empirically: rank-3 Q/K RMSNorm (15%) + rank-4 rope_theta (10%) + chat template both correct. Cosine vs HF FP16 formal flip **DISCHARGED 2026-05-09 at companion-repo M109** (apr_argmax = hf_argmax = 3555 " What"; 555ms apr-forward; HF FP16 fixture generated in 52s). 
# ───────────────────────────────────────────────────────────────────────────── # Top-level invariants — the 12 falsifiable gates this contract asserts. @@ -90,6 +90,7 @@ invariants: - { id: FALSIFY-CCPA-014, name: os_event_parity_bound, summary: 'OS-level event parity (axis-2-closure-plan M115.4): macro-averaged Jaccard >= 0.95 per fixture in fixtures/os-canonical/; bidirectional-sensitivity gate on fixtures/os-regression/ (every fixture < 0.95 + non-empty drift records).' } - { id: FALSIFY-CCPA-015, name: ccpa_trace_subproc_output_purity, summary: 'Every line emitted to stdout by ccpa-trace-subproc MUST decode as a ccpa_subproc::OsEvent JSON object. Subprocess stdout MUST NOT interleave with the capture stream (use Stdio::null() not Stdio::inherit()).' } - { id: FALSIFY-CCPA-016, name: outcome_parity_bound, summary: 'Outcome parity (Phase 3 P3.4): aggregate agreement on a MultiPL-E-Rust-class corpus >= 0.5 (POC-tier); bidirectional-sensitivity via synthetic regression (< 0.5 → fail) + synthetic identity (1.0 → pass) fixtures.' } + - { id: FALSIFY-CCPA-017, name: project_scale_parity_bound, summary: 'Project-scale parity (Phase 4 P4.4): aggregate partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3 on a multi-file Cargo-workspace task corpus drawn from real GitHub issues (companion-repo M182). Bidirectional-sensitivity via synthetic identity (passes) + synthetic regression (fails) fixtures. PROPOSED at v1.28.0; ACTIVE_RUNTIME pending first operator-dispatched measurement.' } scope: > every recorded fixture under /fixtures/, every replay run the @@ -814,6 +815,118 @@ falsification_conditions: - { date: '2026-05-13', version_before: '1.25.0', version_after: '1.26.0', change: 'Added FALSIFY-CCPA-016 to gate registry. Companion-repo M152 ships the gate test against live evidence/phase-3/multipl-e-rust-scores.json; M150 produced the bilateral bench empirical evidence; this revision flips the contract to recognize CCPA-016 as ACTIVE_RUNTIME from authoring.' 
} + - id: FALSIFY-CCPA-017 + name: project_scale_parity_bound + status: PROPOSED + assertion: | + Project-scale parity (Phase 4 P4.4). On a multi-file Cargo-workspace + task corpus where each task is drawn from a real open GitHub issue + (companion-repo M182: fixtures/project-scale/ initially 5 fixtures + across paiml/decy + paiml/bashrs + paiml/depyler), both teacher + (claude) and student (apr code) are dispatched in a clone of the + pinned pre_fix_commit SHA + given the issue body as their prompt + + their final repo state is scored against the per-fixture + completion oracle_cmd. The aggregate project-scale parity report + MUST satisfy BOTH: + - aggregate `partial_agreement` >= 0.3 + - aggregate `files_jaccard_corpus` >= 0.3 + + Where derived metrics are: + partial_agreement = mean over fixtures of + min(teacher.oracle_pass, student.oracle_pass) + files_jaccard_corpus = mean over fixtures of + |teacher.files_touched ∩ student.files_touched| + / |teacher.files_touched ∪ student.files_touched| + + Plus consistency invariants: + - `corpus_size >= 3` (minimum sample size for statistical meaning) + - `corpus_size == per_fixture.len()` (record-count match) + + Bidirectional sensitivity (mandatory): + - A synthetic regression fixture (one side passes, other fails, + disjoint files-touched lists) MUST fail BOTH thresholds. + - A synthetic identity fixture (both sides pass on same files + with identical files_touched_jaccard = 1.0) MUST pass. + - An empty-corpus report MUST fail (prevents "no-data" from + being claimed as success). + + Source of truth: `evidence/phase-4/project-scale-scores.json` + produced by `scripts/phase-4-bench.sh` on the companion repo. 
+ test_harness: | + `cargo test -p ccpa-differ --test falsify_ccpa_017_project_scale_parity` + runs 7 active assertions + 1 `#[ignore]`'d live-evidence assertion: + - synthetic_identity_corpus_passes_gate + - synthetic_regression_corpus_fails_gate + - empty_corpus_vacuously_fails_threshold + - exactly_at_threshold_passes (verifies >= not >) + - just_below_partial_threshold_fails (single-gate sensitivity) + - just_below_files_threshold_fails (single-gate sensitivity) + - threshold_constants_match_plan (sentinel) + - live_evidence_meets_project_scale_threshold (#[ignore]'d + until operator dispatches `bash scripts/phase-4-bench.sh`) + + All 7 active GREEN on the companion-repo M188 scaffold (synthetic + fixtures constructed in-test, no on-disk corpus dependency). + rationale: | + The M180 Phase 4 plan operationalizes the M159 ProgramBench + prior-art (arXiv:2605.03546) into companion-tier project-scale + parity testing. ProgramBench reports 0%/200 fully-resolved across + Claude Opus/Sonnet/Haiku + GPT + Gemini at the project-scale + layer; this evidence validates the M159 caveat "function-level + 1.0 does not extrapolate to project-scale" and establishes the + Phase 4 SIGNAL regime: the user-facing parity question is + "do both systems make matching partial progress?" not "do both + systems fully succeed?". + + The DUAL-threshold design (partial_agreement >= 0.3 AND + files_jaccard_corpus >= 0.3) is intentional: project-scale parity + has two orthogonal signal channels — pass-rate agreement AND + files-touched overlap. A system could match pass rate without + touching the same files (different solutions to same problem); + or touch the same files without matching pass rate (one fixes + the bug, the other breaks more). Both channels must show + agreement for "project-scale parity" to mean anything. + + Threshold values (0.3/0.3) are tentative POC-tier floors. They + WILL be recalibrated after first operator-dispatched measurement + against the M182 corpus. 
A 0.5/0.5 threshold à la CCPA-016 would + assume saturation that ProgramBench evidence shows doesn't exist + at project-scale; 0.3 is "at least 30% of fixtures see matching + progress" — a plausible POC-tier floor that the M182 corpus + might actually meet. + + Status PROPOSED (not ACTIVE_RUNTIME) because no + operator-dispatched measurement has produced + evidence/phase-4/project-scale-scores.json yet. The + live-evidence test is `#[ignore]`'d until that file exists. + Once the operator runs `bash scripts/phase-4-bench.sh` and the + gate passes against real data, a v1.29.0 bump will flip + PROPOSED → ACTIVE_RUNTIME. + + Companion-repo Phase 4 sequence (M180-M188): + M180 (PR #167 squash c7107b9) — phase-4-project-scale-plan.md + authored; P4.1-P4.5 sub-deliverables defined. + M182 (PR #169 squash b36ceb6) — P4.1 corpus: 5 fixtures from + paiml/decy#40 + paiml/decy#39 + paiml/bashrs#209 + + paiml/depyler#223 + paiml/depyler#224. Operator directive + "why not use ../decy ../bashrs and ../depy corpus" steered + authoring toward real GitHub issues over synthetic stretch + goals. + M184 (PR #171 squash 0f8c451) — P4.2 runner: phase-4-bench.sh + (288 lines bash); clones at pre_fix_commit SHA + dispatches + + snapshots + runs oracle + emits per-fixture and aggregate + JSON with files_touched_jaccard via jq set-arithmetic. + M186 (PR #173 squash c115966) — P4.3 scoring: project_scale_diff.rs + (~310 lines Rust) consumes runner JSON + adds 5 derived + metrics + passes_threshold predicate; 14 unit tests GREEN. + M188 (PR #175 squash a574655) — P4.4 gate test: + falsify_ccpa_017_project_scale_parity.rs (~260 lines); 7 + synthetic-fixture tests verify bidirectional sensitivity + before any real measurement exists. + semantic_change_log: + - { date: '2026-05-15', version_before: '1.27.0', version_after: '1.28.0', + change: "Added FALSIFY-CCPA-017 to gate registry at status: PROPOSED. 
Companion-repo M188 ships the gate test scaffold (7 synthetic-fixture tests + 1 #[ignore]'d live-evidence test); thresholds (partial_agreement >= 0.3 AND files_jaccard_corpus >= 0.3) are tentative POC-tier floors awaiting first operator-dispatched measurement to calibrate. Phase 4 P4.5 contract bump." } + - id: FALSIFY-CCPA-008 name: parity_score_bound status: PLANNED_M6 @@ -972,6 +1085,92 @@ milestones: # ───────────────────────────────────────────────────────────────────────────── status_history: + - date: '2026-05-15' + from: 'ACTIVE_RUNTIME v1.27.0' + to: 'ACTIVE_RUNTIME v1.28.0' + note: 'companion-repo M180-M188 Phase 4 sequence — FALSIFY-CCPA-017 (project_scale_parity_bound) added to gate registry at status: PROPOSED; awaits first operator-dispatched bench to flip ACTIVE_RUNTIME' + reason: | + Adds 1 new falsification gate to the registry: CCPA-017 + (project-scale parity bound). Gate count: 16 → 17. + + Phase 4 closes the function-scale → project-scale extrapolation + gap the M159 ProgramBench prior-art (arXiv:2605.03546) flagged: + 0%/200 fully-resolved across Claude Opus/Sonnet/Haiku + GPT + + Gemini at the project-scale layer means a CCPA-016-style "both + pass" assertion is implausible. CCPA-017 inverts the question: + partial-progress agreement, not all-or-nothing. + + DUAL-threshold design: partial_agreement >= 0.3 AND + files_jaccard_corpus >= 0.3 — both orthogonal channels must + show agreement. Tentative 0.3/0.3 floors; recalibration awaits + first operator-dispatched measurement. + + Companion-repo Phase 4 sequence (M180-M188): + + M180 (PR #167 squash c7107b9) — phase-4-project-scale-plan.md + authored. P4.1-P4.5 sub-deliverables defined. Anchored in + ProgramBench prior-art (M159) for the signal-regime framing. 
+ + M182 (PR #169 squash b36ceb6) — P4.1 corpus: 5 fixtures from + real open GitHub issues across paiml/decy + paiml/bashrs + + paiml/depyler (decy#40 fix test assertions; decy#39 fix + clippy; bashrs#209 lint Makefile false-positives; depyler#223 + enforce Oracle constraints; depyler#224 numeric coercion). + Operator directive "why not use ../decy ../bashrs and + ../depy corpus" steered toward real GitHub issues over + synthetic stretch goals — real issues = real stretch goals + = the operator has already triaged them as work worth doing. + Each fixture pins pre_fix_commit SHA in meta.toml; runner + clones at dispatch (no per-fixture starting-state snapshot, + which would be impractical at 685+ Rust files in depyler). + + M184 (PR #171 squash 0f8c451) — P4.2 runner: phase-4-bench.sh + (288 lines bash). Per fixture × system: clone at SHA, + dispatch with timeout, snapshot diff, run oracle, record + exit + pattern match. Emits per-fixture + aggregate JSON + to evidence/phase-4/project-scale-scores.json with + files_touched_jaccard via jq set-arithmetic. + + M186 (PR #173 squash c115966) — P4.3 scoring: + project_scale_diff.rs (~310 lines Rust). Type hierarchy + ProjectScaleParityReport → Vec → + (teacher: SideScore, student: SideScore). Loader + ProjectScaleParityReport::from_json_str() + 5 derived + metrics (per-fixture: approach_match + lines_edited_ratio; + corpus-level: partial_agreement + files_jaccard_corpus + + approach_match_rate). 14 unit tests GREEN. + + M188 (PR #175 squash a574655) — P4.4 gate test: + falsify_ccpa_017_project_scale_parity.rs (~260 lines). + 7 active synthetic-fixture tests + 1 #[ignore]'d + live-evidence test. Bidirectional sensitivity verified: + identity passes (partial=1.0, jaccard=1.0); regression + fails (partial=0.0, jaccard=0.0); empty corpus fails + (by design — prevents "no-data" from claiming success); + exactly-at-threshold passes (>= not > semantics); + just-below-either-threshold fails (single-gate + sensitivity). 7/7 GREEN. 
+ + Gate-level statuses post-v1.28.0: 4 ACTIVE_RUNTIME (CCPA-013/ + 014/015/016) + 1 PROPOSED (CCPA-017) — awaiting first + operator-dispatched bench, after which v1.29.0 will flip + PROPOSED → ACTIVE_RUNTIME. Rest at PLANNED_M*/IN_REVIEW/ + HARD_BLOCKING_M16 per their lifecycle phase. No OPEN residue. + + Gate registry summary post-v1.28.0: + FALSIFY-CCPA-001..006 PLANNED_M* (Phase 1 RECORD scope; M2.3-rescoped) + FALSIFY-CCPA-007 IN_REVIEW (coverage-floor) + FALSIFY-CCPA-008 PLANNED_M6 (parity_score_bound) + FALSIFY-CCPA-009..012 ACTIVE_ALGORITHM_LEVEL (CI gates from M0) + FALSIFY-CCPA-013 ACTIVE_RUNTIME (first_recorded_parity_score, at v1.27.0) + FALSIFY-CCPA-014 ACTIVE_RUNTIME (os_event_parity_bound, at v1.25.0) + FALSIFY-CCPA-015 ACTIVE_RUNTIME (ccpa_trace_subproc_output_purity, at v1.26.0) + FALSIFY-CCPA-016 ACTIVE_RUNTIME (outcome_parity_bound, at v1.26.0) + FALSIFY-CCPA-017 PROPOSED (project_scale_parity_bound, at v1.28.0) + + Pure additive bump: new gate + new status_history entry. No + schema bump in aprender-contracts/src/schema/. pv validate clean. + - date: '2026-05-14' from: 'ACTIVE_RUNTIME v1.26.0' to: 'ACTIVE_RUNTIME v1.27.0' From f651296bfcd050e21599946d099fac015f4fea5b Mon Sep 17 00:00:00 2001 From: Noah Gift Date: Fri, 15 May 2026 10:04:08 +0200 Subject: [PATCH 2/5] ci(workspace-test): recover from cancel-in-progress target-dir damage Five-whys for the recurring workspace-test "no such file or directory: '...rcgu.o'" linker failure that has caused 2+ runs to fail on aprender#1684: 1. Why does workspace-test fail? Linker can't find yoke_derive-*.rcgu.o intermediate compile artifacts. 2. Why are they missing? Cargo was killed mid-compile (no .rcgu.o written yet) but .rmeta files already existed. 3. Why was cargo killed? GitHub Actions concurrency.cancel-in-progress killed the previous run on the same PR when a new commit arrived. 4. Why does this persist? 
The target dir is per-PR-persistent at /mnt/nvme-raid0/targets/aprender-ci//, so partial-compile state survives across runs. 5. Root cause: cargo's incremental state is not atomic-on-kill. cancel-in-progress + persistent per-PR target dir = inconsistent rmeta/rcgu.o pairs. Fix: wrap `cargo test` with detect-and-recover logic: - On link-failure pattern (rcgu.o file-not-found), `cargo clean` and retry once. - On any other failure, propagate exit code unchanged. Adds latency only on the damage-recovery path; warm-cache happy path is unchanged. Affects only the "Workspace lib tests (25,300+)" step. Other steps (Compute tests, Integration tests, Build.rs check) reuse the cleaned target dir from the recovery path if it fires. Co-Authored-By: Claude Opus 4.7 --- .github/workflows/ci.yml | 47 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 2518edc9c..31399dda7 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -105,8 +105,22 @@ jobs: # = 41min, which JUST overruns 40 — observed on intel-clean-room-3 first run # after the refactor invalidated its sccache by changing workspace lints. # Warm runs are ~3-5min compile per the workflow comments above. + # + # Cancel-damage recovery (five-whys): when concurrency.cancel-in-progress + # cancels the previous run on the same PR mid-compile, cargo's + # incremental state is left inconsistent — .rmeta files exist (claiming + # "I'm built") but the matching .rcgu.o codegen artifacts were never + # written because cargo got SIGKILL'd. The persistent per-PR target + # dir at /mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}/ preserves + # this damage across runs. The next run's linker then fails with: + # clang: error: no such file or directory: '...rcgu.o' + # Fix: wrap the cargo invocation; on link-failure pattern (rcgu.o + # missing), `cargo clean` and retry once. 
Adds latency only on the + # damage-recovery path; warm-cache happy path unchanged. timeout-minutes: 55 run: | + set +e + LOG=$(mktemp) docker run --rm \ -e CI -e GITHUB_ACTIONS -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_EVENT_NAME -e GITHUB_WORKFLOW \ -v "${GITHUB_WORKSPACE}:/workspace" \ @@ -122,7 +136,38 @@ jobs: -e CARGO_PROFILE_TEST_DEBUG=line-tables-only \ -e CARGO_PROFILE_DEV_DEBUG=line-tables-only \ "$IMAGE" \ - cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute + cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute 2>&1 | tee "$LOG" + TEST_EXIT=${PIPESTATUS[0]} + set -e + if [ "$TEST_EXIT" -ne 0 ] && grep -qE 'no such file or directory.*\.rcgu\.o' "$LOG"; then + echo "::warning::Linker failure on stale target dir from prior cancel-in-progress run; running cargo clean + retry" + docker run --rm \ + -v "${GITHUB_WORKSPACE}:/workspace" \ + -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \ + -w /workspace \ + -e CARGO_TARGET_DIR=/workspace/target \ + "$IMAGE" \ + cargo clean + docker run --rm \ + -e CI -e GITHUB_ACTIONS -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_EVENT_NAME -e GITHUB_WORKFLOW \ + -v "${GITHUB_WORKSPACE}:/workspace" \ + -v "/mnt/nvme-raid0/cargo-ci/registry/${PR_OR_REF}:/usr/local/cargo/registry" \ + -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \ + -v "/home/noah/data/sccache:/sccache" \ + -w /workspace \ + -e CARGO_TARGET_DIR=/workspace/target \ + -e RUSTC_WRAPPER=rustc-sccache \ + -e SCCACHE_DIR=/sccache \ + -e CARGO_INCREMENTAL=0 \ + -e CARGO_BUILD_JOBS=8 \ + -e CARGO_PROFILE_TEST_DEBUG=line-tables-only \ + -e CARGO_PROFILE_DEV_DEBUG=line-tables-only \ + "$IMAGE" \ + cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute + elif [ "$TEST_EXIT" -ne 0 ]; then + echo 
"Non-recovery failure mode; propagating exit $TEST_EXIT" + exit "$TEST_EXIT" + fi - name: Compute tests (tolerate SIGSEGV at exit — all tests pass but harness crashes on cleanup) run: | docker run --rm \ From 6a165a647ad09a7cf5d0ee6c0c133223eac28fea Mon Sep 17 00:00:00 2001 From: Noah Gift Date: Fri, 15 May 2026 10:59:35 +0200 Subject: [PATCH 3/5] ci(workspace-test): expand cancel-damage recovery regex to 3 patterns The initial workspace-test recovery (commit f651296bf) only matched the rcgu.o linker-failure pattern. Aprender#1684 run 25907945016 exposed two additional cancel-damage patterns the original regex missed: Pattern 2: cc-rs missing build subdirs cargo:warning=Fatal error: can't create /workspace/target/debug/build/zstd-sys-X/out/Y.o: No such file or directory Pattern 3: rustc orphan rmeta error: extern location for libloading does not exist: /workspace/target/debug/deps/liblibloading-X.rmeta Both patterns indicate the same root cause (partial-compile state from a SIGKILL'd previous run) but manifest in different cargo subsystems depending on how far the cancelled build got. Updated regex matches ANY of the three known damage patterns: no such file or directory.*\.rcgu\.o Fatal error: can.t create.*\.o: No such file extern location for .* does not exist.*\.rmeta On match: cargo clean + retry once. Adds latency only on the damage-recovery path. 
Co-Authored-By: Claude Opus 4.7 --- .github/workflows/ci.yml | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 31399dda7..cba044b62 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -108,14 +108,17 @@ jobs: # # Cancel-damage recovery (five-whys): when concurrency.cancel-in-progress # cancels the previous run on the same PR mid-compile, cargo's - # incremental state is left inconsistent — .rmeta files exist (claiming - # "I'm built") but the matching .rcgu.o codegen artifacts were never - # written because cargo got SIGKILL'd. The persistent per-PR target + # incremental state is left inconsistent. The persistent per-PR target # dir at /mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}/ preserves - # this damage across runs. The next run's linker then fails with: - # clang: error: no such file or directory: '...rcgu.o' - # Fix: wrap the cargo invocation; on link-failure pattern (rcgu.o - # missing), `cargo clean` and retry once. Adds latency only on the + # this damage across runs. Observed damage patterns: + # - Orphan rmeta + missing rcgu.o (link fails): + # clang: error: no such file or directory: '...rcgu.o' + # - Missing build subdirs (cc-rs can't write output files): + # cargo:warning=Fatal error: can't create .../build/.../X.o: No such file or directory + # - Orphan rmeta references (rustc dep-graph lookup fails): + # error: extern location for X does not exist: .../X.rmeta + # Fix: wrap the cargo invocation; on ANY of these damage-indicator + # patterns, `cargo clean` and retry once. Adds latency only on the # damage-recovery path; warm-cache happy path unchanged. 
         timeout-minutes: 55
         run: |
@@ -139,7 +142,11 @@ jobs:
             cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute 2>&1 | tee "$LOG"
           TEST_EXIT=${PIPESTATUS[0]}
           set -e
-          if [ "$TEST_EXIT" -ne 0 ] && grep -qE 'no such file or directory.*\.rcgu\.o' "$LOG"; then
+          # Match any known cancel-damage pattern. Combined regex covers:
+          #   - linker missing .rcgu.o (orphan rmeta, missing codegen)
+          #   - cc-rs Fatal error can't create .o (missing build subdirs)
+          #   - rustc extern location does not exist .rmeta (orphan dep-graph)
+          if [ "$TEST_EXIT" -ne 0 ] && grep -qE 'no such file or directory.*\.rcgu\.o|Fatal error: can.t create.*\.o: No such file|extern location for .* does not exist.*\.rmeta' "$LOG"; then
             echo "::warning::Linker failure on stale target dir from prior cancel-in-progress run; running cargo clean + retry"
             docker run --rm \
               -v "${GITHUB_WORKSPACE}:/workspace" \

From 6fcd975b59fe0da2c1510bf9e0a585895d7cd34d Mon Sep 17 00:00:00 2001
From: Noah Gift
Date: Fri, 15 May 2026 13:02:07 +0200
Subject: [PATCH 4/5] ci(workspace-test): OS-level rm -rf + sccache-disabled retry on cancel-damage

cargo clean (commits f651296bf + df1c2c767) is insufficient. Observed in
aprender#1684 runs 25907945016 + 25912792473: after recovery cargo clean,
the retry fails within ~52s with:

  error: couldn't create a temp dir: No such file or directory (os error 2) at path "/workspace/target/debug/deps/rmetaDAW518"

Root cause: sccache (mounted at /home/noah/data/sccache, shared across
all PRs) caches metadata that references compile artifact paths the
cargo clean just removed. The retry asks sccache for a cache hit;
sccache returns a metadata blob pointing at a path that no longer
exists; rustc's mkdir+write fails.

Two-prong fix:
  1. OS-level rm -rf of target dir contents (more thorough than cargo
     clean; doesn't depend on cargo lockfiles being sane), plus
     pre-create target/debug/deps/ to avoid mkdir races on parallel
     rustc invocations.
  2. Disable sccache during retry (no RUSTC_WRAPPER, no SCCACHE_DIR,
     no /sccache mount). Pays 30-40min cold-compile cost once to
     guarantee correctness over a fast-but-broken cached state.

Cost: only on the damage-recovery path. Warm-cache happy path unchanged.

Co-Authored-By: Claude Opus 4.7
---
 .github/workflows/ci.yml | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index cba044b62..e5aa98201 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -147,24 +147,31 @@ jobs:
           #   - cc-rs Fatal error can't create .o (missing build subdirs)
           #   - rustc extern location does not exist .rmeta (orphan dep-graph)
           if [ "$TEST_EXIT" -ne 0 ] && grep -qE 'no such file or directory.*\.rcgu\.o|Fatal error: can.t create.*\.o: No such file|extern location for .* does not exist.*\.rmeta' "$LOG"; then
-            echo "::warning::Linker failure on stale target dir from prior cancel-in-progress run; running cargo clean + retry"
+            # Observed in aprender#1684 runs 25907945016 + 25912792473: cargo
+            # clean is insufficient — after the clean, the retry build still
+            # fails with "couldn't create a temp dir: No such file or
+            # directory at .../target/debug/deps/rmetaX" within ~52s. Root
+            # cause is sccache cache pointing at cached metadata referencing
+            # paths the clean just removed. Two-prong fix:
+            #   1. Use rm -rf at the OS level (more thorough than cargo
+            #      clean; doesn't depend on cargo lockfiles to be in a sane
+            #      state).
+            #   2. Disable sccache for the retry (no RUSTC_WRAPPER, no
+            #      sccache mount) — pay the 30-40min cold-compile cost
+            #      once to guarantee correctness over a fast-but-broken
+            #      cached state.
+            echo "::warning::Cancel-damage pattern detected; OS-level target dir nuke + retry without sccache"
             docker run --rm \
-              -v "${GITHUB_WORKSPACE}:/workspace" \
               -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \
-              -w /workspace \
-              -e CARGO_TARGET_DIR=/workspace/target \
               "$IMAGE" \
-              cargo clean
+              bash -c 'rm -rf /workspace/target/* /workspace/target/.[!.]* 2>/dev/null || true; mkdir -p /workspace/target/debug/deps; ls -la /workspace/target/'
             docker run --rm \
               -e CI -e GITHUB_ACTIONS -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_EVENT_NAME -e GITHUB_WORKFLOW \
               -v "${GITHUB_WORKSPACE}:/workspace" \
               -v "/mnt/nvme-raid0/cargo-ci/registry/${PR_OR_REF}:/usr/local/cargo/registry" \
               -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \
-              -v "/home/noah/data/sccache:/sccache" \
               -w /workspace \
               -e CARGO_TARGET_DIR=/workspace/target \
-              -e RUSTC_WRAPPER=rustc-sccache \
-              -e SCCACHE_DIR=/sccache \
               -e CARGO_INCREMENTAL=0 \
               -e CARGO_BUILD_JOBS=8 \
               -e CARGO_PROFILE_TEST_DEBUG=line-tables-only \

From d34013918c1b80d36f467af820c7c72a32d5949e Mon Sep 17 00:00:00 2001
From: Noah Gift
Date: Fri, 15 May 2026 13:56:39 +0200
Subject: [PATCH 5/5] ci(workspace-test): prevent cancel-damage via pre-flight check (not retry-on-failure)

Replaces the retry-on-failure recovery (commits 6a165a647 + 6fcd975b5)
with a prevention-based pre-flight check that runs ONCE at the start of
workspace-test.

Five-whys (root cause):
  1. Why does workspace-test fail with "no such file .rcgu.o" / extern
     location missing / cc-rs can't create .o? Cargo's incremental
     state on the per-PR target dir is inconsistent.
  2. Why inconsistent? Prior run was SIGKILL'd mid-compile.
  3. Why SIGKILL'd? concurrency.cancel-in-progress (line 22) cancels
     the previous run when a new commit lands (every "Update branch" /
     strict-up-to-date trigger when main moves).
  4. Why does this persist? Target dir is bind-mounted from per-PR
     persistent path; partial-compile state survives across runs.
  5. Root cause: cargo's incremental state is not atomic-on-kill, so
     persistent shared target dir + cancel-in-progress = damage.

Prevention: at job start, check the previous workflow run on this
branch via gh api. If conclusion was "cancelled", rm -rf the target dir
BEFORE invoking cargo. This addresses the root cause by ensuring no
damaged state enters the cargo invocation in the first place.

This is NOT retry-on-failure (which the operator rejects under the
"flake is not allowed" directive). It is one-time prevention based on a
verifiable signal (prior run conclusion = cancelled).

Cost: ~30-40min cold rebuild ONLY on runs following a cancellation.
Warm-cache happy path (no prior cancellation): zero added latency.

Co-Authored-By: Claude Opus 4.7
---
 .github/workflows/ci.yml | 108 +++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 60 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e5aa98201..133cd76fa 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -99,31 +99,61 @@ jobs:
             sleep "$delay"
             delay=$((delay + 6)) # linear backoff: 4,10,16,22,28,34,...
           done
+      - name: Pre-flight target-dir consistency check
+        # Root cause (five-whys):
+        #   1. Why does workspace-test sometimes fail with "no such file or
+        #      directory .rcgu.o" / extern location missing / cc-rs can't
+        #      create .o? Cargo's incremental state on the per-PR target
+        #      dir is inconsistent.
+        #   2. Why inconsistent? A prior run was SIGKILL'd mid-compile and
+        #      left orphan .rmeta files (parts cargo had registered as built)
+        #      without the corresponding .rcgu.o codegen artifacts (which were
+        #      mid-write at the moment of the kill).
+        #   3. Why was it SIGKILL'd? concurrency.cancel-in-progress (line 22)
+        #      cancels the previous run as soon as a new commit lands on the
+        #      branch (and "Update branch" / strict-up-to-date triggers this
+        #      every time aprender main moves forward).
+        #   4. Why does this persist? The target dir is bind-mounted from a
+        #      per-PR persistent path /mnt/nvme-raid0/targets/aprender-ci//,
+        #      so partial-compile state survives across runs.
+        #   5. Root cause: cargo's incremental state is not atomic-on-kill, so
+        #      a persistent shared target dir + cancel-in-progress = damage.
+        # Prevention (this step): BEFORE invoking cargo, check whether the
+        # immediately-preceding workflow run on this branch was cancelled. If
+        # yes, rm -rf the target dir contents. This is a one-time check at
+        # job start — NOT a retry-on-failure pattern (which the operator
+        # rejects under the "flake is not allowed" directive).
+        run: |
+          set -e
+          if [ -z "${{ github.event.pull_request.number }}" ]; then
+            echo "Not a PR run; skipping prior-cancel check"
+            exit 0
+          fi
+          # Find the immediately-preceding workflow run on this branch.
+          # status=completed filter excludes the current in-progress run.
+          PREV_CONCLUSION=$(gh api \
+            "repos/${GITHUB_REPOSITORY}/actions/runs?branch=${GITHUB_HEAD_REF}&status=completed&per_page=1" \
+            --jq '.workflow_runs[0].conclusion' 2>/dev/null || echo "")
+          echo "Previous run conclusion on ${GITHUB_HEAD_REF}: ${PREV_CONCLUSION:-}"
+          if [ "$PREV_CONCLUSION" = "cancelled" ]; then
+            echo "::warning::Previous run was cancelled; nuking target dir to prevent cargo cancel-damage"
+            docker run --rm \
+              -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \
+              "$IMAGE" \
+              bash -c 'rm -rf /workspace/target/* /workspace/target/.[!.]* 2>/dev/null || true; ls -la /workspace/target/ || true'
+          else
+            echo "No cancel damage to clean (prior conclusion: ${PREV_CONCLUSION:-fresh-branch})"
+          fi
+        env:
+          GH_TOKEN: ${{ github.token }}
       - name: Workspace lib tests (25,300+)
         # Excluded: aprender-gpu (cuBLAS), aprender-cuda-edge (CUDA), aprender-compute (SIMD SIGSEGV at exit)
         # Timeout: 55min (was 40). Cold sccache run is ~34min compile + ~7min tests
         # = 41min, which JUST overruns 40 — observed on intel-clean-room-3 first run
         # after the refactor invalidated its sccache by changing workspace lints.
         # Warm runs are ~3-5min compile per the workflow comments above.
-        #
-        # Cancel-damage recovery (five-whys): when concurrency.cancel-in-progress
-        # cancels the previous run on the same PR mid-compile, cargo's
-        # incremental state is left inconsistent. The persistent per-PR target
-        # dir at /mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}/ preserves
-        # this damage across runs. Observed damage patterns:
-        #   - Orphan rmeta + missing rcgu.o (link fails):
-        #       clang: error: no such file or directory: '...rcgu.o'
-        #   - Missing build subdirs (cc-rs can't write output files):
-        #       cargo:warning=Fatal error: can't create .../build/.../X.o: No such file or directory
-        #   - Orphan rmeta references (rustc dep-graph lookup fails):
-        #       error: extern location for X does not exist: .../X.rmeta
-        # Fix: wrap the cargo invocation; on ANY of these damage-indicator
-        # patterns, `cargo clean` and retry once. Adds latency only on the
-        # damage-recovery path; warm-cache happy path unchanged.
         timeout-minutes: 55
         run: |
-          set +e
-          LOG=$(mktemp)
           docker run --rm \
             -e CI -e GITHUB_ACTIONS -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_EVENT_NAME -e GITHUB_WORKFLOW \
             -v "${GITHUB_WORKSPACE}:/workspace" \
@@ -139,49 +169,7 @@ jobs:
             -e CARGO_PROFILE_TEST_DEBUG=line-tables-only \
             -e CARGO_PROFILE_DEV_DEBUG=line-tables-only \
             "$IMAGE" \
-            cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute 2>&1 | tee "$LOG"
-          TEST_EXIT=${PIPESTATUS[0]}
-          set -e
-          # Match any known cancel-damage pattern. Combined regex covers:
-          #   - linker missing .rcgu.o (orphan rmeta, missing codegen)
-          #   - cc-rs Fatal error can't create .o (missing build subdirs)
-          #   - rustc extern location does not exist .rmeta (orphan dep-graph)
-          if [ "$TEST_EXIT" -ne 0 ] && grep -qE 'no such file or directory.*\.rcgu\.o|Fatal error: can.t create.*\.o: No such file|extern location for .* does not exist.*\.rmeta' "$LOG"; then
-            # Observed in aprender#1684 runs 25907945016 + 25912792473: cargo
-            # clean is insufficient — after the clean, the retry build still
-            # fails with "couldn't create a temp dir: No such file or
-            # directory at .../target/debug/deps/rmetaX" within ~52s. Root
-            # cause is sccache cache pointing at cached metadata referencing
-            # paths the clean just removed. Two-prong fix:
-            #   1. Use rm -rf at the OS level (more thorough than cargo
-            #      clean; doesn't depend on cargo lockfiles to be in a sane
-            #      state).
-            #   2. Disable sccache for the retry (no RUSTC_WRAPPER, no
-            #      sccache mount) — pay the 30-40min cold-compile cost
-            #      once to guarantee correctness over a fast-but-broken
-            #      cached state.
-            echo "::warning::Cancel-damage pattern detected; OS-level target dir nuke + retry without sccache"
-            docker run --rm \
-              -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \
-              "$IMAGE" \
-              bash -c 'rm -rf /workspace/target/* /workspace/target/.[!.]* 2>/dev/null || true; mkdir -p /workspace/target/debug/deps; ls -la /workspace/target/'
-            docker run --rm \
-              -e CI -e GITHUB_ACTIONS -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_EVENT_NAME -e GITHUB_WORKFLOW \
-              -v "${GITHUB_WORKSPACE}:/workspace" \
-              -v "/mnt/nvme-raid0/cargo-ci/registry/${PR_OR_REF}:/usr/local/cargo/registry" \
-              -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}:/workspace/target" \
-              -w /workspace \
-              -e CARGO_TARGET_DIR=/workspace/target \
-              -e CARGO_INCREMENTAL=0 \
-              -e CARGO_BUILD_JOBS=8 \
-              -e CARGO_PROFILE_TEST_DEBUG=line-tables-only \
-              -e CARGO_PROFILE_DEV_DEBUG=line-tables-only \
-              "$IMAGE" \
-              cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute
-          elif [ "$TEST_EXIT" -ne 0 ]; then
-            echo "Non-recovery failure mode; propagating exit $TEST_EXIT"
-            exit "$TEST_EXIT"
-          fi
+            cargo test --workspace --lib --exclude aprender-gpu --exclude aprender-cuda-edge --exclude aprender-compute
       - name: Compute tests (tolerate SIGSEGV at exit — all tests pass but harness crashes on cleanup)
         run: |
           docker run --rm \