
ci: re-enable sccache — unblocks 240-PR cold-compile bottleneck#1632

Merged
noahgift merged 3 commits into main from fix/ci-enable-sccache
May 12, 2026
Conversation

@noahgift
Contributor

Summary

  • Flip `enable_sccache: false` → `true` in workspace-test workflow input.
  • Wrapper script (`/usr/local/bin/rustc-sccache`) was fixed upstream in paiml/infra commit `f4fccf9` on 2026-04-19 but aprender's ci.yml was never re-enabled.
  • Re-enabling sccache unblocks the 240-PR cold-compile bottleneck: each PR currently spends 34min compiling plus 4min on tests inside a 40min timeout, so workspace-test fails on every PR.

Diagnosis

```
$ docker run --rm localhost:5000/sovereign-ci:stable rustc-sccache --version
sccache 0.14.0
$ docker run --rm localhost:5000/sovereign-ci:stable which rustc-sccache
/usr/local/bin/rustc-sccache
```

The wrapper is present in the live image, and the cache directory is warm:

```
$ sudo du -sh /home/noah/data/sccache
11G /home/noah/data/sccache
```

Bind-mount and `RUSTC_WRAPPER=rustc-sccache` env var are already wired in `paiml/.github/.github/workflows/sovereign-ci.yml:210`, gated on the `enable_sccache` input. Only aprender's flag was stale.
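For reference, the environment side of that wiring amounts to the following sketch (an assumption — the exact variable plumbing in sovereign-ci.yml may differ; `RUSTC_WRAPPER` and `SCCACHE_DIR` are the standard cargo/sccache names):

```shell
# Sketch of the sccache env wiring the reusable workflow provides
# (assumption: exact plumbing in sovereign-ci.yml may differ).
# Cargo invokes "$RUSTC_WRAPPER rustc <args...>" for every crate,
# so pointing the wrapper at the image-baked shim routes all
# compilations through sccache, backed by the bind-mounted cache.
export RUSTC_WRAPPER=rustc-sccache   # image-baked exec shim
export SCCACHE_DIR=/sccache          # container side of the bind-mount
echo "wrapper=$RUSTC_WRAPPER dir=$SCCACHE_DIR"
```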

Job timing — current state (PR #1619, last failed run)

| Phase | Duration |
| --- | --- |
| Docker pull + setup | 1m |
| cargo build (879 crates) | 34m |
| cargo test (25,300 lib tests) | 4m before timeout |
| **Total** | 40m (timeout, FAIL) |

Expected after this PR

With ~80% sccache hit rate on warm cache:

  • Build: 34m → ~3-5m
  • Tests: get full ~35m budget instead of 4m
  • Merge queue drains; the 240-PR backlog unblocks
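A quick back-of-envelope check of those numbers (illustrative only — this treats the queue as serial and ignores the 16 runners working in parallel):

```shell
# Rough serial compile cost of the 240-PR backlog, before and after
# (illustrative arithmetic only; real throughput is divided across
# the 16 intel-clean-room runners).
prs=240
cold_min=34   # current cold-compile time per PR
warm_min=4    # assumed warm-cache build time per PR
echo "cold: $(( prs * cold_min / 60 )) hours of compile"
echo "warm: $(( prs * warm_min / 60 )) hours of compile"
```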

Test plan

🤖 Generated with Claude Code

The Phase 3 sccache pilot was disabled on 2026-04-19 because the
`sovereign-ci:stable` container image was missing the `rustc-sccache`
wrapper script. That was fixed upstream in paiml/infra commit f4fccf9
("use exec script not symlink", PR #66) the same day, but aprender's
ci.yml was never flipped back.

Verified on intel runner:

    $ docker run --rm localhost:5000/sovereign-ci:stable rustc-sccache --version
    sccache 0.14.0
    $ docker run --rm localhost:5000/sovereign-ci:stable which rustc-sccache
    /usr/local/bin/rustc-sccache

Sccache cache directory is warm: `/home/noah/data/sccache` is ~11GB
across 290 sub-dirs, shared across all 16 intel-clean-room runners and
all PRs via the existing `/home/noah/data/sccache:/sccache` bind-mount
in `paiml/.github/.github/workflows/sovereign-ci.yml`.

Why this matters:

- Per-PR target dir scheme (`/mnt/nvme-raid0/targets/aprender-ci/<PR>`)
  from #1043 cold-compiles each new PR's 879 deps from scratch.
- Job timing (PR #1619 latest run): 34min build + 4min tests = 40min
  timeout. Tests never finish.
- 249-PR queue × 34min cold compile = backlog cannot drain.
- With sccache hit-rate ≥80% expected on a warm cache, cold builds
  drop from 34min → ~3-5min, and the timeout becomes a non-issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 12, 2026 06:32
noahgift and others added 2 commits May 12, 2026 08:34
The first commit on this branch flipped enable_sccache=true on the
reusable ci/{test,lint,coverage,...} jobs. That doesn't reach the
inline `workspace-test` job (the slowest one, where the 40min timeout
actually fires), so this commit wires sccache into it directly:

- Bind-mount /home/noah/data/sccache:/sccache (shared across all 16
  intel-clean-room runners + all PRs; sccache handles concurrency
  via per-entry atomic rename + LRU eviction).
- Set RUSTC_WRAPPER=rustc-sccache (image-baked exec shim) and
  SCCACHE_DIR=/sccache env vars.
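A minimal exec-style shim of the kind the commit describes might look like the sketch below (hypothetical — the real `/usr/local/bin/rustc-sccache` in paiml/infra may differ). The exec form matters because cargo invokes the wrapper as `$RUSTC_WRAPPER rustc <args...>`, and sccache expects the compiler path as its first argument:

```shell
# Hypothetical sketch of an exec-style rustc wrapper shim
# (assumption: the actual paiml/infra script may differ).
# Cargo calls this as: rustc-sccache rustc <args...>;
# exec hands the full argument list straight through to sccache.
cat > /tmp/rustc-sccache <<'EOF'
#!/bin/sh
exec sccache "$@"
EOF
chmod +x /tmp/rustc-sccache
```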

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 9c51558 into main May 12, 2026
10 checks passed
@noahgift noahgift deleted the fix/ci-enable-sccache branch May 12, 2026 07:51