ci: defensive hardening (timeouts + bounded log-dump + dispatch macOS-only)#8

Merged
andylbrummer merged 1 commit into main from ci-hardening
Jun 17, 2026

@andylbrummer

Member

Tier-0 defensive CI hardening in response to the runner-stall incidents this session. All config, no external watchdog.

Changes

  • timeout-minutes on every build job (linux 60, windows 75, macos 70, benchmarks 60; release.yml mirrors these). A hung step can no longer occupy a self-hosted runner indefinitely: GitHub kills the job and the runner is freed. This prevents the 2h+ wedge.
  • Bounded, non-blocking log dump: the "Dump vcpkg logs on failure" steps get timeout-minutes: 3 + continue-on-error: true. That step is exactly what hung for 2h; a failure handler must never block the job.
  • Dispatch runs macOS only: linux/windows gain if: github.event_name != 'workflow_dispatch'. Dispatch exists only to validate the billed macOS leg; it previously also ran redundant Linux/Windows legs that starved the runners (a cause of today's queueing).
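
For reviewers who want to see the shape of the three changes together, here is a minimal sketch of one hardened job. It is illustrative only and is not copied from the actual workflow files: the workflow name, runner labels, build command, and log-dump script are assumptions. Only the `timeout-minutes` values, `continue-on-error: true`, the step name, and the `if:` guard come from the description above.

```yaml
# Sketch under stated assumptions; not the repository's real workflow.
name: build            # hypothetical workflow name
on:
  push:
  workflow_dispatch:

jobs:
  linux:
    # Skip this leg on manual dispatch so only the macOS leg runs.
    if: github.event_name != 'workflow_dispatch'
    runs-on: [self-hosted, linux]   # assumed runner labels
    timeout-minutes: 60             # job-level cap; the job is cancelled once exceeded
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: ./build.sh             # hypothetical build command
      - name: Dump vcpkg logs on failure
        if: failure()               # assumed condition for a failure handler
        timeout-minutes: 3          # step-level cap on the diagnostic
        continue-on-error: true     # an error in the dump step does not fail the job
        run: ./scripts/dump-vcpkg-logs.sh   # hypothetical script
```

The job-level timeout bounds the whole job, while the step-level timeout bounds only the diagnostic, so the log dump can no longer be the thing that holds the runner.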

Why config over a watchdog service

These changes prevent every stall hit this session with ~20 lines of config, zero ongoing cost, and no host access. A host-side systemd watchdog is only needed for the rare true process-wedge (deferred).

Validated: both workflows parse; per-job timeout + guard confirmed.

🤖 Generated with Claude Code

…cOS-only

Tier-0 fixes for the self-hosted-runner stalls hit this session, all config, no
external watchdog needed:

- timeout-minutes on every build job (linux 60, windows 75, macos 70,
  benchmarks 60; release.yml mirrors). A hung step can no longer occupy a
  self-hosted runner indefinitely — GitHub kills the job and frees the runner.
  Directly prevents the 2h+ wedge a hung vcpkg log-dump caused.
- The "Dump vcpkg logs on failure" steps get timeout-minutes: 3 +
  continue-on-error: true. That diagnostic step is exactly what hung for 2h; a
  failure handler must never outlive its purpose or block the job.
- workflow_dispatch now runs the macOS leg ONLY: linux + windows gain
  `if: github.event_name != 'workflow_dispatch'`. Dispatch exists solely to
  validate the billed macOS leg on a branch; previously it also ran redundant
  Linux/Windows legs that competed for the runners (a cause of today's queue
  starvation). macos keeps its push|dispatch guard; benchmarks is push+main.

Workflows validated (yaml parse + per-job timeout/guard check).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@andylbrummer andylbrummer merged commit 2482d15 into main Jun 17, 2026
14 of 16 checks passed
@andylbrummer andylbrummer deleted the ci-hardening branch June 17, 2026 00:02