Commit 8d8e426

Update ARC runner config: minRunners=40 to keep all runners online
Previously minRunners was 0 (scale-to-zero), so runners only appeared on the GitHub runners tab while jobs were queued. Set minRunners to 40 and raise maxRunners from 8 to 40 so that all 40 runners (5 nodes × 8 GPUs) stay online and ready.
1 parent 2971ccd commit 8d8e426

1 file changed: +3 -3 lines changed

.claude/skills/arc-gpu-runners.md

Lines changed: 3 additions & 3 deletions
@@ -49,8 +49,8 @@ githubConfigUrl: "https://github.com/gpu-mode/kernelbot"
 githubConfigSecret:
   github_token: "<YOUR_GITHUB_PAT>"

-maxRunners: 8
-minRunners: 0
+maxRunners: 40
+minRunners: 40

 template:
   spec:
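
Review note: the new bounds can be sanity-checked against the cluster shape stated in the commit message (5 nodes × 8 GPUs). The following is a minimal sketch, not part of the repository; `ClusterShape` and `check_runner_bounds` are hypothetical names, and the rule "one runner per GPU" is taken from the skill file's GPU isolation note.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical description of the cluster: node count and GPUs per node."""
    nodes: int
    gpus_per_node: int

    @property
    def gpu_capacity(self) -> int:
        # Each runner pod requests exactly 1 GPU, so this is also the
        # maximum number of runner pods that can be scheduled at once.
        return self.nodes * self.gpus_per_node


def check_runner_bounds(min_runners: int, max_runners: int,
                        cluster: ClusterShape) -> List[str]:
    """Return a list of human-readable problems; empty means the bounds look sane."""
    problems = []
    if min_runners < 0:
        problems.append("minRunners must be non-negative")
    if min_runners > max_runners:
        problems.append("minRunners must not exceed maxRunners")
    if max_runners > cluster.gpu_capacity:
        problems.append(
            f"maxRunners={max_runners} exceeds GPU capacity {cluster.gpu_capacity}"
        )
    return problems


cluster = ClusterShape(nodes=5, gpus_per_node=8)
print(cluster.gpu_capacity)                  # 40
print(check_runner_bounds(40, 40, cluster))  # []
```

With these assumptions, both the old values (`minRunners: 0`, `maxRunners: 8`) and the new ones pass the check; the new ones simply use the full capacity.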
@@ -108,7 +108,7 @@ sudo k3s kubectl logs -n arc-systems -l actions.github.com/scale-set-name=arc-ru
 - **GPU isolation**: The AMD device plugin exposes `amd.com/gpu` as a k8s resource. Each runner pod requests exactly 1 GPU. Kubernetes guarantees no two pods share a GPU — each gets a unique `/dev/dri/renderD*` device.
 - **CPU isolation**: Each pod gets 14 dedicated cores via cgroup limits (`nproc` reports 14 inside the container).
 - **RAM isolation**: Each pod gets a 340Gi memory limit enforced by cgroups. Exceeding it triggers OOM kill.
-- **Autoscaling**: With `minRunners: 0` and `maxRunners: 40`, runners spin up on demand when GitHub queues jobs and are destroyed after completion (ephemeral runners). The scheduler spreads pods across all 5 nodes.
+- **Autoscaling**: With `minRunners: 40` and `maxRunners: 40`, all 40 runners stay online and idle on the GitHub runners tab, ready to pick up jobs instantly. The scheduler spreads pods across all 5 nodes (8 per node). Note: `minRunners: 0` means runners only exist when there are queued jobs and won't appear on the GitHub runners tab when idle.

 ## Resource Budget (per MI355X node)
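
Review note: the difference between the old and new Autoscaling descriptions can be illustrated with a small model. This is a simplified sketch that clamps the number of queued jobs between `minRunners` and `maxRunners`; it is an assumption for illustration, not ARC's actual scaling logic, and `runners_online` is a hypothetical helper name.

```python
def runners_online(queued_jobs: int, min_runners: int, max_runners: int) -> int:
    """Simplified model: runner count is the queued job count clamped to [min, max]."""
    if queued_jobs < 0:
        raise ValueError("queued_jobs must be non-negative")
    if not 0 <= min_runners <= max_runners:
        raise ValueError("require 0 <= min_runners <= max_runners")
    return max(min_runners, min(queued_jobs, max_runners))


# Compare scale-to-zero (min=0) with the always-on setting (min=40), both with max=40.
for queued in (0, 3, 100):
    scale_to_zero = runners_online(queued, min_runners=0, max_runners=40)
    always_on = runners_online(queued, min_runners=40, max_runners=40)
    print(queued, scale_to_zero, always_on)
# 0 0 40
# 3 3 40
# 100 40 40
```

Under this model, `minRunners: 0` leaves nothing online when the queue is empty, which matches the note that idle runners do not appear on the GitHub runners tab, while `minRunners: 40` keeps the count fixed at 40 regardless of load.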
