feat: standalone criterion bench workflow + self-hosted intel runner #12
Merged
Three additions:
1. .github/workflows/bench.yml — runs the full criterion suite
(warmup + 100 samples × 3 sizes) on a self-hosted [intel] runner
from the paiml org runner pool. Triggered via workflow_dispatch,
a weekly cron (Sundays 06:00 UTC), or a push to main that changes
etl-core/, etl-bench/, or Cargo.{toml,lock}. Pins the CPU governor
to performance during the run for low-CV measurements, captures
host metadata, and commits curated results back to main.
2. bench-results/ — committed criterion output as a teaching
artifact. Seed numbers from a Threadripper 7960X local run
(~9.6M rows/sec across all three sizes — pipeline is linear in
input size) plus a README explaining what's tracked, why
benchmark output belongs in the repo, and how to read the JSON
estimates. Workflow runs overwrite latest/; git history
preserves drift.
3. README throughput badge + Benchmarks section — links to
bench-results/latest/SUMMARY.md and explains the smoke-vs-full
split between gate (CI) and bench (this) workflows. Covers why
we picked a self-hosted runner (CV <1% bare metal vs 5-15%
GitHub-hosted, so a 2% regression is real signal for a
teaching repo).
CHANGELOG entries added under [Unreleased] / Added.
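For readers of the committed JSON, a rows/sec figure follows directly from criterion's per-iteration time estimate. The sketch below is illustrative rather than code from this repo: it assumes the mean point estimate is expressed in nanoseconds (criterion's usual unit in its estimates files), and the function names `rows_per_sec` and `is_roughly_linear` are hypothetical helpers.

```rust
/// Convert a per-iteration mean time (in nanoseconds) into rows per second.
/// `rows` is the number of input rows processed in one benchmark iteration.
fn rows_per_sec(rows: u64, mean_ns: f64) -> f64 {
    rows as f64 / (mean_ns * 1e-9)
}

/// A pipeline that is linear in input size shows roughly the same
/// throughput at every size. Returns true when every throughput lies
/// within `tolerance` (a fraction, e.g. 0.05 for 5%) of the mean
/// throughput. An empty slice is treated as "not linear".
fn is_roughly_linear(throughputs: &[f64], tolerance: f64) -> bool {
    if throughputs.is_empty() {
        return false;
    }
    let mean = throughputs.iter().sum::<f64>() / throughputs.len() as f64;
    throughputs
        .iter()
        .all(|t| ((t - mean) / mean).abs() <= tolerance)
}
```

As a made-up example, a mean of 10,000,000 ns for a 100,000-row input works out to 10M rows/sec. Throughputs that stay within a few percent of each other across the 1k, 10k, and 100k sizes are what "linear in input size" means here.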
Summary
- `.github/workflows/bench.yml`: full criterion suite on a self-hosted `[intel]` org runner (`paiml/intel-clean-room-{1..8}`), triggered manually, weekly via cron, or on `main` pushes that touch `etl-core/`, `etl-bench/`, or `Cargo.{toml,lock}`.
- Seed results committed under `bench-results/latest/` (Threadripper 7960X, ~9.6M rows/sec across 1k/10k/100k sizes, consistent with linear scaling).
- README throughput badge and Benchmarks section explaining the split between `gate` (CI) and `bench` (this workflow).
- CHANGELOG entries under `[Unreleased]`.

Why a self-hosted runner?
Criterion's statistical model assumes low-variance measurements. GitHub-hosted shared VMs run with 5-15% CV, hiding regressions below ~10%. Bare-metal stays under 1%, so a 2% regression is real signal. For a teaching repo, signal quality is worth the operational cost.
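The arithmetic behind that claim can be sketched as follows. This is an illustrative helper, not code from the repo: `coefficient_of_variation` is the usual sample standard deviation divided by the mean, and `exceeds_noise` applies a deliberately crude rule of thumb (a change counts as signal only if it is larger than the run-to-run CV). That rule is an assumption of this sketch, not criterion's actual statistical test.

```rust
/// Coefficient of variation (CV): sample standard deviation / mean.
/// Returns None with fewer than two samples or a zero mean.
fn coefficient_of_variation(samples: &[f64]) -> Option<f64> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None;
    }
    // Sample variance with Bessel's correction (divide by n - 1).
    let variance = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n as f64 - 1.0);
    Some(variance.sqrt() / mean.abs())
}

/// Crude detectability check (an assumption of this sketch): a relative
/// change is treated as signal only when it exceeds the measurement CV.
fn exceeds_noise(relative_change: f64, cv: f64) -> bool {
    relative_change.abs() > cv
}
```

Under this rule, a 2% slowdown clears a 1% CV but is lost in a 10% CV, which matches the reasoning above.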
Why commit benchmark output?
- `etl_throughput/100000` is a real signal.

Test plan
- `gate` (existing CI) stays green
- Run `bench.yml` from the Actions tab to confirm runner pickup and result commit-back
- `bench-results/latest/SUMMARY.md`

🤖 Generated with Claude Code