
rework ci #3330

Open
likewhatevs wants to merge 1 commit into sched-ext:main from likewhatevs:ci-debug

Conversation

likewhatevs (Contributor) commented Feb 16, 2026

CI grew in complexity over time. This commit simplifies it while adding some capabilities we have wanted for a while: proper veristat support, AI review, and automated performance testing and analysis.

In the process, I made a couple of tools we've needed for a while now (a GitHub Action and a cargo veristat plugin, both described below).

Once I know how folks feel about them, I figure I'll publish them on the GitHub Marketplace (for the action) and crates.io (for the cargo plugin).

Anyway, on to the main points of this refactor:

simple

Install dependencies with apt, cache with the GitHub Actions cache where needed, and keep scripts minimal (only one, I think, for the benchmark stuff). Pretty much everything lives in GitHub Actions YAML, and there are only two of those files.
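For a sense of the shape, here is a minimal sketch of that style of workflow; the package names, job layout, and step contents are assumptions for illustration, not the literal contents of either workflow file in this PR:

```yaml
# Illustrative sketch only: package names, job layout, and step contents are
# assumptions, not the literal contents of the two workflow files in this PR.
name: ci
on: [pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # dependencies come straight from apt, no custom toolchain images or wrapper scripts
      - run: sudo apt-get update && sudo apt-get install -y clang llvm libelf-dev
      - run: cargo build --release
```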

veristat

TL;DR -- put rodata dumps in a directory named veristat inside your scheduler's directory, and CI will run X kernels through verification using all Y dumps present. layered's verified instruction count goes from 20k to 200k with this when using --run-example.

longer story

About a year ago I tested a layered binary in a VM on a desktop I got just for that purpose, and learned that VM-based testing is, well, not the whole story when it comes to verifying BPF programs.

In that particular instance, the VM's default config was something layered detected as "SMT disabled", while the "SMT enabled" path (i.e. the common, if not universal, case) always failed to verify.

That cargo veristat plugin is for this case; it contains the glue logic needed to feed rodata dumps from bpftool into veristat.

Paired with VM-based testing (to vary the verifier being run), this enables what in my head has been a near gold standard for knowing whether something will verify. I'm sure we'll find cases where it doesn't hold, but I think it's a step in the right direction.
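A rough sketch of the underlying glue, not the cargo veristat plugin's actual interface (which isn't published yet): dump the scheduler's .rodata with bpftool while a real run has set it up, then replay those values when verifying the object. Map name, variable name, paths, and the assumption that veristat is new enough to support -G/--set-global-vars are all illustrative.

```yaml
# Sketch of the glue logic, not the cargo veristat plugin's interface.
# Map/variable names and paths are illustrative; assumes a veristat build
# that supports -G/--set-global-vars.
- name: verify a scheduler against captured rodata
  run: |
    # capture the .rodata values a real run of the scheduler sets before load
    sudo bpftool map dump name layered.rodata > scheds/rust/scx_layered/veristat/smt-on.txt
    # replay one of those values while verifying the object file
    veristat -G "smt_enabled = 1" scheds/rust/scx_layered/src/bpf/main.bpf.o
```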

terse/categorized output

On the CI job logs page, we now get things like this:
[screenshot: categorized CI job output]
and this:
https://github.com/likewhatevs/scx/actions/runs/22052475511/attempts/1#summary-63713324076 (cargo veristat does some loop detection, so effectively infinite logs are shrunk enough to render in UIs and remain workable as logs).

AI

we haz ai: likewhatevs#13 (comment)

benchmarks

When scheduler code (not dependency code) is edited, a script runs a handful of benchmarks and rsched against the modified scheduler. On its own, this is ehh. But we haz ai!, and the AI knows to grep through kernel sources and a reasonably fresh index of all lore emails from all mailing lists, and to ingest all the data output above, before providing analyses like this.
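As a sketch of the trigger (paths, script name, and matrix variable are hypothetical, not the exact ones in this PR), something along these lines decides whether the benchmark step runs at all:

```yaml
# Hypothetical sketch of the trigger: only run benchmarks when scheduler code
# (not vendored/dependency code) changed. Paths and script name are assumptions.
- name: run benchmarks if scheduler code changed
  run: |
    if git diff --name-only "origin/${GITHUB_BASE_REF}...HEAD" | grep -qE '^scheds/'; then
      ./ci/run-benchmarks.sh "${{ matrix.scheduler }}"
    else
      echo "no scheduler code changed, skipping benchmarks"
    fi
```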

When I set out to do this, the above was a hunch, but I actually saw things working roughly that way across these PRs, which was kind of cool:

  1. No benchmark tools (and I also targeted the wrong branch with the PR); AI has some comments but, I mean, the code looks good, why not: p2dq: prefer idle cores over prev_cpu in select_cpu fast path likewhatevs/scx#14
  2. AI has some benchmark tools; pretty confident no: p2dq: prefer idle cores over prev_cpu in select_cpu fast path likewhatevs/scx#15
  3. AI has the full benchmark suite with rsched outputs while the benchmarks are running -- it spots the intentional bug: p2dq: prefer idle cores over prev_cpu in select_cpu fast path likewhatevs/scx#20
  4. After clarifying the importance of rsched and improving logging: p2dq: prefer idle cores over prev_cpu in select_cpu fast path likewhatevs/scx#21

Note -- neither AI review nor benchmarks are blocking.

downsides

I'm not sure how long getting that AI feedback is going to take with respect to queueing until libbpf/libbpf-rs#1336 is merged.

TL;DR on that PR: builds get 4x faster (probably closer to 6-8x; I did not test this without the "limit parallelism to prevent OOMs" fix in place) with 10x less RAM, enabling better resource allocation for the CI, such that I can be more confident there will be no queueing for comments/feedback as folks iterate (I think).

No periodic tests against for-next for now. PRs are run against a battery of kernels, including for-next, etc., but I think that libbpf-rs commit (or something like it) needs to be merged before it's tenable (compute-wise) for me to enable those.

I had to do some log policing for signal-to-noise. More or less: print when things fail or warn, and do not print positive progress info (e.g. "compiled" or "test passed"). Our logs were in the megabytes, and most of that information was not particularly actionable; this makes them more information-dense.
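The general pattern (not the exact script from this PR) looks something like this: capture a step's output, stay quiet on success, dump everything on failure, and surface warnings either way.

```yaml
# General pattern, not the exact script in this PR: stay quiet on success,
# dump full output on failure, and surface warnings either way.
- name: build (quiet on success)
  run: |
    if ! out=$(cargo build --release 2>&1); then
      printf '%s\n' "$out"
      exit 1
    fi
    printf '%s\n' "$out" | grep -i 'warning' || true
```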

I'm opening this PR now instead of iterating more because I think making this work well requires folks using it so we can see where it falls over.

likewhatevs commented Feb 17, 2026

I think I really like this one: likewhatevs#24 (comment) -- it links to Bootlin and has more tasteful emoji and URL use.

likewhatevs marked this pull request as draft February 17, 2026 17:34
likewhatevs force-pushed the ci-debug branch 13 times, most recently from 9829c80 to 3f03e85, February 18, 2026 04:05
likewhatevs requested a review from hodgesds February 18, 2026 04:47
likewhatevs (Contributor, Author) commented:

I fed a handful more PRs through this CI setup on my fork here:

https://github.com/likewhatevs/scx/pulls?q=is%3Apr+created%3A2026-02-18T00%3A46%3A34..2026-02-18T04%3A46%3A34

Runtime for the non-AI-review jobs (i.e. the blocking merge-queue stuff) is down to 6 minutes, and that libbpf-cargo change should take maybe 2 more minutes off that, plus remove a fan-out bottleneck in the merge queue.

This was supposed to be simpler than it ended up being, but getting runtimes acceptable (not 30 minutes for the blocking stuff) without an external cache, given that libbpf issue, required all the caching. That said, it's still less code with additional capabilities, and it's all shell or GitHub CI YAML.
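For flavor, the caching in question is the built-in actions/cache keyed on lockfiles; the paths and keys below are assumptions, not the exact ones used in the workflows:

```yaml
# Illustrative only: GitHub's built-in cache standing in for an external cache.
# Paths and keys are assumptions, not the exact ones used in this PR.
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      target
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-
```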

It looks like my fix for libbpf might be OK after some iteration, so times will improve.

Regarding the example PRs, all the reviews I've seen so far seem informative, but a good few are still in flight. The way the AI review works, it takes ~30 minutes from when a PR is opened to when it comments, and that is serial. This sounds bad, but looking at historic data, I'd guess the most anyone would ever have to wait is an hour or two. Note that AI review is non-blocking, so maybe that's fine.

There is a lot of noise on the AI/perf box (I set some probably-should-be-weekly index/update cron jobs to hourly), so it'll be interesting to see what the AI makes of that when interpreting changes in the context of performance.

This is a picture of what the pipeline does (the GitHub UI doesn't render fan-out with matrices well, I think):
[screenshot: pipeline graph, 2026-02-18]

I also updated the underlying tooling such that, I think, we can do everything we do on x86 on arm by just adding another matrix variable (presuming things aren't unusably slow; GitHub runners don't have KVM on arm).
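As a hedged sketch of what "just adding another matrix variable" could look like (the matrix dimensions, runner labels, and kernel names here are assumptions, not the PR's actual matrix):

```yaml
# Hedged sketch: matrix dimensions, runner labels, and kernel list are
# assumptions, not the PR's actual matrix.
test:
  strategy:
    matrix:
      arch: [x86_64, aarch64]
      kernel: [stable, for-next]
  runs-on: ${{ matrix.arch == 'aarch64' && 'ubuntu-24.04-arm' || 'ubuntu-latest' }}
  steps:
    - run: echo "build and test ${{ matrix.kernel }} on ${{ matrix.arch }}"
```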

likewhatevs marked this pull request as ready for review February 18, 2026 05:22
ci grew in complexity over time. this commit simplifies it while adding
some capabilities we have wanted (proper veristat support, claude,
automated performance testing and analysis) in the process.

also clean up a lint issue for a linter that wasn't running or something
(for green signal).

Signed-off-by: Pat Somaru <patso@likewhatevs.io>