scx_p2dq: Add saturation-aware WAKE_SYNC waker CPU handoff #3139
Open

hodgesds wants to merge 1 commit into sched-ext:main
Conversation
hodgesds force-pushed from de81335 to e48d9bd
Implement a fast-path optimization for WAKE_SYNC wakeups that directly assigns wakees to the waker's CPU when the system has capacity. This provides zero-latency handoff for producer-consumer workloads while gracefully degrading at high utilization.

The optimization checks that:
1. the system is not saturated (!saturated && !overloaded);
2. the waker's CPU is in the wakee's affinity mask;
3. the waker's CPU has no queued work (both the local and LLC DSQs are empty).

When these conditions are met, the wakee inherits the waker's CPU immediately.

Performance impact (schbench benchmark on a 176-CPU system):
- 50-70% load: 47-55x wakeup latency improvement (995μs → 18-21μs)
- 80% load: 41x improvement (995μs → 24μs)
- 90% load: 18x improvement (995μs → 55μs)
- 100% load: no change (gracefully disabled)

Pipe workloads (producer-consumer pairs) see even higher trigger rates, with up to 174,000 handoffs/sec at 50% load compared to ~1,000/sec for request-response patterns.

The optimization is placed early in pick_idle_cpu() to take priority over the prev_cpu sticky path, and only activates when beneficial. At saturation it automatically disables itself to avoid overhead and to allow normal pick-2 load balancing.

Changes:
- Add a P2DQ_STAT_WAKE_SYNC_WAKER counter to track handoffs
- Check both the local DSQ and the LLC DSQ before handoff (the waker consumes from both)
- Gate the optimization behind the saturation check
- Expose the counter in userspace stats

Tested with schbench and stress-ng across load levels 50-100%.

Signed-off-by: Daniel Hodges <hodgesd@meta.com>
hodgesds force-pushed from e48d9bd to 3172c6d
likewhatevs approved these changes on Dec 20, 2025
hodgesds (Contributor, Author) commented:

I did some more testing, trying to get the pipe-based producer-consumer case (single producer, single consumer) working with logic to detect single vs. multiple consumers, and it seems to be hardware/workload dependent.