mitosis: dynamic cpus for affinitized tasks #3206
dschatzberg wants to merge 1 commit into sched-ext:main from
Conversation
Tasks that are affinitized (e.g. to a subset of their cell) are presently bound to a single cpu out of their affinity mask. This mostly works, since most affinitized tasks are affinitized to a single cpu, but we've observed instances where tasks are affinitized to multiple cpus and the initialization-time binding leads to significant cpu imbalance.

This commit adds a new flag, --dynamic-affinity-cpu-selection, which dynamically re-assigns such tasks (only affinitized tasks) to random cpus within their affinity mask in order to reduce load imbalance. This doesn't completely eliminate cpu load imbalance for these kinds of tasks, but it avoids load balancing and other complexities, and making a fresh random choice at each scheduling point prevents long-term imbalance.

Additionally, I've added a test script to reproduce this behavior (it may take a few runs due to the random assignment).

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Force-pushed from 2253df3 to 22b3191
Goal is to improve load balancing for affinitized tasks after observing workloads that pin to multiple cpus.
Changes are in mitosis_select_cpu and mitosis_enqueue.
Select now spreads work potentially across all cpus the thread is allowed on.
mitosis_select_cpu used to put an affinitized task on whatever cpu dsq was chosen in update_task_cpumask. Now, when dynamic_affinity_cpu_selection is enabled, it:
- tries for an idle cpu in the task's cpuset
- failing that, grabs a random cpu from the set
- when the task is moved to a new cpu, updates its vtime to the new cpu dsq's vtime
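The vtime hand-off in that last step can be sketched as below. This is a hypothetical userspace model, not the PR's BPF code: the function name is made up, and the max-clamp (rather than a plain copy) is an assumption based on the common scx pattern of never rewinding a task's vtime.

```c
/*
 * Sketch of the vtime hand-off when a task is re-assigned to a new
 * cpu's dsq. ASSUMPTION (not taken from the PR): the task's vtime is
 * clamped up to the destination dsq's vtime, so a stale, smaller
 * vtime doesn't give the task an unfair head start on the new queue.
 */
#include <stdint.h>

static inline uint64_t vtime_on_move(uint64_t task_vtime,
                                     uint64_t dst_dsq_vtime)
{
	/* Never let the task run "in the past" relative to the new queue. */
	return task_vtime > dst_dsq_vtime ? task_vtime : dst_dsq_vtime;
}
```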
mitosis_enqueue randomly redistributes the task when the selected cpu is busy and also has a queued task.
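The overall selection policy (idle cpu in the mask first, random allowed cpu otherwise) can be modeled in plain C as follows. This is a sketch, not the scheduler's code: the function name, the 64-cpu bitmask representation, and the use of rand() are all illustrative assumptions.

```c
/*
 * Userspace sketch of the dynamic-affinity cpu selection policy.
 * ASSUMPTIONS: cpus are modeled as bits of a u64 (bit i == cpu i),
 * and rand() stands in for the scheduler's random pick.
 */
#include <stdlib.h>
#include <stdint.h>

/* Prefer an idle cpu in the task's allowed mask; otherwise pick a
 * random allowed cpu. Returns -1 if the allowed mask is empty. */
static int pick_dynamic_cpu(uint64_t allowed, uint64_t idle)
{
	uint64_t idle_allowed = allowed & idle;
	int nth, cpu;

	if (!allowed)
		return -1;

	/* Fast path: claim any idle cpu the task may run on. */
	if (idle_allowed)
		return __builtin_ctzll(idle_allowed);

	/* Otherwise pick a random cpu from the mask. A fresh random
	 * choice at each scheduling point avoids long-term imbalance
	 * without needing a load balancer. */
	nth = rand() % __builtin_popcountll(allowed);
	for (cpu = 0; cpu < 64; cpu++) {
		if (allowed & (1ULL << cpu)) {
			if (nth-- == 0)
				return cpu;
		}
	}
	return -1; /* unreachable for a non-empty mask */
}
```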
likewhatevs left a comment:
LGTM, a couple of minor, maybe-perf-relevant things.
affn_done:
	if (scx_bpf_test_and_clear_cpu_idle(cpu))
		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, slice_ns, 0);
Should affn_done be here instead? All paths in pick_idle_cpu_from clear idle / claim the cpu.
Hmm, good catch. I wonder if we even need this conditional with the new logic. Let me study it a bit more closely.
	if (!tctx->all_cell_cpus_allowed) {
		cstat_inc(CSTAT_AFFN_VIOL, tctx->cell, cctx);
		cpu = dsq_to_cpu(tctx->dsq);
From a performance standpoint, might it make sense to have a shortcut case for when the task only runs on a single CPU?
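The suggested shortcut could look something like this. This is a hypothetical helper, not code from the PR: if the task may run on exactly one cpu there is nothing to choose, so the idle probe and random pick can be skipped entirely.

```c
/*
 * Sketch of a single-cpu fast path. ASSUMPTIONS: the allowed cpus are
 * modeled as a u64 bitmask, and the helper name is made up.
 */
#include <stdint.h>

/* If exactly one cpu is allowed, return it; otherwise return -1 and
 * let the caller fall through to the dynamic-selection path. */
static int single_cpu_shortcut(uint64_t allowed)
{
	/* Power-of-two test: exactly one bit set means one allowed cpu. */
	if (allowed && !(allowed & (allowed - 1)))
		return __builtin_ctzll(allowed);
	return -1;
}
```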
Linking this because the idea here is probably worth considering wrt handling of affine tasks: 86b36ec. In a sentence, it is "treat tasks with affinities greater than 1 as if they have the affinity of the cell". The reasoning is "I think there's a rapid roll-off in stalls vs. the number of eligible cpus across workloads, and stall handling can get involved with more complex approaches". "Think" definitely carries a lot of water in that last sentence, but yeah, idk, just a thing worth considering.