runtime: steal tasks from the LIFO slot #7431
Conversation
This commit changes the LIFO slot on multi-threaded runtime workers from being a mutable `Option<Notified<Arc<Handle>>>` to a new `AtomicNotified` type. This type implements a cell containing a nullable task pointer which can be swapped atomically. It's analogous to `AtomicCell`, but with an extra `PhantomData` to remember the task's scheduler type parameter, which would otherwise be erased by the conversion into a `*mut Header` pointer. This change is in preparation for a subsequent change to allow work-stealing from the LIFO slot (see: #4941).
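Conceptually, the new type is shaped something like this minimal sketch (not the actual implementation: `Header` stands in for tokio's internal task header type, and the real type converts `Notified<S>` to and from raw pointers rather than exposing them):

```rust
use std::marker::PhantomData;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Stand-in for the internal task header type.
struct Header;

/// A cell holding a nullable task pointer that can be swapped atomically.
/// The `PhantomData` remembers the scheduler type parameter `S`, which
/// would otherwise be erased by storing the task as a raw `*mut Header`.
struct AtomicNotified<S> {
    ptr: AtomicPtr<Header>,
    _scheduler: PhantomData<S>,
}

impl<S> AtomicNotified<S> {
    fn new() -> Self {
        Self {
            ptr: AtomicPtr::new(ptr::null_mut()),
            _scheduler: PhantomData,
        }
    }

    /// Atomically replace the slot's contents, returning the previous
    /// occupant (if any).
    fn swap(&self, task: *mut Header) -> Option<*mut Header> {
        let prev = self.ptr.swap(task, Ordering::AcqRel);
        if prev.is_null() {
            None
        } else {
            Some(prev)
        }
    }

    /// Atomically empty the slot, returning whatever task was in it.
    fn take(&self) -> Option<*mut Header> {
        self.swap(ptr::null_mut())
    }
}
```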
This way, it's accessible to the stealer. Leave all the LIFO *accounting* (i.e. deciding whether we hit the LIFO slot or not) up to the worker. Gotta figure out whether the load of LIFO presence will race...ugh.
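In sketch form, the resulting sharing structure looks roughly like this (building on the `AtomicNotified` sketch above; the ring-buffer fields of the real queue are elided):

```rust
use std::sync::Arc;

/// Shared state of a worker's run queue. Moving the LIFO slot in here
/// makes it reachable from both sides of the queue.
struct Inner<S> {
    // ... ring-buffer head/tail indices and task buffer elided ...
    /// The LIFO slot, now stealable because it lives in shared state.
    lifo_slot: AtomicNotified<S>,
}

/// The handle owned by the worker itself, which keeps doing all of the
/// LIFO accounting.
struct Local<S> {
    inner: Arc<Inner<S>>,
}

/// The handle cloned to remote workers for work-stealing; it can now
/// also take the task out of `lifo_slot`.
struct Steal<S>(Arc<Inner<S>>);
```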
This commit adds a test ensuring that if a task is notified to the LIFO slot by another task which then blocks the worker thread forever, the LIFO task is eventually stolen by another worker. I've confirmed that this test fails on the `master` branch, and passes after these changes.
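The scenario such a test exercises looks roughly like this (a simplified sketch with made-up names, not the actual test from this PR):

```rust
use std::time::Duration;

#[test]
fn lifo_task_is_stolen_from_blocked_worker() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .build()
        .unwrap();
    let (done_tx, done_rx) = std::sync::mpsc::channel::<()>();
    let (wake_tx, wake_rx) = tokio::sync::oneshot::channel::<()>();

    // This task parks until it is notified; the wakeup places it in the
    // notifying worker's LIFO slot.
    rt.spawn(async move {
        wake_rx.await.unwrap();
        done_tx.send(()).unwrap();
    });

    rt.spawn(async move {
        // Notify the other task...
        wake_tx.send(()).unwrap();
        // ...then block this worker thread forever without yielding.
        loop {
            std::thread::sleep(Duration::from_secs(1));
        }
    });

    // This recv only succeeds if another worker steals and runs the task.
    done_rx
        .recv_timeout(Duration::from_secs(60))
        .expect("the LIFO task should be stolen by another worker");

    // Tear down without waiting for the blocked worker thread.
    rt.shutdown_background();
}
```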
This test spawns a task that sometimes ends up in the LIFO slot and sometimes doesn't. This was previously fine as the LIFO slot didn't count for `worker_local_queue_depth`, but now it does. Thus, we have to make sure that task no longer exists before asserting about queue depth.
(shoutout to @mkeeter for nerd-sniping me into actually doing this)
```rust
// If we also grabbed the task from the LIFO slot, include that in the
// steal count as well.
dst_stats.incr_steal_count(n as u16 + lifo.is_some() as u16);
```
It's also possible to use `u16::from(bool)`, if you care about `cast_lossless` cleanliness.
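i.e., the line above could become:

```rust
dst_stats.incr_steal_count(n as u16 + u16::from(lifo.is_some()));
```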
Co-authored-by: Matt Keeter <[email protected]>
as suggested by @ADD-SP in #7431 (comment)
It's not easy for me to understand the test `lifo_stealable`, so I've proposed some comments to help with understanding.
I also have a question: do we have a performance regression test that covers the relevant code path? It would be good to confirm that there's no significant regression, as this PR changes the hot path a lot.
If there is a performance regression, we can discuss whether it's acceptable in exchange for a stealable LIFO slot.
```
@@ -116,32 +175,59 @@ fn steal_overflow() {
fn multi_stealer() {
```
For reviewers: this test is not changed, except that the inner `fn steal_tasks(..) -> usize` is moved out of this test so it can be reused in the `multi_stealer_lifo` test.
Yeah, that's correct.
Thanks @ADD-SP for the suggestions!
@ADD-SP thanks for your suggested comment improvements for the test! I ended up adding a bunch of comments in 0889f08 --- I wanted to also add a higher-level overview at the beginning of the test explaining what the different channels are used for, and I changed the names of some things. Let me know what you think? Hopefully it's a bit easier to understand now.
I think the existing benchmarks should cover the relevant code paths. I'll go run these benchmarks and report my results. However, I think it might be worthwhile to also compare performance on an ARM system. I would guess that if there's a noticeable performance regression from changing the LIFO slot from a mutable local pointer to a shared `AtomicNotified`, it would be more pronounced on ARM, which has a weaker memory model than x86. Unfortunately, I don't have easy access to a multi-core ARM machine to run these benchmarks on. If someone else with an ARM machine, like an Apple Silicon Mac, wants to run the benches on ARM, I'd love to see the results.
This FreeBSD CI run failure looks a bit weird --- it looks like maybe something timed out? I'm re-running it.
```rust
            .push_back_or_overflow(prev, self, &mut core.stats);
        true
    }
    None => false,
```
This boolean indicates whether other worker threads are notified that there is work available to steal. Since the lifo slot is now stealable, it should no longer be false in this scenario.
Is this something we could have caught with a loom test? Maybe put something in the lifo slot and then block the thread, concurrently with another thread doing nothing?
as per [this comment][1] from @Darksonn, we should notify other workers any time we push to the LIFO slot, as it is stealable now. this simplifies the logic in `schedule_local` a bit, as we can get rid of the `should_notify` bool.

[1]: #7431 (comment)
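In sketch form, the shape of the change (simplified; `lifo_slot_swap` is a placeholder name for the run queue's new atomic-swap API, not the actual method):

```rust
// Before: pushing to the LIFO slot only notified other workers when it
// displaced a previous occupant into the run queue, since the slot itself
// could not be stolen from.
let should_notify = match core.lifo_slot.replace(task) {
    Some(prev) => {
        core.run_queue
            .push_back_or_overflow(prev, self, &mut core.stats);
        true
    }
    None => false,
};
if should_notify {
    self.notify_parked_local();
}

// After: the slot is stealable, so notify unconditionally after pushing.
if let Some(prev) = core.run_queue.lifo_slot_swap(task) {
    core.run_queue
        .push_back_or_overflow(prev, self, &mut core.stats);
}
self.notify_parked_local();
```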
There is a failing test; could you take a look?
Motivation
Worker threads in the multi-threaded runtime include a per-worker LIFO
slot which stores the last task notified by another task running on that
worker. This allows the last-notified task to be polled first when the
currently running task completes, decreasing latency in message-passing
"ping-pong" scenarios.
However --- as described in #4941 --- there's an issue with this that
can cause severe problems in some situations: the task in the LIFO slot
cannot be stolen by a work-stealing worker thread. This means that if a
task notifies another task and then goes CPU bound for a long period of
time without yielding, the notified task will never be able to execute
until the task that notified it can yield. This can result in a very
severe latency bubble in some scenarios. See, for instance, #4323,
#6954, oxidecomputer/omicron#8334, etc.
As a workaround, PR #4936 added an unstable `runtime::Builder` option to disable the LIFO slot. However, this is a less-than-ideal
solution, as it means that applications which disable the LIFO slot due
to occasional usage patterns that cause latency bubbles when it is
enabled cannot benefit from the potential latency improvements it
offers in other usage patterns. And, it's an unstable option which the
user has to discover. In most cases, people whose programs contain usage
patterns that are pathological with regards to the LIFO slot don't know
this ahead of time: the typical narrative is that you write code that
happens to follow such a pattern, discover an unexpected latency spike
or hang in production, and then learn how to disable the LIFO slot. It
would be much nicer if the task in the LIFO slot could participate in
work-stealing like every other task.
Solution
This branch makes tasks in the LIFO slot stealable.
Broadly, I've taken the following approach:
- Add a new `AtomicNotified` type that implements an atomically swappable cell storing a `Notified` task (1220253), and use it to represent the LIFO slot instead of an `Option<Notified<Arc<Handle>>>`. This way, other workers can take a task out of the LIFO slot while work-stealing.
- Move the LIFO slot out of the `worker::Core` struct and into the run queue's `Inner` type (75d8116), making it shared state between the `Local` side of the queue owned by the worker itself and the `Steal` side used by remote workers to steal from the queue.
  There's a bunch of additional code in `worker::Core` for managing whether to actually run a task from the LIFO slot or not. I opted not to move any of this code into the run queue itself, as it depends on other bits of internal worker state. Instead, we just expose to the worker separate APIs for pushing/popping to/from the main queue and for pushing/popping to/from the LIFO slot, resulting in a fairly small diff to the worker's run loop.
- Change the work-stealing code to also steal the LIFO task (730a581 and cb27dda); see the sketch after this list. This actually turned out to be pretty straightforward: once we've stolen a chunk of tasks from the targeted worker's run queue, we now also grab whatever's in its LIFO slot as well. If we stole a LIFO task, it's returned from the `steal_into` method in lieu of the first task in the run queue, so that it gets to execute first, maintaining the latency improvement for recently-notified tasks. This was also simpler than trying to wedge it into the chunk of tasks to be pushed to the stealer's queue.
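The sketch promised above: a loose outline of the stealing path (schematic only; `Notified`, `Local`, and `Stats` are tokio internals, and `steal_half_into` / `lifo_slot_take` are invented names standing in for the real batch-steal and slot-swap logic):

```rust
fn steal_into(&self, dst: &mut Local, dst_stats: &mut Stats) -> Option<Notified> {
    // Steal roughly half of the victim's run queue into `dst`'s buffer.
    let n = self.steal_half_into(dst);

    // Now that the LIFO slot lives in the queue's shared state, also take
    // whatever task is sitting in it.
    let lifo = self.lifo_slot_take();

    if n == 0 && lifo.is_none() {
        // Nothing to steal.
        return None;
    }

    // If we also grabbed the task from the LIFO slot, include that in the
    // steal count as well.
    dst_stats.incr_steal_count(n as u16 + u16::from(lifo.is_some()));

    // Return the stolen LIFO task in lieu of the first queued task, so the
    // most-recently-notified task still runs first on the stealing worker.
    lifo.or_else(|| dst.pop())
}
```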
I've made the following test changes while working on this branch:

- Added new tests ensuring that a task in the LIFO slot is eventually stolen when it is blocked from running because the task that notified it then blocks indefinitely. I've confirmed that this test fails on `master` and passes on this branch. These are in addition to the existing work-stealing Loom tests, as tasks notified by I/O or timers are still woken to the normal run queue.
- Changed the `rt_unstable_metrics::worker_local_queue_depth` integration test, which was necessary as tasks in the LIFO slot now "count" towards the worker's queue depth. We now have to make sure the no-op task that's spawned has completed before spawning the tasks we actually attempt to count, as it seems to sometimes end up in the LIFO slot and sometimes not, causing the test to flake out.
Fixes #4941