scx_layered: Replace task hinting with task local data#2812

Open
ameryhung wants to merge 2 commits into sched-ext:main from ameryhung:use_tld_hinting
@ameryhung

Replace the current hinting mechanism with the task local data library for better user space performance and storage layout management.

Currently, scx_layered uses a BPF task local storage map directly to pass hints from user space to the scheduler. This works, but two areas can be improved. First, user space programs must issue the bpf() syscall with BPF_MAP_UPDATE_ELEM to update the map, which may rule out use cases where hints need to reach the kernel with minimal delay. Second, user space code and the BPF scheduler must agree on the layout of the map value so that they don't step on each other's toes.

Task local data is developed to make hinting faster and simpler to use. It defines an abstract storage type on top of task local storage and provides simple APIs to access it. By using the special UPTR field, the scheduler can access the thread-specific hint in user space directly. In addition, the API hides the layout from both user space and BPF users behind a key-value API, so the central definition of the map value can be removed.

The task local data BPF library is copied from selftests/bpf/progs in the kernel tree. A user space API is also implemented in selftests/bpf/prog_tests/task_local_data.h. To pass a hint from user space, first define a global key using the following macro:

TLD_DEFINE_KEY(task_hint_key, "task_hint_priority", sizeof(u64));

Then, to get a pointer to the hint specific to the thread:

u64 *task_hint_p = tld_get_data(tld_data_map_fd, task_hint_key);

The pointer remains valid until the thread exits.
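
To make the key-value idea concrete, here is a toy user space model of it, not the task local data library itself. Every name in it (toy_meta, toy_data, toy_create_key, toy_lookup_key, toy_get_data) and every size limit is hypothetical and chosen only for illustration. It assumes a shared metadata table that maps key names to offsets, plus a per-thread data area that both sides index through that table, so neither side hard-codes a struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_KEYS  8
#define TOY_NAME_LEN  32
#define TOY_DATA_SIZE 256

/* Shared metadata: records which named key lives at which offset. */
struct toy_meta {
	int cnt;
	size_t used;
	struct {
		char name[TOY_NAME_LEN];
		size_t off;
		size_t size;
	} keys[TOY_MAX_KEYS];
};

/* Per-thread data area; uint64_t storage keeps it 8-byte aligned. */
struct toy_data {
	uint64_t words[TOY_DATA_SIZE / sizeof(uint64_t)];
};

/* Resolve a key by name; returns its index or -1 if it was never created. */
static int toy_lookup_key(const struct toy_meta *m, const char *name)
{
	for (int i = 0; i < m->cnt; i++)
		if (strcmp(m->keys[i].name, name) == 0)
			return i;
	return -1;
}

/* Create a key (or return the existing one); returns its index or -1. */
static int toy_create_key(struct toy_meta *m, const char *name, size_t size)
{
	size_t off = (m->used + 7) & ~(size_t)7; /* 8-byte align each value */
	int idx = toy_lookup_key(m, name);

	if (idx >= 0)
		return idx; /* key already registered */
	if (m->cnt == TOY_MAX_KEYS || strlen(name) >= TOY_NAME_LEN ||
	    size == 0 || size > TOY_DATA_SIZE || off > TOY_DATA_SIZE - size)
		return -1;

	idx = m->cnt++;
	strcpy(m->keys[idx].name, name);
	m->keys[idx].off = off;
	m->keys[idx].size = size;
	m->used = off + size;
	return idx;
}

/* Return a pointer to the value of `key` inside one thread's data area. */
static void *toy_get_data(const struct toy_meta *m, struct toy_data *d, int key)
{
	assert(key >= 0 && key < m->cnt);
	return (unsigned char *)d->words + m->keys[key].off;
}
```

In this model a writer would call toy_create_key() once, cache the pointer returned by toy_get_data(), and then update the hint with a plain store. A reader resolves the same name with toy_lookup_key() and reads the value at the recorded offset. This mirrors why hint updates no longer need a syscall per update once the pointer is obtained.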

etsal self-requested a review September 23, 2025 22:24
@etsal
Contributor

etsal commented Oct 7, 2025

@ameryhung looks good (the main code is unmodified from the kernel afaict), can you move the header to scheds/include/scx instead? That way we can use it from other schedulers, too. You can change the include to just be <scx/task_local_data.bpf.h> and it should just work.

Replace the current hinting mechanism with the task local data library
for better user space performance and storage layout management.

Currently, scx_layered uses a BPF task local storage map directly to pass
hints from user space to the scheduler. This works, but two areas can be
improved. First, user space programs must issue the bpf() syscall with
BPF_MAP_UPDATE_ELEM to update the map, which may rule out use cases where
hints need to reach the kernel with minimal delay. Second, user space
code and the BPF scheduler must agree on the layout of the map value so
that they don't step on each other's toes.

Task local data is developed to make hinting faster and simpler to use.
It defines an abstract storage type on top of task local storage and
provides simple APIs to access it. By using the special UPTR field, the
scheduler can access the thread-specific hint in user space directly. In
addition, the API hides the layout from both user space and BPF users
behind a key-value API, so the central definition of the map value can
be removed.

The task local data BPF library is copied from selftests/bpf/progs in
the kernel tree. A user space API is also implemented in
selftests/bpf/prog_tests/task_local_data.h. To pass a hint from user
space, first define a global key using the following macro:

TLD_DEFINE_KEY(task_hint_key, "task_hint_priority", sizeof(u64));

Then, to get a pointer to the hint specific to the thread:

u64 *task_hint_p = tld_get_data(tld_data_map_fd, task_hint_key);

The pointer remains valid until the thread exits.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
ameryhung marked this pull request as ready for review February 6, 2026 22:01
@ameryhung
Author

Changes since draft:

  1. Move task_local_data.bpf.h to scheds/include/scx
  2. Make sure task local data based hinting doesn't break backward compatibility

@htejun
Contributor

htejun commented Feb 7, 2026

scxcash is using the mechanism and probably should be updated together. cc @kkdwvd

The task local data library depends on UPTR support in BPF. To avoid
breaking scx_layered on older kernel versions, use
bpf_local_storage_elem::free_node, a field added in the UPTR patchset
[0], as a guard.

[0] https://lore.kernel.org/bpf/20241023234759.860539-5-martin.lau@linux.dev/

Signed-off-by: Amery Hung <ameryhung@gmail.com>
@ameryhung
Author

I updated the guard to check a new struct member introduced in the UPTR patchset.

Not sure how we can keep the hint monitor in scxcash. After switching to task local data, all hint updates will be direct memory writes. Any suggestions?

@kkdwvd
Contributor

kkdwvd commented Feb 9, 2026

You can leave out scxcash. I moved its hinting tracer to the profiling tool; it is mostly used for experimenting/Python analysis on the trace and for overhead measurements. I will adjust and use the wakeup tracepoint to change how we capture the hints for the profiling tool, since we can no longer trace map updates.

Let's talk about testing this change offline and then we can land it and figure out the upgrade.
