scx_layered: Replace task hinting with task local data#2812
ameryhung wants to merge 2 commits into sched-ext:main
@ameryhung looks good (the main code is unmodified from the kernel afaict). Can you move the header to scheds/include/scx instead? That way we can use it from other schedulers, too. You can change the include to just <scx/task_local_data.bpf.h> and it should just work.
Replace the current hinting mechanism with the task local data library for better user space performance and storage layout management.

Currently, scx_layered directly uses a BPF task local storage map to pass hints from user space to the scheduler. This works, but there is room for improvement. First, user space programs need to call the bpf() syscall with BPF_MAP_UPDATE_ELEM to update the map, which limits use cases where hints need to reach the kernel with minimal delay. Second, both the user space code and the BPF scheduler need to agree on the layout of the map value so that they don't step on each other's toes.

Task local data is designed to make hinting faster and simpler. It defines an abstract storage type on top of task local storage and provides simple APIs to access it. By using the special UPTR field type, the scheduler can access the user space thread-specific hint directly. In addition, the key-value API hides the layout from both user space and BPF users, so we can remove the central definition of the map value.

The task local data BPF library is copied from selftests/bpf/progs in the kernel tree. The user space API is implemented in selftests/bpf/prog_tests/task_local_data.h. To pass a hint from user space, first define a global key using the following macro:

TLD_DEFINE_KEY(task_hint_key, "task_hint_priority", sizeof(u64));

Then, get a pointer to the hint specific to the calling thread:

u64 *task_hint_p = tld_get_data(tld_data_map_fd, task_hint_key);

The pointer remains valid until the thread exits.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
1284089 to 5e1f4b6
Changes since draft:
The task local data library depends on UPTR support in BPF. To avoid breaking scx_layered on older kernel versions, use bpf_local_storage_elem::free_node, a field added in the UPTR patchset [0], as a guard.

[0] https://lore.kernel.org/bpf/20241023234759.860539-5-martin.lau@linux.dev/

Signed-off-by: Amery Hung <ameryhung@gmail.com>
151ac7a to c09a63f
I updated the guard to check a new struct member introduced in the UPTR patchset. I'm not sure how we can keep the hint monitor in scxcash: after switching to task local data, all hint updates will be direct memory writes. Any suggestions?
You can leave out scxcash. I moved its hinting tracer to the profiling tool; it is mostly used for experimenting, Python analysis of the trace, and overhead measurements. Since we can no longer trace map updates, I will adjust the profiling tool to capture the hints at the wakeup tracepoint instead. Let's talk about testing this change offline, and then we can land it and figure out the upgrade.