[GPU] Add optimized TopK kernel using radix histogram and bitonic sort. #34539

Open
hyunback wants to merge 1 commit into openvinotoolkit:master from hyunback:arg_max_min_topk_radix

hyunback commented Mar 6, 2026

Description of the issue (symptom, root cause, how it was resolved)

Yolo26 shows poor performance; the main bottleneck is TopK.

  • How it was resolved
    Add a new GPU TopK kernel (arg_max_min_topk_radix) that reduces complexity from O(N·K) to O(N + K·log²K) for f16 inputs in SORT_BY_VALUE mode. A two-level 256-bin radix histogram finds the K-th threshold in O(N), and an SLM (shared local memory) bitonic sort then produces the final ordering in O(K·log²K).
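
For reviewers unfamiliar with the approach, the idea can be sketched on the CPU in Python. This is only an illustration under stated assumptions, not the kernel code from this PR: the function names (`f16_key`, `kth_threshold`, `bitonic_sort_desc`, `top_k`) are hypothetical, NaN inputs are ignored, the tie-breaking rule (lower index first) is an assumption, and the sequential loops stand in for what a GPU would do in parallel with SLM histograms.

```python
import struct

def f16_key(x):
    """Map a value to a 16-bit key whose unsigned order matches f16 order."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    # Negative values: flip all bits. Non-negative values: set the sign bit.
    return (~bits & 0xFFFF) if bits & 0x8000 else (bits | 0x8000)

def scan_from_top(hist, need):
    """Walk bins 255..0; return (bin holding the need-th largest, count above it)."""
    above = 0
    for b in range(255, -1, -1):
        if above + hist[b] >= need:
            return b, above
        above += hist[b]
    raise ValueError("need exceeds number of elements")

def kth_threshold(keys, k):
    """Two passes over the data: high-byte histogram, then low-byte histogram."""
    hi_hist = [0] * 256
    for key in keys:
        hi_hist[key >> 8] += 1
    hi, above = scan_from_top(hi_hist, k)
    lo_hist = [0] * 256
    for key in keys:
        if key >> 8 == hi:
            lo_hist[key & 0xFF] += 1
    lo, _ = scan_from_top(lo_hist, k - above)
    return (hi << 8) | lo

def bitonic_sort_desc(items):
    """In-place bitonic sorting network; len(items) must be a power of two."""
    n = len(items)
    size = 2
    while size <= n:
        stride = size // 2
        while stride > 0:
            for i in range(n):
                j = i ^ stride
                if j > i:
                    descending = (i & size) == 0
                    if (items[i] < items[j]) == descending:
                        items[i], items[j] = items[j], items[i]
            stride //= 2
        size *= 2

def top_k(values, k):
    """Return the k largest (value, index) pairs, largest first (1 <= k <= len)."""
    keys = [f16_key(v) for v in values]
    t = kth_threshold(keys, k)
    # Everything strictly above the threshold is selected; ties fill the rest.
    picked = [(key, -i) for i, key in enumerate(keys) if key > t]
    ties = [(key, -i) for i, key in enumerate(keys) if key == t]
    picked += ties[: k - len(picked)]
    size = 1
    while size < k:
        size *= 2
    picked += [(-1, 0)] * (size - k)  # sentinels sort after every real key
    bitonic_sort_desc(picked)
    return [(values[-neg_i], -neg_i) for _, neg_i in picked[:k]]
```

For example, `top_k([0.5, -1.0, 3.0, 2.0, 3.0, -0.25, 7.5, 0.0], 3)` selects 7.5 (index 6) and the two 3.0 entries (indices 2 and 4). Only the K selected candidates reach the sorting network, which is where the O(N + K·log²K) bound comes from: two linear passes to find the threshold, then a sort whose cost depends on K alone.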

Reproduction step and snapshot (if applicable. Do not attach for customer model)

$ ./benchmark_app -m ./yolo26n_int8_openvino_model/yolo26n.xml -d GPU -t 10

Checklist

  • Is it a proper fix?
  • Did you include test case for this fix, if necessary?
  • Did you review existing test that can be extended to cover this scenario? Which test did you review?

Tickets:

Add a new GPU TopK kernel (arg_max_min_topk_radix) that reduces complexity
from O(N·K) to O(N + K·log²K) for f16 inputs with SORT_BY_VALUE mode.

Two-level 256-bin radix histogram to find the K-th threshold in O(N),
followed by SLM bitonic sort for O(K·log²K) final ordering

Signed-off-by: hyunback <hyunback.kim@intel.com>
hyunback added the "category: GPU" (OpenVINO GPU plugin) label Mar 6, 2026
hyunback requested review from a team as code owners March 6, 2026 12:07
hyunback added the "WIP" (work in progress) and "under_perf_check" labels and removed the "WIP" (work in progress) label Mar 6, 2026