Add Sycl topk per row kernel #191

Open

wuxun-zhang wants to merge 5 commits into vllm-project:main from wuxun-zhang:dev/ds-v3.2
Conversation


@wuxun-zhang wuxun-zhang commented Mar 12, 2026

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Add a SYCL top_k_per_row kernel for prefill and decode.
Copilot adapted most of the code from https://github.com/vllm-project/vllm/blob/main/csrc/sampler.cu, except for the subgroup-level prefix-sum algorithm, since the CUDA version relies on the CUB library.
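The subgroup-level prefix sum mentioned above can be illustrated with a minimal Python sketch. This is not the kernel's actual code; it only simulates what a Hillis-Steele inclusive scan computes across the lanes of one SYCL subgroup (in real SYCL code this would typically be `sycl::inclusive_scan_over_group`), with each list index standing in for one work item:

```python
def subgroup_inclusive_scan(values):
    """Simulate an inclusive prefix sum over one subgroup's lanes.

    Uses the Hillis-Steele scan pattern: in each round, every lane
    adds the value held by the lane `step` positions to its left,
    and `step` doubles until it covers the whole subgroup.
    """
    n = len(values)
    result = list(values)
    step = 1
    while step < n:
        # All lanes update "in lockstep", emulated by building a new list.
        result = [
            result[i] + (result[i - step] if i >= step else 0)
            for i in range(n)
        ]
        step *= 2
    return result
```

For a subgroup holding `[1, 2, 3, 4]` this yields `[1, 3, 6, 10]`: each lane ends up with the sum of its own value and all values to its left, in O(log n) rounds rather than a serial O(n) pass.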

Test Plan

python -m pytest tests/test_topk_per_row.py -v

Test Result

All tests pass.

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

wuxun-zhang and others added 4 commits March 12, 2026 06:44
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
Copilot AI review requested due to automatic review settings March 12, 2026 07:10
@wuxun-zhang wuxun-zhang mentioned this pull request Mar 12, 2026

Copilot AI left a comment


Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.



import torch

from tests.register_ops import topk_per_row_prefill, topk_per_row_decode
from vllm.platforms import current_platform

should not import any vllm code in this repo.


fixed

Compare results from CUDA top_k_per_row with torch.topk.
Both results should be sorted and contain the same top-k elements.
"""
num_rows = cuda_indices.shape[0]
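The comparison described in the docstring above can be sketched as follows. This is a minimal pure-Python stand-in, not the test's actual code (the real test operates on torch tensors, and the helper name is illustrative): since the kernel and `torch.topk` may break ties between equal logits differently, a robust check compares the set of selected indices per row rather than demanding identical ordering.

```python
def topk_results_match(kernel_indices, ref_indices):
    """Check that two per-row top-k index results agree.

    kernel_indices / ref_indices: one list of selected indices per row.
    Rows must select the same elements, but tie-breaking between
    equal values may order them differently, so compare per-row sets.
    """
    if len(kernel_indices) != len(ref_indices):
        return False
    for kernel_row, ref_row in zip(kernel_indices, ref_indices):
        if set(kernel_row) != set(ref_row):
            return False
    return True
```

For example, rows `[2, 0, 1]` and `[0, 1, 2]` match (same top-k elements, different tie order), while `[0, 1]` versus `[0, 2]` does not.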

Better to rename `cuda` here.


Renamed `cuda` to `xpu`.

Signed-off-by: yangqun <qun.yang@intel.com>