Support multimodal rotary embedding#192

Open
Dboyqiao wants to merge 1 commit into vllm-project:main from Dboyqiao:dev/zhefengq/qwen3_omni_mrope

Conversation

@Dboyqiao

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)."
  • The test plan, such as the test command to run.
  • The test results, such as a before/after comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.


Purpose

Support the multimodal rotary embedding (M-RoPE) used by Qwen3-omni.

Test Plan

python -m pytest tests/test_multimodal_rotary_embedding.py -v

Test Result

All tests pass.

(Optional) Documentation Update


Copilot AI review requested due to automatic review settings March 12, 2026 08:19
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a multi-modal rotary embedding (M-RoPE) SYCL kernel for XPU, used by models like Qwen2-VL and Qwen3-omni that partition rotation dimensions across positional axes (e.g., temporal/height/width).

Changes:

  • New SYCL kernel multimodal_rotary_embedding_kernel that builds a per-token merged cos/sin cache from per-section positions, then delegates to the existing apply_rotary_embedding helper
  • Torch bindings and op registration for multimodal_rotary_embedding
  • Comprehensive test file comparing kernel output against a pure-Python reference, including a test that single-section M-RoPE matches standard RoPE
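The merged-cache construction described above can be illustrated with a pure-Python sketch: for each token, each mrope_section entry selects a slice of the cos/sin cache at that section's position, and the slices are concatenated into one per-token cache. Names (`build_merged_cache`, `positions`, `mrope_section`) are illustrative and mirror the kernel's arguments; this is not the PR's actual test code.

```python
# Illustrative reference for the M-RoPE merged cos/sin cache.
# positions:     [num_sections, num_tokens] integer positions per axis
# cos_sin_cache: [max_pos, rot_dim] laid out as [cos | sin]
# mrope_section: per-section widths, summing to embed_dim = rot_dim // 2
import numpy as np

def build_merged_cache(positions, cos_sin_cache, mrope_section):
    rot_dim = cos_sin_cache.shape[1]
    embed_dim = rot_dim // 2
    assert sum(mrope_section) == embed_dim, "sections must cover embed_dim"
    num_tokens = positions.shape[1]
    merged = np.empty((num_tokens, rot_dim), dtype=cos_sin_cache.dtype)
    for t in range(num_tokens):
        lo = 0
        for s, width in enumerate(mrope_section):
            hi = lo + width
            # Each section reads the cache row at its own position.
            src = cos_sin_cache[positions[s, t]]
            merged[t, lo:hi] = src[lo:hi]                              # cos slice
            merged[t, embed_dim + lo:embed_dim + hi] = \
                src[embed_dim + lo:embed_dim + hi]                     # sin slice
            lo = hi
    return merged
```

With a single section spanning all of embed_dim, every slice comes from the same position row, which is why single-section M-RoPE reduces to standard RoPE as the tests check.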

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File | Description
csrc/pos_encoding_kernels.cpp | New M-RoPE kernel class and host-side launch function
csrc/torch_bindings.cpp | Op schema registration and XPU dispatch for the new kernel
csrc/ops.h | Declaration of multimodal_rotary_embedding
tests/register_ops.py | Python wrapper for the new op
tests/test_multimodal_rotary_embedding.py | Tests with reference implementation


key.has_value() ? key_hidden_size / head_size : num_heads;
TORCH_CHECK(num_heads % num_kv_heads == 0);

const int rot_dim = cos_sin_cache.size(1);

Copilot AI Mar 12, 2026


Missing validation that rot_dim <= MROPE_MAX_ROT_DIM (512). The kernel allocates a fixed-size merged_cache[MROPE_MAX_ROT_DIM] array in private memory, but there's no TORCH_CHECK in call_multimodal_rotary_embedding_kernel to ensure rot_dim doesn't exceed this limit. This could cause a buffer overflow for models with large rotation dimensions. Please add a check like TORCH_CHECK(rot_dim <= vllm::MROPE_MAX_ROT_DIM, ...).

Suggested change
const int rot_dim = cos_sin_cache.size(1);
const int rot_dim = cos_sin_cache.size(1);
TORCH_CHECK(
rot_dim <= vllm::MROPE_MAX_ROT_DIM,
"rot_dim exceeds MROPE_MAX_ROT_DIM=",
vllm::MROPE_MAX_ROT_DIM,
", got rot_dim=",
rot_dim);

Comment on lines +213 to +225
scalar_t merged_cache[MROPE_MAX_ROT_DIM]; // [cos | sin], size = rot_dim
int cumsum = 0;
for (int s = 0; s < num_mrope_sections; ++s) {
const int lo = cumsum;
const int hi = lo + mrope_section[s];
cumsum = hi;
const int64_t pos = positions[s * num_tokens + token_idx];
const scalar_t* src = cos_sin_cache + pos * rot_dim;
for (int r = lo; r < hi; ++r) {
merged_cache[r] = src[r]; // cos slice
merged_cache[embed_dim + r] = src[embed_dim + r]; // sin slice
}
}

Copilot AI Mar 12, 2026


The merged_cache array is not zero-initialized, and only the ranges covered by mrope_section entries are populated. If the section values don't sum to exactly embed_dim (= rot_dim / 2), apply_rotary_embedding will read uninitialized values from the gaps. Consider either zero-initializing merged_cache or adding a TORCH_CHECK that the section values sum to embed_dim.
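The coverage check this comment suggests could be enforced host-side before launching the kernel. A minimal Python sketch of that validation (function name and error message are illustrative, not the PR's actual code):

```python
# Reject mrope_section values that do not sum to embed_dim (= rot_dim // 2),
# since any uncovered range of merged_cache would be read uninitialized.
def check_mrope_sections(mrope_section, rot_dim):
    embed_dim = rot_dim // 2
    covered = sum(mrope_section)
    if covered != embed_dim:
        raise ValueError(
            f"mrope_section sums to {covered}, expected embed_dim={embed_dim}; "
            f"uncovered merged_cache entries would be uninitialized"
        )
```

In the C++ launch function the same condition would be a TORCH_CHECK on the summed section widths, alongside the rot_dim bound suggested in the previous comment.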

Collaborator

@jikunshang left a comment


Please fix DCO and pre-commit.

" Tensor!? key, int head_size,"
" Tensor cos_sin_cache, bool is_neox,"
" int[] mrope_section) -> ()");
ops.impl("multimodal_rotary_embedding", torch::kXPU,
Collaborator


The vLLM code base doesn't have this op; we'd better move it to the csrc/xpu/ folder.

