
support qwen3.5 input layout#190

Open
mayuyuace wants to merge 3 commits into vllm-project:main from mayuyuace:qiming/qwen3.5

Conversation


@mayuyuace mayuyuace commented Mar 12, 2026

Qwen3-Next and Qwen3.5 have different input layouts; this PR modifies the kernel to support Qwen3.5. The unit test is updated accordingly.
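The actual layouts are defined in the SYCL kernels; as a purely hypothetical illustration (the packing order, names, and shapes below are assumptions, not the real formats), splitting a packed QKVZ tensor under two such conventions might look like:

```python
import torch

# Hypothetical sketch of two packed-QKVZ conventions. Assumed (not taken
# from the real kernels): the legacy (Qwen3-Next-style) layout interleaves
# [q|k|v|z] per head, while the reordered (Qwen3.5-style) layout groups
# all q, then all k, then all v, then all z.
def split_qkvz(mixed, num_heads, head_dim, reorder_input):
    """Split [tokens, num_heads * 4 * head_dim] into q, k, v, z."""
    tokens = mixed.shape[0]
    if reorder_input:
        # grouped: [Q | K | V | Z], each block num_heads*head_dim wide
        q, k, v, z = mixed.split(num_heads * head_dim, dim=-1)
    else:
        # interleaved: per head, [q | k | v | z]
        per_head = mixed.view(tokens, num_heads, 4 * head_dim)
        q, k, v, z = (t.reshape(tokens, -1)
                      for t in per_head.split(head_dim, dim=-1))
    return q, k, v, z
```

A kernel that hard-codes one of these interpretations reads the wrong values when handed the other, which is why the layout has to be selected explicitly.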

vllm-xpu PR:
https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/161

bash run-lm-eval-gsm-vllm-baseline.sh -m Qwen/Qwen3.5-9B,dtype=bfloat16 -b 20 -l 250 -f 5 -t 2

triton gdn: (lm-eval results screenshot)

sycl gdn: (lm-eval results screenshot)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Copilot AI review requested due to automatic review settings March 12, 2026 06:07

Copilot AI left a comment


Pull request overview

Adds support for an alternate QKVZ/BA input layout (Qwen 3.5) to the XPU GDN attention path by threading a reorder_input flag through the Torch op binding, C++ interface, and SYCL kernels, and updating the unit test to validate both layouts.

Changes:

  • Extend gdn_attention Torch op signature + C++ interface to accept reorder_input.
  • Update causal-conv1d (native + XE2 chunk) kernels to interpret inputs in either legacy (Qwen Next) or reordered (Qwen 3.5) layout.
  • Expand the GDN attention unit test to run with reorder_input=True/False and reorder the reference inputs accordingly.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Per-file summary:

  • tests/gdn_attn/test_gdn_attn.py — adds reorder_input parametrization and reference-side input reshaping to validate both layouts.
  • csrc/xpu/torch_bindings.cpp — updates the Torch library schema for gdn_attention to include reorder_input.
  • csrc/xpu/ops.h — extends the C++ op declaration with the reorder_input flag.
  • csrc/xpu/gdn_attn/gdn_attn_interface.cpp — wires reorder_input through to the causal-conv1d (native + XE2) launch paths.
  • csrc/xpu/gdn_attn/causal_conv1d.hpp — adds compile-time specialization for the reordered layout and dispatch based on reorder_input.
  • csrc/xpu/gdn_attn/xe_2/chunk_causal_conv1d_xe2.hpp — adds the reordered-layout specialization and runtime dispatch to the XE2 chunk kernels.
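The test-side change in the first file can be sketched with pytest's parametrize pattern. The parameter name reorder_input comes from this PR; the shapes and the reference-side reshaping below are illustrative assumptions, not the real test body.

```python
import pytest
import torch

# Sketch of running the same check under both layouts. Hypothetical
# dimensions and reshaping; the real test is tests/gdn_attn/test_gdn_attn.py.
@pytest.mark.parametrize("reorder_input", [False, True])
def test_gdn_attention_layouts(reorder_input):
    tokens, num_heads, head_dim = 4, 2, 8
    packed = torch.randn(tokens, num_heads, 4, head_dim)
    if reorder_input:
        # assumed reordered layout: gather each of q/k/v/z across heads
        ref = packed.permute(0, 2, 1, 3).reshape(tokens, -1)
    else:
        # assumed legacy layout: keep per-head [q|k|v|z] blocks
        ref = packed.reshape(tokens, -1)
    assert ref.shape == (tokens, num_heads * 4 * head_dim)
```

Parametrizing rather than duplicating the test keeps both layouts covered by the same assertions.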


In tests/gdn_attn/test_gdn_attn.py:

    ref_ssm_state[state_id],
    atol=atol,
    rtol=rtol)
if num_actual_tokens != 8192:

Add the following at the test-case entry?

if num_actual_tokens == 8192: 
    pytest.skip("FIXME, skip because of random error")


@jikunshang jikunshang left a comment


Overall LGTM.
