
[XPU] Enable sequence parallel support for XPU#38608

Open
chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:seq_parallel

Conversation

@chaojun-zhang
Contributor

@chaojun-zhang chaojun-zhang commented Mar 31, 2026

Test Plan

Test Result

| Configuration | Command | Median latency |
| --- | --- | --- |
| Enable SP | `vllm bench latency --model meta-llama/Llama-3.1-8B -tp 2 --compilation-config '{"pass_config": {"enable_sp": true, "sp_min_token_num": 256}}'` | 2.346 s |
| Disable SP | `vllm bench latency --model meta-llama/Llama-3.1-8B -tp 2` | 2.491 s |

UT:
pytest -s -v tests/compile/correctness_e2e/test_sequence_parallel.py


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the intel-gpu Related to Intel GPU label Mar 31, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enables sequence parallelism tests on XPU platforms by updating pytest markers and generalizing device selection using current_platform.device_type and torch.accelerator. It also moves the SequenceParallelismPass import out of the CUDA-specific guard in the pass manager. Feedback indicates that moving this pass alone is insufficient, as RMSNormQuantFusionPass remains guarded but is required by the newly enabled XPU tests, which will likely result in a NameError.
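The device generalization described above can be sketched as follows. This is a minimal illustration, assuming PyTorch >= 2.6 for the `torch.accelerator` API; the helper name `pick_device` is hypothetical and not taken from the PR:

```python
import torch


def pick_device() -> torch.device:
    # torch.accelerator (PyTorch >= 2.6) abstracts over CUDA, XPU, and other
    # backends, replacing a hardcoded torch.device("cuda") in the tests.
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.accelerator.current_accelerator()
    # Fall back to CPU when no accelerator backend is present.
    return torch.device("cpu")


print(pick_device())
```

On an XPU host this resolves to `xpu`, on a CUDA host to `cuda`, so the same test code runs unchanged on both platforms.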

RocmAiterTritonAddRMSNormPadFusionPass,
)

from .fusion.sequence_parallelism import SequenceParallelismPass
Contributor


Severity: high

Moving SequenceParallelismPass out of the is_cuda_alike() guard is necessary to support it on XPU. However, RMSNormQuantFusionPass (currently at line 32) remains inside the guard. Since the XPU tests added in this PR (tests/compile/passes/distributed/test_sequence_parallelism.py) explicitly enable fuse_norm_quant, the PostGradPassManager.configure() method will raise a NameError on XPU platforms when it attempts to instantiate RMSNormQuantFusionPass.

Additionally, AsyncTPPass (line 39) is guarded by is_cuda(), which will cause a similar NameError if fuse_gemm_comms is enabled on XPU. You should move RMSNormQuantFusionPass out of the guard as well, and ensure AsyncTPPass is handled safely for XPU.
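A minimal standalone sketch of the failure mode described above; the names `IS_CUDA` and the stub `configure` are illustrative stand-ins, not vLLM code:

```python
IS_CUDA = False  # pretend we are on an XPU platform

if IS_CUDA:
    # Stand-in for a pass class imported only under the platform guard.
    class RMSNormQuantFusionPass:
        pass


def configure(fuse_norm_quant: bool):
    # Mirrors the pass manager's configure(): the name is referenced
    # unconditionally, but it was only bound inside the CUDA guard.
    if fuse_norm_quant:
        return RMSNormQuantFusionPass()  # NameError on non-CUDA platforms
    return None


try:
    configure(fuse_norm_quant=True)
except NameError as e:
    print(f"NameError: {e}")
```

Moving the import out of the guard (or checking platform support inside `configure()` before referencing the class) avoids the unbound name.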

@chaojun-zhang chaojun-zhang changed the title [Tests] Update sequence parallelism tests to support XPU [XPU] Enable sequence parallel support for XPU Mar 31, 2026
@mergify

mergify bot commented Mar 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaojun-zhang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 31, 2026
@yma11
Contributor

yma11 commented Apr 2, 2026

@chaojun-zhang Any latency difference w/ and w/o this feature enabled? and also the case with asyncTP enabled.

@chaojun-zhang chaojun-zhang force-pushed the seq_parallel branch 2 times, most recently from 751f05a to e31dea0 Compare April 2, 2026 01:33
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
@mergify mergify bot removed the needs-rebase label Apr 2, 2026
@chaojun-zhang
Contributor Author

> @chaojun-zhang Any latency difference w/ and w/o this feature enabled? and also the case with asyncTP enabled.

  • For latency, please refer to the Test Result table in the PR description.
  • We will have a separate draft PR for asyncTP.

@chaojun-zhang chaojun-zhang marked this pull request as ready for review April 2, 2026 02:16
