
[XPU] Enable sequence parallel support for XPU#38608

Open
chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:seq_parallel

Conversation

@chaojun-zhang
Contributor

@chaojun-zhang chaojun-zhang commented Mar 31, 2026

Test Plan

Test Result

| Configuration | Command | Median latency |
| --- | --- | --- |
| Enable SP | `vllm bench latency --model meta-llama/Llama-3.1-8B -tp 2 --compilation-config '{"pass_config": {"enable_sp": true, "sp_min_token_num": 256}}'` | 2.346 s |
| Disable SP | `vllm bench latency --model meta-llama/Llama-3.1-8B -tp 2` | 2.491 s |

UT:
pytest -s -v tests/compile/correctness_e2e/test_sequence_parallel.py


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the intel-gpu Related to Intel GPU label Mar 31, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enables sequence parallelism tests on XPU platforms by updating pytest markers and generalizing device selection using current_platform.device_type and torch.accelerator. It also moves the SequenceParallelismPass import out of the CUDA-specific guard in the pass manager. Feedback indicates that moving this pass alone is insufficient, as RMSNormQuantFusionPass remains guarded but is required by the newly enabled XPU tests, which will likely result in a NameError.
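The device generalization described above can be sketched as follows. This is a minimal illustration, assuming PyTorch >= 2.6 for the `torch.accelerator` API; the helper name `pick_device` is hypothetical and not taken from the PR:

```python
import torch


def pick_device() -> torch.device:
    # torch.accelerator (PyTorch >= 2.6) abstracts over CUDA, XPU, and other
    # backends, replacing a hardcoded torch.device("cuda") in the tests.
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.accelerator.current_accelerator()
    # Fall back to CPU when no accelerator backend is present.
    return torch.device("cpu")


print(pick_device())
```

On an XPU host this resolves to `xpu`, on a CUDA host to `cuda`, so the same test code runs unchanged on both platforms.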

RocmAiterTritonAddRMSNormPadFusionPass,
)

from .fusion.sequence_parallelism import SequenceParallelismPass
Contributor


Severity: high

Moving SequenceParallelismPass out of the is_cuda_alike() guard is necessary to support it on XPU. However, RMSNormQuantFusionPass (currently at line 32) remains inside the guard. Since the XPU tests added in this PR (tests/compile/passes/distributed/test_sequence_parallelism.py) explicitly enable fuse_norm_quant, the PostGradPassManager.configure() method will raise a NameError on XPU platforms when it attempts to instantiate RMSNormQuantFusionPass.

Additionally, AsyncTPPass (line 39) is guarded by is_cuda(), which will cause a similar NameError if fuse_gemm_comms is enabled on XPU. You should move RMSNormQuantFusionPass out of the guard as well, and ensure AsyncTPPass is handled safely for XPU.
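A minimal standalone sketch of the failure mode described above; the names `IS_CUDA` and the stub `configure` are illustrative stand-ins, not vLLM code:

```python
IS_CUDA = False  # pretend we are on an XPU platform

if IS_CUDA:
    # Stand-in for a pass class imported only under the platform guard.
    class RMSNormQuantFusionPass:
        pass


def configure(fuse_norm_quant: bool):
    # Mirrors the pass manager's configure(): the name is referenced
    # unconditionally, but it was only bound inside the CUDA guard.
    if fuse_norm_quant:
        return RMSNormQuantFusionPass()  # NameError on non-CUDA platforms
    return None


try:
    configure(fuse_norm_quant=True)
except NameError as e:
    print(f"NameError: {e}")
```

Moving the import out of the guard (or checking platform support inside `configure()` before referencing the class) avoids the unbound name.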

@chaojun-zhang chaojun-zhang changed the title [Tests] Update sequence parallelism tests to support XPU [XPU] Enable sequence parallel support for XPU Mar 31, 2026
@mergify

mergify bot commented Mar 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaojun-zhang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 31, 2026
@yma11
Contributor

yma11 commented Apr 2, 2026

@chaojun-zhang Any latency difference w/ and w/o this feature enabled? and also the case with asyncTP enabled.

@chaojun-zhang chaojun-zhang force-pushed the seq_parallel branch 2 times, most recently from 751f05a to e31dea0 Compare April 2, 2026 01:33
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
@mergify mergify bot removed the needs-rebase label Apr 2, 2026
@chaojun-zhang
Contributor Author

> @chaojun-zhang Any latency difference w/ and w/o this feature enabled? and also the case with asyncTP enabled.

  • For latency, please refer to the Test Result table in the PR description.
  • We will have a separate draft PR for asyncTP.

@chaojun-zhang chaojun-zhang marked this pull request as ready for review April 2, 2026 02:16
