[long_seq_optim] support cp&sp #2741
base: main
Conversation
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Co-authored-by: zhangsicheng5 <[email protected]>
…nto long_seq_tmp

# Conflicts:
#	vllm_ascend/ascend_config.py
#	vllm_ascend/attention/mla_v1.py
#	vllm_ascend/models/deepseek_v2.py
#	vllm_ascend/ops/rotary_embedding.py
#	vllm_ascend/worker/model_runner_v1.py
support qwen3-32B sp and cp
[bugfix] 128K long sequence freezes in CP&SP scenario
Code Review
This pull request introduces support for context parallelism (CP) and sequence parallelism (SP) to handle long sequences on Ascend NPUs. The changes are extensive, affecting attention mechanisms, model implementations for Deepseek and Qwen3, the scheduler, and the model runner. While the overall approach seems sound, the review identified several critical bugs related to incorrect tensor dimensioning and indexing in the attention and model runner code, which could lead to runtime errors. Additionally, there are high-severity maintainability issues, including a class name typo, confusing code patterns, and hardcoded paths in the new example script that hinder its usability. These issues should be addressed to ensure correctness and code quality.
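To make the context-parallelism idea concrete, here is a minimal, hypothetical sketch (not code from this PR) of how a long prompt can be sharded across CP ranks. The function name `cp_shard` and the zigzag chunking scheme are illustrative assumptions; zigzag sharding is a common way to balance causal-attention work, since early chunks attend to few tokens and late chunks to many.

```python
# Illustrative sketch only: zigzag context-parallel sharding of a token
# sequence. `cp_shard` is a hypothetical helper, not part of vllm-ascend.

def cp_shard(num_tokens: int, cp_size: int, rank: int) -> list[range]:
    """Return the token ranges owned by `rank` under zigzag CP sharding.

    The sequence is split into 2 * cp_size equal chunks; each rank takes
    chunk `rank` and chunk `2 * cp_size - 1 - rank`, pairing a cheap early
    chunk with an expensive late chunk to balance causal-attention cost.
    Assumes num_tokens is divisible by 2 * cp_size for simplicity.
    """
    num_chunks = 2 * cp_size
    chunk = num_tokens // num_chunks
    lo = rank                      # early (cheap) chunk index
    hi = num_chunks - 1 - rank     # late (expensive) chunk index
    return [range(lo * chunk, (lo + 1) * chunk),
            range(hi * chunk, (hi + 1) * chunk)]

# Example: 16 tokens over cp_size=2 ranks.
# rank 0 owns tokens 0-3 and 12-15; rank 1 owns tokens 4-7 and 8-11.
for r in range(2):
    print(r, [list(s) for s in cp_shard(16, 2, r)])
```

Together the ranks cover every token exactly once, and each rank's two chunks sum to a comparable attention workload.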
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: SunnyLee219 <[email protected]>
Long seq tmp
[bugfix] Fix the bug in CP&SP features when max_num_seqs > 1.
Signed-off-by: Apocalypse990923-qshi <[email protected]>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
[bug-fix] remove original_len
Signed-off-by: Apocalypse990923-qshi <[email protected]>
ut test mla_v1
Long seq tmp
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: Apocalypse990923-qshi <[email protected]>
[bugfix] fix enable_sp
Signed-off-by: tanwenqin <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: LookAround <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: LookAround <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
support cp sp pd disaggregate
Signed-off-by: Delphine-Nic <[email protected]>