[long_seq_optim] support cp&sp #2741
base: main
Conversation
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Co-authored-by: zhangsicheng5 <[email protected]>
…nto long_seq_tmp

# Conflicts:
#	vllm_ascend/ascend_config.py
#	vllm_ascend/attention/mla_v1.py
#	vllm_ascend/models/deepseek_v2.py
#	vllm_ascend/ops/rotary_embedding.py
#	vllm_ascend/worker/model_runner_v1.py
support qwen3-32B sp and cp
[bugfix] 128K long sequence freezes in CP&SP scenario
Code Review
This pull request introduces support for context parallelism (CP) and sequence parallelism (SP) to handle long sequences on Ascend NPUs. The changes are extensive, affecting attention mechanisms, model implementations for Deepseek and Qwen3, the scheduler, and the model runner. While the overall approach seems sound, the review identified several critical bugs related to incorrect tensor dimensioning and indexing in the attention and model runner code, which could lead to runtime errors. Additionally, there are high-severity maintainability issues, including a class name typo, confusing code patterns, and hardcoded paths in the new example script that hinder its usability. These issues should be addressed to ensure correctness and code quality.
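To make the context-parallelism idea concrete, here is a minimal, hypothetical sketch (not code from this PR) of how a long prompt can be sharded across CP ranks. The function name `cp_shard` and the zigzag chunking scheme are illustrative assumptions; zigzag sharding is a common way to balance causal-attention work, since early chunks attend to few tokens and late chunks to many.

```python
# Illustrative sketch only: zigzag context-parallel sharding of a token
# sequence. `cp_shard` is a hypothetical helper, not part of vllm-ascend.

def cp_shard(num_tokens: int, cp_size: int, rank: int) -> list[range]:
    """Return the token ranges owned by `rank` under zigzag CP sharding.

    The sequence is split into 2 * cp_size equal chunks; each rank takes
    chunk `rank` and chunk `2 * cp_size - 1 - rank`, pairing a cheap early
    chunk with an expensive late chunk to balance causal-attention cost.
    Assumes num_tokens is divisible by 2 * cp_size for simplicity.
    """
    num_chunks = 2 * cp_size
    chunk = num_tokens // num_chunks
    lo = rank                      # early (cheap) chunk index
    hi = num_chunks - 1 - rank     # late (expensive) chunk index
    return [range(lo * chunk, (lo + 1) * chunk),
            range(hi * chunk, (hi + 1) * chunk)]

# Example: 16 tokens over cp_size=2 ranks.
# rank 0 owns tokens 0-3 and 12-15; rank 1 owns tokens 4-7 and 8-11.
for r in range(2):
    print(r, [list(s) for s in cp_shard(16, 2, r)])
```

Together the ranks cover every token exactly once, and each rank's two chunks sum to a comparable attention workload.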
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: SunnyLee219 <[email protected]>
Long seq tmp
[bugfix] Fix the bug in CP&SP features when max_num_seqs > 1.
Signed-off-by: Apocalypse990923-qshi <[email protected]>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
[bug-fix] remove original_len
Signed-off-by: Apocalypse990923-qshi <[email protected]>
ut test mla_v1
Long seq tmp
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: Apocalypse990923-qshi <[email protected]>
[bugfix] fix enable_sp
Signed-off-by: tanwenqin <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: LookAround <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: LookAround <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
support cp sp pd disaggregate
Signed-off-by: Delphine-Nic <[email protected]>