Conversation

@LookAround0301 (Contributor) commented Sep 4, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

LookAround0301 and others added 12 commits September 1, 2025 09:41

Co-authored-by: zhangsicheng5 <[email protected]>
…nto long_seq_tmp

# Conflicts:
#	vllm_ascend/ascend_config.py
#	vllm_ascend/attention/mla_v1.py
#	vllm_ascend/models/deepseek_v2.py
#	vllm_ascend/ops/rotary_embedding.py
#	vllm_ascend/worker/model_runner_v1.py
[Bugfix] 128K Long Sequence Freezes in CP & SP Scenario
@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for context parallelism (CP) and sequence parallelism (SP) to handle long sequences on Ascend NPUs. The changes are extensive, affecting attention mechanisms, model implementations for Deepseek and Qwen3, the scheduler, and the model runner. While the overall approach seems sound, the review identified several critical bugs related to incorrect tensor dimensioning and indexing in the attention and model runner code, which could lead to runtime errors. Additionally, there are high-severity maintainability issues, including a class name typo, confusing code patterns, and hardcoded paths in the new example script that hinder its usability. These issues should be addressed to ensure correctness and code quality.
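For context, the sequence-splitting idea behind context parallelism can be sketched as follows. This is an illustrative assumption, not this PR's actual implementation: the function name `split_for_context_parallel` and the contiguous even-chunk strategy are hypothetical, and the real partitioning logic in `vllm_ascend` may differ.

```python
# Illustrative sketch: partition a long token sequence across context-parallel
# (CP) ranks so each NPU attends over only its local chunk. The contiguous
# even split below is an assumption for illustration; the PR's actual
# partitioning scheme may differ (e.g. interleaved or load-balanced splits).
def split_for_context_parallel(token_ids, cp_size):
    """Split token_ids into cp_size contiguous chunks, one per CP rank.

    When the sequence length is not divisible by cp_size, earlier ranks
    receive one extra token each.
    """
    if cp_size <= 0:
        raise ValueError("cp_size must be positive")
    n = len(token_ids)
    base, rem = divmod(n, cp_size)
    chunks, start = [], 0
    for rank in range(cp_size):
        end = start + base + (1 if rank < rem else 0)
        chunks.append(token_ids[start:end])
        start = end
    return chunks

# Example: a 10-token sequence across 4 CP ranks.
chunks = split_for_context_parallel(list(range(10)), 4)
# chunks -> [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Off-by-one errors in exactly this kind of chunk indexing are the sort of "incorrect tensor dimensioning and indexing" issue the review flags, which is why the remainder handling deserves explicit tests.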

github-actions bot commented Sep 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

github-actions bot commented Sep 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.


7 participants