
Conversation

@momo609 momo609 (Collaborator) commented Oct 29, 2025

What this PR does / why we need it?

In the update_*attn_params functions, the torch.npu.stream(update_stream) context manager was previously located inside the for-loop that updates the parameters for each layer. As a result, the stream context was entered again for every layer, adding unnecessary overhead.

This commit refactors the code by moving the stream context manager so that it wraps the entire for-loop. The update stream context is now entered only once per function call rather than once per layer, which saves roughly 90 µs per decode model execution.
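As an illustration, here is a minimal sketch of the before/after structure. The names `layers`, `forward_contexts`, and `fill_params_from_context` are hypothetical placeholders rather than the actual helpers in vllm_ascend/compilation/acl_graph.py; only the placement of the torch.npu.stream context manager is the point.

```python
import torch
import torch_npu  # noqa: F401  # exposes the torch.npu namespace on Ascend devices


def update_attn_params_before(update_stream, layers, forward_contexts):
    # Before: the update-stream context is entered once per layer,
    # paying the stream-switch overhead on every iteration.
    for layer, ctx in zip(layers, forward_contexts):
        with torch.npu.stream(update_stream):
            layer.fill_params_from_context(ctx)


def update_attn_params_after(update_stream, layers, forward_contexts):
    # After: the context wraps the whole loop, so the stream switch
    # happens only once per function call.
    with torch.npu.stream(update_stream):
        for layer, ctx in zip(layers, forward_contexts):
            layer.fill_params_from_context(ctx)
```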
[Profiling screenshot] update stream entered in every layer

[Profiling screenshot] update stream removed from every layer (entered once outside the loop)

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly optimizes the update_attn_params function in vllm_ascend/compilation/acl_graph.py by moving the torch.npu.stream context manager outside the for loop. This change effectively reduces the overhead of stream context switching, leading to a performance improvement. The implementation is correct. For consistency and further optimization, consider applying the same pattern to update_mla_attn_params, update_attn_dcp_pcp_params, and update_mla_attn_dcp_pcp_params in the same file, as they exhibit a similar structure.

@momo609 momo609 force-pushed the fulloptimze branch 6 times, most recently from 6b3c8a6 to 298e13c on Oct 30, 2025 03:32
@weijinqian0 weijinqian0 added the ready (read for review) and ready-for-test (start test by label for PR) labels on Oct 30, 2025
@momo609 momo609 changed the title from "optimaze upstream in fullgraph" to "optimize upstream in fullgraph" on Oct 30, 2025
@whx-sjtu whx-sjtu (Collaborator) left a comment

LGTM

@yiz-liu yiz-liu (Collaborator) left a comment

Please also refactor update_mla_attn_dcp_pcp_params and update_attn_dcp_pcp_params.

@yiz-liu yiz-liu changed the title from "optimize upstream in fullgraph" to "[Perf] Move attention update stream out of loop to optimize performance" on Oct 30, 2025
@momo609 momo609 (Collaborator, Author) commented Oct 30, 2025

> Please also refactor update_mla_attn_dcp_pcp_params and update_attn_dcp_pcp_params.

DCP/PCP cannot be tested right now; the refactor for those functions will be added in another PR.

@momo609 momo609 force-pushed the fulloptimze branch 3 times, most recently from 5dcf47b to f7041fb on Oct 30, 2025 11:30
@weijinqian0 weijinqian0 added and removed the ready-for-test (start test by label for PR) label on Oct 31, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: wangxiaoxin-sherie <[email protected]>

Labels

ready (read for review), ready-for-test (start test by label for PR)
