[Perf] Move attention update stream out of loop to optimize performance #3848
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request correctly optimizes the update_attn_params function in vllm_ascend/compilation/acl_graph.py by moving the torch.npu.stream context manager outside the for loop. This change effectively reduces the overhead of stream context switching, leading to a performance improvement. The implementation is correct. For consistency and further optimization, consider applying the same pattern to update_mla_attn_params, update_attn_dcp_pcp_params, and update_mla_attn_dcp_pcp_params in the same file, as they exhibit a similar structure.
LGTM
Please also refactor update_mla_attn_dcp_pcp_params and update_attn_dcp_pcp_params.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: wangxiaoxin-sherie <[email protected]>
What this PR does / why we need it?
In the `update_*attn_params` functions, the `torch.npu.stream(update_stream)` context manager was previously located inside the for-loop that updates the parameters for each layer, so a stream switch was initiated for every layer, adding unnecessary overhead. This commit moves the context manager to wrap the entire for-loop, ensuring the update stream is entered only once per function call rather than once per layer. This change saves about 90 µs per decode pass of the model.
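Below is a minimal sketch of the before/after pattern; `layers`, `update_stream`, and `update_layer_params` are hypothetical stand-ins, not the actual code in `vllm_ascend/compilation/acl_graph.py`:

```python
import torch
import torch_npu  # Ascend backend; registers the torch.npu namespace

update_stream = torch.npu.Stream()          # side stream for param updates
layers = ["layer_0", "layer_1", "layer_2"]  # stand-ins for attention layers

def update_layer_params(layer):
    """Hypothetical placeholder for the real per-layer update logic."""
    pass

# Before: the stream context is entered and exited once per layer,
# paying the stream-switch overhead on every iteration.
for layer in layers:
    with torch.npu.stream(update_stream):
        update_layer_params(layer)

# After (this PR): the context wraps the entire loop, so the switch
# to update_stream happens once per function call.
with torch.npu.stream(update_stream):
    for layer in layers:
        update_layer_params(layer)
```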

Profiling screenshots (not reproduced here) compare the two cases: the update stream entered in every layer vs. the per-layer stream entry removed.

Does this PR introduce any user-facing change?
How was this patch tested?