[Disagg][Perf] Use NPU event sync instead of blocking tolist to avoid unintentional copy ops blocking across different NPU streams, improving disagg TTIT/TTFT #2788
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a valid performance optimization by replacing a blocking .tolist() call with a non-blocking D2H copy and an NPU event synchronization. This is a good approach to avoid device-wide stalls. However, there is a critical bug in the implementation: the pre-allocated pinned memory tensor is sized incorrectly and uses an undefined attribute, which will cause a runtime error. I've provided a fix for this issue.
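The sizing concern can be illustrated with a minimal sketch (the helper name `make_pinned_buffer` and the demo tensors are illustrative, not from the PR): the pinned host buffer must match the shape and dtype of the device tensor it receives, otherwise the non-blocking copy fails at runtime.

```python
import torch

def make_pinned_buffer(src: torch.Tensor) -> torch.Tensor:
    # The host buffer must match the source's shape and dtype, or the
    # non-blocking copy_() below fails with a size/dtype mismatch.
    # Pinned allocation needs an accelerator runtime, so guard it here.
    pin = torch.cuda.is_available()
    return torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=pin)

src = torch.arange(12, dtype=torch.int64).reshape(3, 4)  # stand-in for device data
dst = make_pinned_buffer(src)
dst.copy_(src, non_blocking=True)  # asynchronous D2H copy when src is on device
print(dst.tolist())
```

Sizing the buffer from the source tensor itself, rather than from a separately tracked attribute, avoids the mismatch the review points out.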
Signed-off-by: jesse <[email protected]>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2788 +/- ##
==========================================
+ Coverage 74.76% 75.36% +0.59%
==========================================
Files 150 155 +5
Lines 20891 21350 +459
==========================================
+ Hits 15620 16091 +471
+ Misses 5271 5259 -12
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Nice work! Can you post benchmark results with and without this PR to make sure it works as expected?
added to the beginning
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed 6de8951 to 5be58d5
return False

def _to_list(self, sampled_token_ids: torch.Tensor) -> list[list[int]]:
    # This is a short term mitigation for issue mentioned in
Can you rewrite the comment for the Ascend case?
updated
This PR is based on top of vllm-project/vllm#22760
What this PR does / why we need it?
When we copy the sampled valid token ids from device to host, we avoid using tolist, which would trigger a device-wide stream sync when the source is on device. We change it to use a non-blocking copy followed by an explicit NPU event sync.
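A minimal sketch of this pattern (function and variable names here are illustrative, not the PR's exact code; on Ascend the event would come from the NPU backend rather than torch.cuda): stage a non-blocking D2H copy into a pre-allocated host buffer, then wait only on an event recorded after the copy, instead of letting .tolist() stall every stream on the device.

```python
import torch

def sampled_ids_to_list(sampled_token_ids: torch.Tensor,
                        pinned_buf: torch.Tensor) -> list[list[int]]:
    # Non-blocking copy: does not stall other streams the way a bare
    # .tolist() on a device tensor would.
    pinned_buf.copy_(sampled_token_ids, non_blocking=True)
    if sampled_token_ids.is_cuda:  # on Ascend this would use the NPU event API
        ev = torch.cuda.Event()
        ev.record()        # mark the point after the copy was enqueued
        ev.synchronize()   # wait only for work up to this point, not all streams
    return pinned_buf.tolist()

ids = torch.tensor([[11, 12], [13, 14]])   # stand-in for sampled token ids
buf = torch.empty_like(ids, device="cpu")  # pre-allocated host buffer
print(sampled_ids_to_list(ids, buf))  # [[11, 12], [13, 14]]
```

The key design point is that the event only orders the host thread against the copy it follows, so other NPU streams keep running while the host waits.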
Does this PR introduce any user-facing change?
How was this patch tested?
Bring up vLLM server
Before: (benchmark figure omitted)
After: (benchmark figure omitted)
As shown in the figures, the TTFT decreased.