[BugFix]fixed rm_router_logits_allgather_ep bug #1817
Conversation
Signed-off-by: ttanzhiqiang <[email protected]>
Codecov Report
Attention: Patch coverage is 0.00%.
❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1817      +/-   ##
==========================================
- Coverage   53.51%   53.49%   -0.02%
==========================================
  Files          77       77
  Lines        9435     9438       +3
==========================================
  Hits         5049     5049
- Misses       4386     4389       +3
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Please rebase to fix the merge conflict if this PR is still needed.
hidden_states = chunk_hidden_states[tp_rank]
router_logits = chunk_router_logits[tp_rank]
if not self.rm_router_logits:
if num_tokens < tp_size:
This if statement's condition check can be merged directly with the padding check for num_tokens above.
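A minimal sketch of one possible reading of that suggestion, assuming the surrounding structure (the function name, F.pad padding, and tensor_split over the TP group are illustrative, not the actual vllm-ascend code):

import torch
import torch.nn.functional as F

def pad_and_split(hidden_states, router_logits, num_tokens, tp_size, tp_rank,
                  rm_router_logits):
    # Hypothetical sketch of the review suggestion: combine the
    # rm_router_logits check with the num_tokens padding check.
    if num_tokens < tp_size:
        # pad the token dimension so it splits evenly across the TP group
        hidden_states = F.pad(hidden_states, (0, 0, 0, tp_size - num_tokens))
    if not rm_router_logits and num_tokens < tp_size:
        # merged condition: router_logits only need padding when they are kept
        # (i.e. not recomputed after the gather) and padding is actually needed
        router_logits = F.pad(router_logits, (0, 0, 0, tp_size - num_tokens))
    hidden_states = torch.tensor_split(hidden_states, tp_size, dim=0)[tp_rank]
    if not rm_router_logits:
        router_logits = torch.tensor_split(router_logits, tp_size, dim=0)[tp_rank]
    return hidden_states, router_logits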
There are four situations in the previous logic: the rm_router_logits optimization scheme is used on all of the AllGather, NaiveMulticast, All2All, and MC2 communication paths (a rough sketch of the idea follows below).
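As an illustration only (the names comm, gate, and gather are assumptions, not the vllm-ascend source), the rm_router_logits path skips communicating router_logits and recomputes them from the gathered hidden_states, and the same branch applies whichever collective carries the tokens:

def gather_moe_inputs(hidden_states, router_logits, gate, comm, rm_router_logits):
    # comm stands in for whichever collective path is active
    # (AllGather / NaiveMulticast / All2All / MC2).
    gathered_hidden = comm.gather(hidden_states)
    if rm_router_logits:
        # optimization: do not communicate router_logits; recompute them
        # locally from the gathered hidden states
        gathered_logits, _ = gate(gathered_hidden)
    else:
        # original path: router_logits travel through the same collective
        gathered_logits = comm.gather(router_logits)
    return gathered_hidden, gathered_logits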
How was this patch tested?
Test method for case 1
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
export ASCEND_LAUNCH_BLOCKING=0
export VLLM_VERSION=0.9.1
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --served-model-name auto \
  --quantization ascend \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  --port 8006 \
  -tp=4 \
  -dp=4 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --no-enable-prefix-caching \
  --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24]},"ascend_scheduler_config":{"enabled":true},"expert_tensor_parallel_size":16}' \
  --gpu-memory-utilization 0.96 &> run.log &
disown
Test method for case 2
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
export ASCEND_LAUNCH_BLOCKING=0
export VLLM_VERSION=0.9.1
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --served-model-name auto \
  --quantization ascend \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  --port 8006 \
  -tp=4 \
  -dp=4 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --no-enable-prefix-caching \
  --enable_expert_parallel \
  --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24]},"ascend_scheduler_config":{"enabled":true},"expert_tensor_parallel_size":16}' \
  --gpu-memory-utilization 0.96 &> run.log &
disown
Test method for case 3
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
export ASCEND_LAUNCH_BLOCKING=0
export VLLM_VERSION=0.9.1
export VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP=1
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --served-model-name auto \
  --quantization ascend \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  --port 8006 \
  -tp=4 \
  -dp=4 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --no-enable-prefix-caching \
  --enable_expert_parallel \
  --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[24]},"ascend_scheduler_config":{"enabled":true},"expert_tensor_parallel_size":16}' \
  --gpu-memory-utilization 0.96 &> run.log &
disown
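For any of the three launches above, a simple smoke test is to send one request to the OpenAI-compatible endpoint once the server is up. This is a hypothetical check, not part of the patch; the model name matches --served-model-name and the port matches --port:

import requests

resp = requests.post(
    "http://localhost:8006/v1/completions",
    json={"model": "auto", "prompt": "The quick brown fox", "max_tokens": 32},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])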