
[main][bugfix] Unify MoE routing init with standard torch_npu operator #2401


Open · wants to merge 1 commit into base: main

Conversation

@SlightwindSec (Contributor) commented Aug 16, 2025

Description

This PR removes the dependency on a proof-of-concept (POC) build of torch_npu for the MoE routing initialization feature.

Before:
To get the best performance, users needed a torch_npu build containing the npu_moe_init_routing_quant operator. With official releases, the code fell back to a slower, pure-PyTorch implementation.

After:
The code now uses the npu_moe_init_routing_v2 operator, which is included in official torch_npu releases and provides equivalent performance. This change unifies the implementation, removes the fallback logic, and makes the high-performance path available to all users without requiring a special library build.
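For illustration, here is a minimal sketch of the before/after pattern described above. The helper name, keyword arguments, flag values, and return tuple are assumptions based on the operator's public name and typical MoE routing usage, not this PR's actual diff; consult the torch_npu documentation for the exact signature.

```python
# Hedged sketch only: kwargs and return values are assumptions, not the
# exact code from this PR.
import torch
import torch_npu

def init_routing(hidden_states: torch.Tensor,
                 topk_ids: torch.Tensor,
                 global_num_experts: int):
    num_tokens, top_k = topk_ids.shape

    # Before this PR: probe for the POC-only operator, otherwise take a
    # slower pure-PyTorch fallback.
    #
    # if hasattr(torch_npu, "npu_moe_init_routing_quant"):
    #     ...  # fast path, available only on POC torch_npu builds
    # else:
    #     ...  # slower pure-PyTorch fallback

    # After this PR: call the official operator unconditionally.
    expanded_x, expanded_row_idx, expert_tokens, pertoken_scale = (
        torch_npu.npu_moe_init_routing_v2(
            hidden_states,                  # [num_tokens, hidden_size]
            topk_ids,                       # [num_tokens, top_k] expert ids
            active_num=num_tokens * top_k,
            expert_num=global_num_experts,
            expert_tokens_num_type=1,       # assumed: per-expert token counts
            expert_tokens_num_flag=True,
            active_expert_range=[0, global_num_experts],
            quant_mode=1,                   # assumed: dynamic quantization
        )
    )
    return expanded_x, expanded_row_idx, expert_tokens, pertoken_scale
```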


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fulfills the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors MoE routing initialization for quantized models by replacing a proof-of-concept torch_npu operator (npu_moe_init_routing_quant) with the official npu_moe_init_routing_v2. This is a positive change: it removes the dependency on a non-standard library version, eliminates the fallback logic, and simplifies the codebase. The implementation appears correct, with the new operator's parameters aligned with its usage elsewhere in the project. The code is now more maintainable and robust.
