
[main][bugfix] Unify MoE routing init with standard torch_npu operator #2401


Open · wants to merge 1 commit into base: main

Conversation

@SlightwindSec (Contributor) commented Aug 16, 2025

Description

This PR removes the dependency on a proof-of-concept (POC) build of torch_npu for the MoE routing initialization feature.

Before:
To get the best performance, users needed a torch_npu build containing the npu_moe_init_routing_quant operator. With official releases, the code fell back to a slower, pure-PyTorch implementation.

After:
The code now uses the npu_moe_init_routing_v2 operator, which is included in official torch_npu releases and provides equivalent performance. This change unifies the implementation, removes the fallback logic, and makes the high-performance path available to all users without requiring a special library build.
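For illustration, here is a minimal sketch of the before/after pattern described above. The helper name, keyword arguments, flag values, and return tuple are assumptions based on the operator's public name and typical MoE routing usage, not this PR's actual diff; consult the torch_npu documentation for the exact signature.

```python
# Hedged sketch only: kwargs and return values are assumptions, not the
# exact code from this PR.
import torch
import torch_npu

def init_routing(hidden_states: torch.Tensor,
                 topk_ids: torch.Tensor,
                 global_num_experts: int):
    num_tokens, top_k = topk_ids.shape

    # Before this PR: probe for the POC-only operator, otherwise take a
    # slower pure-PyTorch fallback.
    #
    # if hasattr(torch_npu, "npu_moe_init_routing_quant"):
    #     ...  # fast path, available only on POC torch_npu builds
    # else:
    #     ...  # slower pure-PyTorch fallback

    # After this PR: call the official operator unconditionally.
    expanded_x, expanded_row_idx, expert_tokens, pertoken_scale = (
        torch_npu.npu_moe_init_routing_v2(
            hidden_states,                  # [num_tokens, hidden_size]
            topk_ids,                       # [num_tokens, top_k] expert ids
            active_num=num_tokens * top_k,
            expert_num=global_num_experts,
            expert_tokens_num_type=1,       # assumed: per-expert token counts
            expert_tokens_num_flag=True,
            active_expert_range=[0, global_num_experts],
            quant_mode=1,                   # assumed: dynamic quantization
        )
    )
    return expanded_x, expanded_row_idx, expert_tokens, pertoken_scale
```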


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a commit message that fulfills the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors MoE routing initialization for quantized models by replacing a proof-of-concept torch_npu operator (npu_moe_init_routing_quant) with the official npu_moe_init_routing_v2. This is a positive change: it removes the dependency on a non-standard library version, eliminates the fallback logic, and simplifies the codebase. The implementation appears correct, with the new operator's parameters aligned with its usage elsewhere in the project. The code is now more maintainable and robust.
