[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP #27563
Conversation
Code Review
This PR addresses a critical bug in the DeepSeek-MTP model loading process when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS is enabled, which caused a KeyError due to missing parameters. The fix involves adjusting the expert parameters mapping and adding conditional logic to handle the fused shared experts layer. The changes ensure correct weight loading and proper model initialization, as validated by benchmark metrics and correctness tests.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
💡 Codex Review
Here are some automated review suggestions for this pull request.
# Use expert_params_mapping to locate the destination
# param and delegate to its expert-aware weight_loader
# with expert_id.
for mapping in expert_params_mapping:
    param_name, weight_name, expert_id, shard_id = mapping
    if weight_name not in chunk_name:
        continue

    # Do not modify `name` since the loop may continue here.
    # Instead, create a new variable.
    name_mapped = chunk_name.replace(weight_name, param_name)

    param = params_dict[name_mapped]
    # We should ask the weight loader to return success or
    # not here since otherwise we may skip experts with
    # other available replicas.
    weight_loader = typing.cast(
        Callable[..., bool], param.weight_loader
    )
    success = weight_loader(
        param,
        weight_to_load,
        name_mapped,
        shard_id=shard_id,
        expert_id=expert_id,
        return_success=True,
    )
    if success:
        if not is_fuse_shared_experts_layer:
            name = name_mapped
        else:
            loaded_params.add(name_mapped)
        break
else:
    # Skip loading extra bias for GPTQ models.
Skip remote experts instead of falling back to generic loader
The new loop only breaks when param.weight_loader(..., return_success=True) succeeds, but weight_loader returns False for experts that are not hosted on the current rank (the common case in distributed MoE loading). When this happens the for loop completes without a break, so the else clause executes and calls weight_loader(param, loaded_weight) without shard_id/expert_id. That generic path either raises TypeError or copies remote expert tensors into the wrong parameter, causing multi-rank DeepSeek checkpoints to fail. Previously the loop always broke after invoking weight_loader, allowing it to silently skip remote experts. The fallback should only trigger when the weight name does not correspond to an expert at all, not when the expert is simply non-local.
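Read as a sketch of the pattern this review suggests, assuming the surrounding variables (chunk_name, weight_to_load, params_dict, expert_params_mapping, loaded_params) mean what they do in the excerpt above; the helper name and the tri-state return convention are illustrative, not the PR's actual code:

from collections.abc import Callable
from typing import Any


def try_load_expert_weight(
    chunk_name: str,
    weight_to_load: Any,
    params_dict: dict[str, Any],
    expert_params_mapping: list[tuple[str, str, int, str]],
    loaded_params: set[str],
) -> bool | None:
    """Return True if the weight was loaded on this rank, None if it is an
    expert weight hosted on another rank (skip silently), and False if the
    name is not an expert weight at all (caller falls back to the generic
    loader)."""
    is_expert_weight = False
    for param_name, weight_name, expert_id, shard_id in expert_params_mapping:
        if weight_name not in chunk_name:
            continue
        # The name matches an expert pattern even if this rank does not own it.
        is_expert_weight = True
        name_mapped = chunk_name.replace(weight_name, param_name)
        param = params_dict[name_mapped]
        weight_loader: Callable[..., bool] = param.weight_loader
        if weight_loader(
            param,
            weight_to_load,
            name_mapped,
            shard_id=shard_id,
            expert_id=expert_id,
            return_success=True,
        ):
            loaded_params.add(name_mapped)
            return True
    # No mapping succeeded: distinguish a remote expert from a non-expert name.
    return None if is_expert_weight else False

With this shape, the generic weight_loader(param, loaded_weight) fallback runs only for the False case, while None (an expert that is simply non-local) is skipped, which is the behavior the comment asks for.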
Purpose
This PR aims to fix the loading errors for the DeepSeek MTP weights when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS is enabled (which is the default setting).
The issue occurs during model loading where a KeyError is thrown for the parameter 'model.layers.61.mtp_block.mlp.shared_experts.down_proj.weight_scale_inv'.
Root Cause: The issue was introduced by PR #24097, which added a fused shared experts optimization for ROCm but did not adapt it to the DeepSeek MTP model architecture. With fusion enabled, the shared_experts parameters of the MTP blocks are missing from the model's parameter dict, so looking them up during weight loading raises a KeyError.
The fix mirrors the changes that PR #24097 made to vllm/model_executor/models/deepseek_v2.py.
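For illustration only, a rough sketch of the remapping idea (the helper name, the ".mlp.shared_experts." / ".mlp.experts." substrings, and the index arithmetic are assumptions for this sketch, not the PR's literal code): when the fusion flag is set, a shared-expert checkpoint name has no standalone destination parameter, so it has to be routed through the fused experts layer with an extra expert id instead of a direct params_dict lookup.

def remap_shared_expert(
    ckpt_name: str,
    n_routed_experts: int,
    fuse_shared_experts: bool,
) -> tuple[str, int | None]:
    """Map a shared-expert checkpoint name onto the fused experts module.

    Returns the destination parameter name and the expert id to pass to the
    expert-aware weight loader (None when no remapping is needed)."""
    if not fuse_shared_experts or ".mlp.shared_experts." not in ckpt_name:
        # Without fusion the shared expert keeps its own parameter, so the
        # plain params_dict[name] lookup still works.
        return ckpt_name, None
    # With fusion, the shared expert occupies an extra slot appended after
    # the routed experts, so the weight must go through the expert-aware
    # loader with an explicit expert id.
    dest = ckpt_name.replace(".mlp.shared_experts.", ".mlp.experts.")
    return dest, n_routed_experts


# The parameter from the reported traceback, as an example input
# (256 routed experts is an illustrative value, not taken from the PR).
name, expert_id = remap_shared_expert(
    "model.layers.61.mtp_block.mlp.shared_experts.down_proj.weight_scale_inv",
    n_routed_experts=256,
    fuse_shared_experts=True,
)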
Test Plan
The following tests validate DeepSeek models by collecting benchmark metrics and performing correctness tests through lm_eval.
vLLM server launch command:
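The exact command is not reproduced here; an illustrative invocation for this kind of setup (model name, tensor-parallel size, and speculative settings are placeholders, not the author's values) might look like:

# Illustrative only: placeholders, not the command used in the PR.
VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1 \
vllm serve deepseek-ai/DeepSeek-R1 \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --speculative-config '{"method": "deepseek_mtp", "num_speculative_tokens": 1}'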
lm_eval command:
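Again illustrative only (endpoint, concurrency, and model name are placeholders):

# Illustrative only: placeholders, not the command used in the PR.
lm_eval --model local-completions \
    --model_args model=deepseek-ai/DeepSeek-R1,base_url=http://localhost:8000/v1/completions,num_concurrent=32,tokenized_requests=False \
    --tasks gsm8k \
    --num_fewshot 5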
Test Result
Before this PR, the server failed to start: weight loading raised a KeyError for 'model.layers.61.mtp_block.mlp.shared_experts.down_proj.weight_scale_inv'.
After this PR, the service starts normally, the MTP weights are loaded properly, and the gsm8k results and MTP acceptance rate are as follows.