Skip to content

Conversation

@benchislett
Copy link
Collaborator

@benchislett benchislett commented Oct 20, 2025

Purpose

maybe_remap_kv_scale_name is a part of the DS weight loading logic, but it missing from MTP. This causes a crash when loading, for example, nvidia/DeepSeek-R1-0528-FP4 with MTP which uses this feature by default.

Test Plan

LM Eval GSM8k of nvidia/DeepSeek-R1-0528-FP4 on 4xB200:

Test Result

local-completions (base_url=http://0.0.0.0:8049/v1/completions,model=nvidia/DeepSeek-R1-0528-FP4,tokenized_requests=False,tokenizer_backend=None,num_concurrent=128,timeout=120,max_retries=5), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9530|±  |0.0058|
|     |       |strict-match    |     5|exact_match|↑  |0.9484|±  |0.0061|

@benchislett benchislett added the bug Something isn't working label Oct 20, 2025
@mergify mergify bot added the deepseek Related to DeepSeek models label Oct 20, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug that caused a crash when loading models with FP8 KV scales using Multi-Token Prediction (MTP). The fix involves incorporating the maybe_remap_kv_scale_name function into the MTP weight loading process, which was previously missing. This change aligns the MTP weight loading logic with the existing implementation for the base model, ensuring that KV scale names are correctly remapped. The implementation is correct and directly resolves the reported issue. The code is clean and there are no further issues.

@aarnphm aarnphm enabled auto-merge (squash) October 20, 2025 21:12
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 20, 2025
@vllm-bot vllm-bot merged commit f381cf2 into vllm-project:main Oct 21, 2025
55 of 58 checks passed
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 21, 2025
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Oct 21, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants