[Feature] Enable inference support for Deepseekr1-w8a8-MTP #1834
base: main
Conversation
Signed-off-by: l30074184 <[email protected]>
Codecov Report

❌ Attention: Patch coverage is 33.33%. Your patch check has failed because the patch coverage (33.33%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

```diff
@@            Coverage Diff             @@
##             main    #1834       +/-  ##
===========================================
+ Coverage   27.39%   60.40%   +33.00%
===========================================
  Files          56       72       +16
  Lines        6191     8117     +1926
===========================================
+ Hits         1696     4903     +3207
+ Misses       4495     3214     -1281
```
Force-pushed from 27803cb to a67bc73.
```python
self.head = ParallelLMHead(config.vocab_size,
                           config.hidden_size,
                           quant_config=quant_config,
                           prefix=maybe_prefix(prefix, "head"))
```
Why do we need `prefix` only here? Is the behavior of the w8a8 weights different from the others?
`prefix` is used by the `is_layer_skipped` function in the quant_config.py file; the behavior is the same.
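For readers following along, here is a minimal sketch of how such a prefix-based skip check could work; the function body, ignore-list contents, and layer names below are illustrative assumptions rather than the actual quant_config.py implementation.

```python
# Illustrative sketch only; the real check lives in is_layer_skipped
# in quant_config.py.
def is_layer_skipped(prefix: str, ignored_layers: list[str]) -> bool:
    # Skip (i.e. leave unquantized) any module whose dotted path matches,
    # or falls under, an entry in the ignore list.
    return any(prefix == name or prefix.startswith(name + ".")
               for name in ignored_layers)

# maybe_prefix(prefix, "head") produces a dotted module path such as
# "model.layers.61.shared_head.head" (hypothetical name), which the
# quant method can then test against its ignore list:
ignored = ["model.embed_tokens"]
print(is_layer_skipped("model.layers.61.shared_head.head", ignored))  # False -> quantize
print(is_layer_skipped("model.embed_tokens", ignored))                # True  -> skip
```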
Support inference for the DeepSeek-R1 w8a8 MTP model, with a statically quantized shared_head in the MTP layers.
Cherry-picked from #1584.
Signed-off-by: curryliu <[email protected]>
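As a usage sketch (not part of this PR): loading the quantized checkpoint with MTP speculative decoding through vLLM's offline API might look roughly as follows. The checkpoint path is a placeholder, and the `quantization` and `speculative_config` values are assumptions based on vLLM's speculative decoding interface, not settings taken from this PR.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-R1-w8a8",   # placeholder checkpoint path
    quantization="ascend",               # assumed: selects the w8a8 quant configs
    speculative_config={
        "method": "deepseek_mtp",        # reuse the model's MTP layers as the draft
        "num_speculative_tokens": 1,
    },
    trust_remote_code=True,
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```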