[Feature] Enable inference support for Deepseekr1-w8a8-MTP #1834
base: main
Conversation
Signed-off-by: l30074184 <[email protected]>
Codecov Report

❌ Attention: Patch coverage is 33.33%. Your patch check has failed because the patch coverage (33.33%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

```diff
@@            Coverage Diff             @@
##             main    #1834       +/-  ##
===========================================
+ Coverage   27.39%   60.40%   +33.00%
===========================================
  Files          56       72       +16
  Lines        6191     8117     +1926
===========================================
+ Hits         1696     4903     +3207
+ Misses       4495     3214     -1281
```
Force-pushed from 27803cb to a67bc73.
```python
self.head = ParallelLMHead(config.vocab_size,
                           config.hidden_size,
                           quant_config=quant_config,
                           prefix=maybe_prefix(prefix, "head"))
```
Why do we need `prefix` only here? Is the behavior of the w8a8 weights different from the others?
`prefix` is used by the `is_layer_skipped` function in the quant_config.py file; the behavior is the same.
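For readers following along, here is a minimal sketch of how such a prefix-based skip check could work; the function body, ignore-list contents, and layer names below are illustrative assumptions rather than the actual quant_config.py implementation.

```python
# Illustrative sketch only; the real check lives in is_layer_skipped
# in quant_config.py.
def is_layer_skipped(prefix: str, ignored_layers: list[str]) -> bool:
    # Skip (i.e. leave unquantized) any module whose dotted path matches,
    # or falls under, an entry in the ignore list.
    return any(prefix == name or prefix.startswith(name + ".")
               for name in ignored_layers)

# maybe_prefix(prefix, "head") produces a dotted module path such as
# "model.layers.61.shared_head.head" (hypothetical name), which the
# quant method can then test against its ignore list:
ignored = ["model.embed_tokens"]
print(is_layer_skipped("model.layers.61.shared_head.head", ignored))  # False -> quantize
print(is_layer_skipped("model.embed_tokens", ignored))                # True  -> skip
```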
Support inference for the DeepSeek-R1 w8a8 MTP model, with a statically quantized shared_head in the MTP layers.
Cherry-picked from #1584.
Signed-off-by: curryliu <[email protected]>
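As a usage sketch (not part of this PR): loading the quantized checkpoint with MTP speculative decoding through vLLM's offline API might look roughly as follows. The checkpoint path is a placeholder, and the `quantization` and `speculative_config` values are assumptions based on vLLM's speculative decoding interface, not settings taken from this PR.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-R1-w8a8",   # placeholder checkpoint path
    quantization="ascend",               # assumed: selects the w8a8 quant configs
    speculative_config={
        "method": "deepseek_mtp",        # reuse the model's MTP layers as the draft
        "num_speculative_tokens": 1,
    },
    trust_remote_code=True,
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```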