[Speculative Decoding] Add speculators config support #21345
Conversation
Signed-off-by: Dipika Sikka <[email protected]>
Code Review
This pull request introduces support for Speculators Config. There are a few critical issues that need to be addressed:
- There is a consistent misuse of `self.config_dict` instead of `self.config` in `vllm/transformers_utils/configs/speculators/base.py`, which will cause runtime errors.
- A typo in a method name, `update_defualts`, in `vllm/transformers_utils/configs/speculators/eagle.py` will prevent it from being called, breaking the configuration logic for Eagle-1 models.
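To illustrate the second issue: in Python, a misspelled method definition means callers of the intended name raise `AttributeError`, so the defaults are silently never applied. A minimal hypothetical sketch — the class and field names here are illustrative, not the actual `eagle.py` code:

```python
# Hypothetical sketch of the review issue: if this method were defined as
# "update_defualts" (typo), a caller invoking update_defaults() would raise
# AttributeError and the Eagle-1 defaults would never be applied.
class EagleSpeculatorConfig:
    def __init__(self):
        # note: the review also flags self.config_dict vs self.config confusion
        self.config = {"method": "eagle"}

    def update_defaults(self):  # corrected spelling
        self.config.setdefault("num_speculative_tokens", 5)

cfg = EagleSpeculatorConfig()
cfg.update_defaults()
print(cfg.config["num_speculative_tokens"])  # 5
```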
speculators Config Support
First round
This pull request has merge conflicts that must be resolved before it can be merged.
One tiny comment, otherwise LGTM
Nice work!
Purpose
Summary of Changes
- Adds `SpeculatorsConfig` to load models saved with the speculators format
- Adds `norm_before_residual` - a field saved by eagle3 speculators
- Accepts `speculative_config` as input. To achieve this, we optionally update the model depending on whether the runner is a draft runner or not, and fill in the details of the `speculative_config` based on the Speculators Config.
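The config-translation step described above can be sketched roughly as follows. This is a toy illustration, not the actual `SpeculatorsConfig` implementation; the field names (`speculators_model_type`, etc.) are assumptions for the example:

```python
# Hypothetical sketch: translate a speculators-format config dict into the
# fields a vLLM-style speculative_config expects. Field names are illustrative
# assumptions, not the real SpeculatorsConfig schema.
def speculators_to_spec_config(speculators_cfg, overrides=None):
    spec = {
        # method (e.g. "eagle3") and token count come from the saved config
        "method": speculators_cfg.get("speculators_model_type", "eagle3"),
        "num_speculative_tokens": speculators_cfg.get("num_speculative_tokens", 5),
    }
    if overrides:
        # user-provided speculative_config values override the saved defaults
        spec.update(overrides)
    return spec

print(speculators_to_spec_config({"speculators_model_type": "eagle3"},
                                 {"num_speculative_tokens": 3}))
# {'method': 'eagle3', 'num_speculative_tokens': 3}
```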
API
- `vllm serve` with the speculators model directly
- `speculative_config` pathway - this allows you to override any arguments in your config (such as `num_speculative_tokens`)

E.g.
VLLM_USE_V1=1 vllm serve "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717"
VLLM_USE_V1=1 vllm serve RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic --speculative_config '{"model":"nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717", "num_speculative_tokens": 5, "method": "eagle3"}'
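When scripting the second invocation, it can be safer to build the `--speculative_config` JSON programmatically than to hand-quote it; a small sketch:

```python
import json
import shlex

# Build the --speculative_config JSON string from a dict; json.dumps
# guarantees valid JSON quoting inside the shell argument.
spec = {
    "model": "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717",
    "num_speculative_tokens": 5,
    "method": "eagle3",
}
cmd = [
    "vllm", "serve", "RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic",
    "--speculative_config", json.dumps(spec),
]
print(shlex.join(cmd))
```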
Test Plan
Target models:
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
- meta-llama/Meta-Llama-3.1-8B-Instruct
Test Result - GuideLLM Benchmarking:
vLLM:
VLLM_USE_V1=1 vllm serve nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717 --port 7600 >output_speculators_llama.tx
Per-position acceptance rate (dense target)
FP8 Quantized Target:
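For context on the metric reported above: the per-position acceptance rate at position k is the fraction of draft steps whose k-th speculative token was accepted by the target model. A toy sketch of the computation (not vLLM's actual metrics code):

```python
# Toy sketch: per-position acceptance rate at position k = (number of draft
# steps whose k-th speculative token was accepted) / (total draft steps).
def per_position_acceptance(accepted_counts, num_draft_steps):
    return [count / num_draft_steps for count in accepted_counts]

# e.g. out of 100 draft steps: position 0 accepted 90 times, position 1
# accepted 60 times, position 2 accepted 30 times
print(per_position_acceptance([90, 60, 30], 100))  # [0.9, 0.6, 0.3]
```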
Follow-up