[Bugfix][PD] set max_completion_tokens=1 if req has this value #21841
Conversation
Code Review
This pull request addresses a bug in the disaggregated serving examples where prefill requests were not correctly limited to generating a single token if the `max_completion_tokens` parameter was present. The fix correctly sets `max_completion_tokens` to 1 in addition to `max_tokens`.
The change is correctly applied for the chat completion endpoints. However, in disagg_proxy_demo.py, the fix is missing for the standard completion endpoint (the `create_completion` method), which could lead to the same bug persisting for those requests. I've added a comment to highlight this for a more complete fix.
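For context, here is a minimal sketch of the pattern the fix applies, assuming the proxy handles OpenAI-style request payloads as plain dicts; `to_prefill_request` is a hypothetical helper name for illustration, not the code in this PR:

```python
# Minimal sketch (hypothetical helper, not the PR's code): a PD proxy
# caps the prefill-side request at a single generated token.
def to_prefill_request(req: dict) -> dict:
    """Return a copy of an OpenAI-style request limited to one token."""
    prefill = dict(req)
    prefill["max_tokens"] = 1
    # The chat API's max_completion_tokens takes precedence over
    # max_tokens, so it must be overridden as well when present.
    if "max_completion_tokens" in prefill:
        prefill["max_completion_tokens"] = 1
    return prefill
```

The same override would need to be applied in disagg_proxy_demo.py's `create_completion` path for the fix to be complete.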
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!!
```bash
curl http://localhost:10001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello! What can you do?"}
    ],
    "temperature": 0.7,
    "max_completion_tokens": 20
  }'
```
Indeed, using `"max_completion_tokens": 20` can reliably reproduce the issue.
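For reference, the same reproduction can be driven from Python with the requests library; the endpoint and API key here mirror the curl call above and are placeholders:

```python
import requests

# Mirrors the curl reproduction above; host, port, and key are assumptions.
resp = requests.post(
    "http://localhost:10001/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello! What can you do?"},
        ],
        "temperature": 0.7,
        # Presence of max_completion_tokens is what triggers the bug: a
        # proxy that only rewrites max_tokens no longer caps the prefill.
        "max_completion_tokens": 20,
    },
)
print(resp.json())
```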
Does tests/v1/kv_connector/nixl_integration/toy_proxy_server.py have the same issue?
I just checked those tests, and it seems there are no tests for /v1/chat/completions at all. I will add this change in another PR.
LGTM.
Thanks~
Signed-off-by: Abirdcfly <[email protected]>
Purpose
Fix #21712
vllm/vllm/entrypoints/utils.py, lines 289 to 294 in 04e3850:
The function `get_max_tokens`, used for both chat and completions, first looks for the value of `max_completion_tokens`, followed by the value of `max_tokens`. This means that if `max_completion_tokens` has a value, `max_tokens` will not take effect no matter what it is set to.
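A simplified sketch of that precedence follows; the real implementation lives in vllm/entrypoints/utils.py, and the function name and signature below are assumptions for illustration:

```python
from typing import Optional

# Simplified sketch of the precedence described above (illustrative
# signature, not vLLM's actual one): max_completion_tokens wins.
def effective_max_tokens(request: dict, default: Optional[int] = None) -> Optional[int]:
    if request.get("max_completion_tokens") is not None:
        return request["max_completion_tokens"]
    if request.get("max_tokens") is not None:
        return request["max_tokens"]
    return default
```

Hence a proxy that rewrites only `max_tokens` is silently ignored whenever the client supplies `max_completion_tokens`.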
When using `python3 benchmark_serving.py --backend openai-chat --endpoint /v1/chat/completions`, the request carries `max_completion_tokens`. Therefore the `max_tokens=1` set by proxy.py becomes useless, and only the value of `max_completion_tokens` is used by vllm serve. See vllm/benchmarks/backend_request_func.py, line 376 in ad341c5.