-
-
Notifications
You must be signed in to change notification settings - Fork 10k
[Frontend] Add LLM.reward specific to reward models #21720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Signed-off-by: wang.yuqi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to improve the API for reward models by adding a dedicated LLM.reward
method and making the pooling_task
in LLM.encode
explicit. The changes are in the right direction, but the implementation of the new reward
method seems incorrect as it uses pooling_task="encode"
and lacks necessary validation. I've left a critical comment with suggestions to align it with other task-specific methods like embed
and classify
. Additionally, I've suggested improving an error message for better user guidance. Addressing these points will significantly improve the correctness and consistency of the new API.
Quick review |
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
I think we should still keep |
Perhaps we need to figure out whether it requires embed or reward. |
This model uses pooler interface for image segmentation. So it's not really embed/reward. The more generic |
I'll wait for buildkite/ci/pr/language-models-test-extended-pooling to finish and check for any other errors |
Signed-off-by: wang.yuqi <[email protected]>
Head branch was pushed to by a user without write access
Let's wait for #18321 to be merged first, then update this PR |
buildkite/ci/pr/language-models-test-extended-pooling The failed test can be fixed by #21747 |
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: shuw <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: shuw <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: x22x22 <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: x22x22 <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: Noam Gat <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: Paul Pak <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: wang.yuqi <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
After #21470, LLM.encode pooling_task defaults to using "encode", meaning that reward models actually use LLM.encode. However, most libraries' encode corresponds to an embedding model, which can lead to potential errors.
Test Plan
Test Result
(Optional) Documentation Update