[Bugfix] Validate custom logits processor xargs for online serving #27560
base: main
Conversation
Signed-off-by: Isotr0py <[email protected]>
Code Review
This pull request addresses a bug in the DeepSeek-OCR chat template fallback and allows a list of integers as vllm_xargs for ChatCompletionRequest to accommodate custom logits processors. The code changes modify vllm/entrypoints/openai/protocol.py to allow lists in vllm_xargs and vllm/transformers_utils/configs/deepseek_vl2.py to correctly set the model_type for DeepSeek OCR models. No style guide was provided, and the changes appear consistent with common Python practices.
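The widened `vllm_xargs` typing can be illustrated with a small, self-contained sketch (this is not the actual `protocol.py` code; the function name and rules below are illustrative):

```python
# Hypothetical sketch (not the actual vLLM code): validating that values in a
# request's vllm_xargs dict are scalars or lists of ints, mirroring the widened
# typing this PR introduces for args like "whitelist_token_ids".
def validate_vllm_xargs(xargs: dict) -> dict:
    allowed_scalars = (str, int, float)
    for key, value in xargs.items():
        if isinstance(value, allowed_scalars):
            continue
        if isinstance(value, list) and all(isinstance(i, int) for i in value):
            continue
        raise ValueError(
            f"vllm_xargs[{key!r}] must be str/int/float or a list of ints"
        )
    return xargs

# A list of token ids is now accepted:
ok = validate_vllm_xargs({"whitelist_token_ids": [128821, 128822]})
print(ok)
```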
```python
# update model_type for OCR model
if "DeepseekOCRForCausalLM" in (
    self.architectures or kwargs.get("architectures", [])
):
    self.model_type = "deepseek_ocr"
```
The condition `if "DeepseekOCRForCausalLM" in (self.architectures or kwargs.get("architectures", []))` could potentially be simplified to checking `"DeepseekOCRForCausalLM" in self.architectures + kwargs.get("architectures", [])` directly. This avoids the `or` operator and might improve readability.
However, it's crucial to ensure that this change doesn't alter the behavior of the code, especially if `self.architectures` or `kwargs.get("architectures", [])` could be `None` rather than a list. Adding a check to ensure that both are lists before concatenation would mitigate this risk.
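The distinction the reviewer raises can be seen in a tiny standalone example: `a or b` safely falls back when the left operand is `None`, while `a + b` raises a `TypeError`:

```python
# Tiny illustration of the reviewer's point: `or` skips a None left operand,
# while `+` (concatenation) raises TypeError when either side is None.
config_architectures = None            # e.g. the config has no architectures set
kw_architectures = ["DeepseekOCRForCausalLM"]

# `or` falls back to the right operand when the left is falsy (None or []):
merged = config_architectures or kw_architectures
print("DeepseekOCRForCausalLM" in merged)  # True

# Naive concatenation fails here:
try:
    bad = config_architectures + kw_architectures
except TypeError as e:
    print("concatenation failed:", type(e).__name__)
```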
Current:

```python
# update model_type for OCR model
if "DeepseekOCRForCausalLM" in (
    self.architectures or kwargs.get("architectures", [])
):
    self.model_type = "deepseek_ocr"
```

Suggested:

```python
# update model_type for OCR model
architectures = self.architectures if self.architectures else []
architectures += kwargs.get("architectures", []) if kwargs.get("architectures", []) else []
if "DeepseekOCRForCausalLM" in architectures:
    self.model_type = "deepseek_ocr"
```
We are going to patch a security vulnerability before merging this
Documentation preview: https://vllm--27560.org.readthedocs.build/en/27560/
cc @afeldman-nm I think we discussed this at one point
Thanks @Isotr0py! I would also like to review this
This pull request has merge conflicts that must be resolved before it can be merged.
Thanks for this @Isotr0py! It will be good to have argument validation.
docs/features/custom_logitsprocs.md (Outdated)
While request-level logits processors are explicitly *not* supported in the vLLM engine, vLLM *does* provide a convenient process to wrap an existing `Callable` request-level logits processor and create a batch-level logits processor that is compatible with vLLM. The `Callable` must conform to the type annotation above; if your request-level logits processor has a different interface, then in order to wrap it, you may need to modify it or implement an additional wrapper layer to comply with the interface specification above.

```diff
- You can wrap the request-level logits processor by subclassing `AdapterLogitsProcessor` as shown in the example below (in this example, `DummyPerReqLogitsProcessor` is a stand-in for your request-level logits processor which needs to be wrapped.) Override `AdapterLogitsProcessor.is_argmax_invariant(self)` to accurately reflect whether your request-level logits processor may impact which token has the highest-value logit. Override `AdapterLogitsProcessor.new_req_logits_processor(self,params)` to create a new request-level logits processor instance from a `SamplingParams` instance:
+ You can wrap the request-level logits processor by subclassing `AdapterLogitsProcessor` as shown in the example below (in this example, `DummyPerReqLogitsProcessor` is a stand-in for your request-level logits processor which needs to be wrapped.):
```
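The wrapping pattern the docs describe can be sketched standalone. A minimal stand-in base class is defined below so the shape of the two overrides is runnable on its own; the real `AdapterLogitsProcessor` is provided by vLLM, and its import path, constructor, and `SamplingParams` handling may differ:

```python
# Self-contained sketch of the adapter pattern described above. The stand-in
# base class and the dict-based "params"/"logits" below are illustrative only.
from typing import Callable, Optional


class AdapterLogitsProcessor:  # stand-in for vLLM's class
    def is_argmax_invariant(self) -> bool:
        raise NotImplementedError

    def new_req_logits_processor(self, params) -> Optional[Callable]:
        raise NotImplementedError


class DummyPerReqLogitsProcessor:
    """Stand-in request-level logits processor: keeps only one token's logit."""

    def __init__(self, target_token: int):
        self.target_token = target_token

    def __call__(self, token_ids, logits):
        # Keep only the target token's logit (illustrative behavior).
        return {self.target_token: logits.get(self.target_token, 0.0)}


class WrappedPerReqLogitsProcessor(AdapterLogitsProcessor):
    def is_argmax_invariant(self) -> bool:
        # Masking logits can change which token has the highest logit.
        return False

    def new_req_logits_processor(self, params) -> Optional[Callable]:
        # `params` stands in for vLLM's SamplingParams; here a custom
        # argument is read from a plain dict for illustration.
        target = (params or {}).get("target_token")
        return DummyPerReqLogitsProcessor(target) if target is not None else None


proc = WrappedPerReqLogitsProcessor()
req_lp = proc.new_req_logits_processor({"target_token": 7})
print(req_lp([1, 2], {7: 2.5, 3: 1.0}))
```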
It is great that you updated the documentation to reflect these changes.
Correct me if wrong, but it appears like this PR right now is only updating the documentation for the special case of adapting a request-level logits processor.
I think we will probably also want to update the documentation for
- The "Programming Model" section of the logits processors design docs: https://docs.vllm.ai/en/latest/design/logits_processors.html#logits-processor-programming-model
- Other sections of the custom logits processor docs, specifically "Creating a custom logits processor" (https://docs.vllm.ai/en/latest/features/custom_logitsprocs.html#creating-a-custom-logits-processor), "Passing Custom Argument to a Custom Logits Processor" (https://docs.vllm.ai/en/latest/features/custom_logitsprocs.html#passing-custom-argument-to-a-custom-logits-processor), and "Example custom logits processor implementation" (https://docs.vllm.ai/en/latest/features/custom_logitsprocs.html#example-custom-logits-processor-implementation)
Sorry for the delayed update! I was a bit occupied last week. 😅 This PR should be ready for a further review now!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Thanks, LGTM
/gemini review (just in case)
Code Review
This pull request introduces a robust validation mechanism for custom logits processor arguments, especially for online serving, by adding a validate_params method to the LogitsProcessor interface. This is a great improvement to prevent invalid parameters from causing issues downstream. The changes to allow list values in vllm_xargs and the fix for the deepseek-ocr model type are also valuable. The documentation and tests have been updated accordingly. I have one suggestion to refactor some duplicated code for better maintainability.
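The `validate_params` hook the review describes can be sketched in isolation. The class and method shapes below are illustrative, not vLLM's exact `LogitsProcessor` interface:

```python
# Hedged sketch of the validation hook described above: a validate_params
# classmethod on a logits-processor that rejects malformed extra args at
# request time, instead of failing deep inside sampling. Names are illustrative.
class WhitelistLogitsProcessor:
    ARG_KEY = "whitelist_token_ids"

    @classmethod
    def validate_params(cls, sampling_params: dict) -> None:
        """Raise ValueError early if the request's extra args are invalid."""
        xargs = sampling_params.get("extra_args") or {}
        value = xargs.get(cls.ARG_KEY)
        if value is None:
            return  # the argument is optional
        if not (isinstance(value, list)
                and all(isinstance(t, int) and t >= 0 for t in value)):
            raise ValueError(
                f"{cls.ARG_KEY} must be a list of non-negative ints, got {value!r}"
            )


# A valid request passes silently; an invalid one fails fast:
WhitelistLogitsProcessor.validate_params(
    {"extra_args": {"whitelist_token_ids": [128821, 128822]}})
try:
    WhitelistLogitsProcessor.validate_params(
        {"extra_args": {"whitelist_token_ids": "128821"}})
except ValueError as e:
    print("rejected:", e)
```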
Purpose
- The `model_type` of the deepseek-ocr config resolves to `deepseek_vl_v2`, so the default deepseek-ocr chat template won't be used by default (see `vllm/vllm/transformers_utils/chat_templates/registry.py`, line 36 in a806c14).
- Allow list values in `vllm_xargs` for `ChatCompletionRequest`; otherwise `"whitelist_token_ids": [128821, 128822]` parameters for deepseek-ocr's custom logits processor are not allowed.
- Validate `vllm_xargs` to avoid unexpected behavior from custom logits processors.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.