add new test model for aclgraph single_request #3888
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request adds a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the single_request_aclgraph end-to-end test. My review identifies a critical issue with how server arguments are constructed for this new model, which would cause the test to fail. I've provided a code suggestion to fix the bug and refactor the code for better readability and maintainability.
The snippet under review, as added by this PR:

```python
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size", "quantization", "ascend",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]
else:
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]
```
The server arguments for the new model vllm-ascend/DeepSeek-V2-Lite-W8A8 are incorrect. The arguments "quantization" and "ascend" are misplaced and will cause the argument parser to fail. They should be passed as "--quantization", "ascend".
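To illustrate the failure mode, here is a minimal argparse sketch; this is not the project's actual CLI parser, and the flag subset and values are assumptions for demonstration only:

```python
import argparse

# Minimal stand-in for the server's CLI parser (assumption: the real vLLM
# parser behaves like argparse here, with --data-parallel-size typed as int).
parser = argparse.ArgumentParser()
parser.add_argument("--data-parallel-size", type=int)
parser.add_argument("--quantization")

# Buggy ordering from the PR: "--data-parallel-size" consumes "quantization"
# as its value (not a valid int), and "ascend" is left as an unrecognized
# positional argument, so parsing aborts.
# parser.parse_args(["--data-parallel-size", "quantization", "ascend"])  # SystemExit

# Corrected ordering: every flag is immediately followed by its own value.
args = parser.parse_args(["--data-parallel-size", "2", "--quantization", "ascend"])
print(args)  # Namespace(data_parallel_size=2, quantization='ascend')
```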
Additionally, there is significant code duplication between the if and else blocks. This can be refactored to improve readability and maintainability by defining the common arguments first and then conditionally adding the model-specific ones.
Suggested change:

```python
server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size", str(dp_size),
    "--port", str(port),
    "--trust-remote-code", "--gpu-memory-utilization", "0.9"
]
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args.extend(["--quantization", "ascend"])
```
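For context, a hedged sketch of how the resulting argument list would typically be consumed, assuming the e2e test launches the OpenAI-compatible server via `vllm serve`; the actual harness in vllm-ascend may wrap this in a helper class, and the `dp_size`/`port` values below are placeholders:

```python
import subprocess

# Placeholder values for illustration only; the real test supplies these.
model = "vllm-ascend/DeepSeek-V2-Lite-W8A8"
dp_size, port = 1, 8000

server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size", str(dp_size),
    "--port", str(port),
    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
]
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args.extend(["--quantization", "ascend"])

# Launch the server; a real test would poll the server's health endpoint
# before sending single-request aclgraph traffic, then terminate the process.
proc = subprocess.Popen(["vllm", "serve", model, *server_args])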
What this PR does / why we need it?
Adds a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the aclgraph single_request end-to-end test.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT (the updated single_request_aclgraph e2e test).