
Conversation

@lilinsiman
Contributor

What this PR does / why we need it?

Add a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the aclgraph single_request e2e test for v0.11.0.

Does this PR introduce any user-facing change?

no

How was this patch tested?

UT (unit tests).
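For context, a minimal sketch of how a new model typically slots into a parametrized e2e test of this kind; the MODELS list, test name, and the presence of other entries are illustrative assumptions, not the actual contents of the test file:

import pytest

# Illustrative model list; only the new entry comes from this PR,
# the rest of the real file is not reproduced here.
MODELS = [
    # ... existing models ...
    "vllm-ascend/DeepSeek-V2-Lite-W8A8",  # new quantized model added by this PR
]

@pytest.mark.parametrize("model", MODELS)
def test_aclgraph_single_request(model: str) -> None:
    # Each case would spin up an API server for `model` and send a single
    # request to exercise the aclgraph execution path.
    ...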

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description so reviewers and future developers can understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request adds a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the end-to-end tests for aclgraph with single requests. The changes are confined to the test file. However, I've found a critical issue in how the server arguments are constructed for this new quantized model. The arguments are malformed, which would cause the test server to fail on startup. I've provided a code suggestion to fix this issue, which also refactors the code to remove duplication and improve maintainability.

Comment on lines 55 to 62
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size",
        "--data-parallel-size", "quantization", "ascend",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]
else:
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]


critical

There are a couple of issues in this block:

  1. Incorrect Server Arguments: The server arguments for the new model vllm-ascend/DeepSeek-V2-Lite-W8A8 are malformed. The --data-parallel-size flag appears twice, and "quantization" and "ascend" are passed where it expects a single integer value, so the server will fail to start. Quantization for this model should instead be enabled with the --quantization ascend flag.
  2. Code Duplication: The if/else block for constructing server_args contains a lot of duplicated code. This can be refactored to improve readability and maintainability.

I've suggested a change that addresses both issues by defining a base list of arguments and conditionally adding the quantization arguments for the specific model.

Suggested change (replace the if/else block above with):

server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size", str(dp_size),
    "--port", str(port),
    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
]
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args.extend(["--quantization", "ascend"])
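To make the failure mode concrete, here is a standalone sketch using a toy argparse parser (not the vLLM serve CLI itself; the dp_size and port values are placeholders) showing that the original argument list is rejected while the suggested one parses cleanly:

import argparse

# Toy parser mirroring only the flags used above; not the real vLLM CLI.
parser = argparse.ArgumentParser()
parser.add_argument("--no-enable-prefix-caching", action="store_true")
parser.add_argument("--tensor-parallel-size", type=int)
parser.add_argument("--data-parallel-size", type=int)
parser.add_argument("--quantization")
parser.add_argument("--port", type=int)
parser.add_argument("--trust-remote-code", action="store_true")
parser.add_argument("--gpu-memory-utilization", type=float)

broken = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size",
    "--data-parallel-size", "quantization", "ascend",
    "2", "--port", "8000",
    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
]
fixed = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size", "2", "--port", "8000",
    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
    "--quantization", "ascend",
]

try:
    # The first --data-parallel-size is followed by another flag instead of
    # an integer, so argparse exits with "expected one argument".
    parser.parse_args(broken)
except SystemExit:
    print("broken argument list rejected")

print(parser.parse_args(fixed))  # parses into a clean Namespace

Building the shared flags once and appending --quantization ascend via extend also keeps the model-specific difference in a single place, which is the refactoring the suggestion above makes.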

@lilinsiman force-pushed the single_v0.11.0 branch 3 times, most recently from 66cca63 to 4ba2acc on October 30, 2025 06:15
@yiz-liu merged commit ee2e55e into vllm-project:v0.11.0-dev on Oct 31, 2025
13 checks passed