add new test model for aclgraph single_request v0.11.0 #3889
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the end-to-end tests for aclgraph with single requests. The changes are confined to the test file. However, I've found a critical issue in how the server arguments are constructed for this new quantized model. The arguments are malformed, which would cause the test server to fail on startup. I've provided a code suggestion to fix this issue, which also refactors the code to remove duplication and improve maintainability.
```python
if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size",
        "--data-parallel-size", "quantization", "ascend",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]
else:
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size",
        str(dp_size), "--port",
        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
    ]
```
There are a couple of issues in this block:
- Incorrect Server Arguments: The server arguments for the new model `vllm-ascend/DeepSeek-V2-Lite-W8A8` appear to be incorrect. The arguments `"quantization"` and `"ascend"` are passed as values to `--data-parallel-size`, which expects an integer. This will likely cause the server to fail to start (see the sketch after this list). The correct way to enable quantization for this model is probably by using the `--quantization ascend` flag.
- Code Duplication: The `if/else` block for constructing `server_args` contains a lot of duplicated code. This can be refactored to improve readability and maintainability.
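To make the failure mode concrete, here is a minimal sketch (not from the PR) that uses Python's standard argparse to mimic the integer-typed flag; vLLM's real CLI parser defines many more options, so treat this as illustrative only:

```python
import argparse

# Illustrative stand-in for the server's CLI parser; the real vLLM
# parser has many more flags, but --data-parallel-size is integer-typed.
parser = argparse.ArgumentParser()
parser.add_argument("--data-parallel-size", type=int)
parser.add_argument("--quantization", type=str)

# The malformed argument list feeds the string "quantization" to the
# int-typed flag, so parsing aborts before the server can start:
try:
    parser.parse_args(["--data-parallel-size", "quantization"])
except SystemExit:
    # argparse prints:
    #   error: argument --data-parallel-size: invalid int value: 'quantization'
    pass

# The corrected form parses cleanly:
args = parser.parse_args(["--data-parallel-size", "1", "--quantization", "ascend"])
print(args)  # Namespace(data_parallel_size=1, quantization='ascend')
```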
I've suggested a change that addresses both issues by defining a base list of arguments and conditionally adding the quantization arguments for the specific model.
Suggested change:

```diff
-if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
-    server_args = [
-        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
-        "--data-parallel-size",
-        "--data-parallel-size", "quantization", "ascend",
-        str(dp_size), "--port",
-        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
-    ]
-else:
-    server_args = [
-        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
-        "--data-parallel-size",
-        str(dp_size), "--port",
-        str(port), "--trust-remote-code", "--gpu-memory-utilization", "0.9"
-    ]
+server_args = [
+    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
+    "--data-parallel-size", str(dp_size),
+    "--port", str(port),
+    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
+]
+if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
+    server_args.extend(["--quantization", "ascend"])
```
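For illustration, with the suggested refactor the quantized model would end up with an argument list equivalent to the one below; dp_size = 2 and port = 8000 are made-up values for this sketch, not values from the PR:

```python
# Hypothetical final server_args for vllm-ascend/DeepSeek-V2-Lite-W8A8,
# assuming dp_size = 2 and port = 8000 (illustrative values only):
server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size", "2",
    "--port", "8000",
    "--trust-remote-code", "--gpu-memory-utilization", "0.9",
    "--quantization", "ascend",
]
```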
Force-pushed from 66cca63 to 4ba2acc

Signed-off-by: lilinsiman <[email protected]>

Force-pushed from 4ba2acc to 55cc947
What this PR does / why we need it?
Adds a new test model, vllm-ascend/DeepSeek-V2-Lite-W8A8, to the aclgraph single_request end-to-end tests for v0.11.0.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests.