
Conversation

@lilinsiman
Contributor

What this PR does / why we need it?

Add a new e2e test case for aclgraph capture memory usage to v0.11.0.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Covered by the new e2e test case added in this PR (run in CI).

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new end-to-end test to monitor the memory usage of aclgraph capturing. The test is well-structured, patching NPUModelRunner._capture_model to measure memory consumption and asserting it against baseline values for different models. My main feedback is to improve test isolation by using pytest's monkeypatch fixture for environment variable manipulation, which is more robust than manual del and assignment. This change will prevent potential side effects on other tests in the suite.
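For readers unfamiliar with the fixture, here is a minimal standalone sketch of the monkeypatch pattern referenced above; the environment variable name is hypothetical and unrelated to this PR:

import os

import pytest


def test_env_is_isolated(monkeypatch: pytest.MonkeyPatch) -> None:
    # Remove the variable for this test only; raising=False avoids a
    # KeyError when the variable is not set in the first place.
    monkeypatch.delenv("SOME_HYPOTHETICAL_VAR", raising=False)
    assert "SOME_HYPOTHETICAL_VAR" not in os.environ
    # No manual restore is needed: pytest undoes every monkeypatch
    # change at teardown, even if an assertion above fails.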

Comment on lines 37 to 100
def test_aclgraph_mem_use(model: str, max_tokens: int) -> None:
    del os.environ["VLLM_WORKER_MULTIPROC_METHOD"]
    capture_called = multiprocessing.Value("i", 0)  # int, 0 or 1
    capture_mem_before = multiprocessing.Value("q", -1)  # long long (64-bit)
    capture_mem_after = multiprocessing.Value("q", -1)  # long long

    def capture_model_wrapper(original_method):

        def wrapped(self):
            mem_before = torch.npu.mem_get_info()[0]  # free memory
            result = original_method(self)
            mem_after = torch.npu.mem_get_info()[0]
            with capture_called.get_lock():
                capture_called.value = 1
                capture_mem_before.value = mem_before
                capture_mem_after.value = mem_after
            return result

        return wrapped

    original_capture = NPUModelRunner._capture_model

    with patch.object(NPUModelRunner,
                      '_capture_model',
                      new=capture_model_wrapper(original_capture)):
        prompts = [
            "Hello, my name is", "The president of the United States is",
            "The capital of France is", "The future of AI is"
        ]
        sampling_params = SamplingParams(max_tokens=max_tokens,
                                         temperature=0.0)
        if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
            vllm_model = LLM(snapshot_download(model),
                             max_model_len=1024,
                             quantization="ascend")
        else:
            vllm_model = LLM(snapshot_download(model))
        _ = vllm_model.generate(prompts, sampling_params)

    assert capture_called.value == 1, "_capture_model was not called during test"
    assert capture_mem_before.value != -1, "capture_mem_before not set"
    assert capture_mem_after.value != -1, "capture_mem_after not set"

    print("capture_mem_before =", capture_mem_before.value)
    print("capture_mem_after =", capture_mem_after.value)

    mem_used_by_capture = capture_mem_before.value - capture_mem_after.value
    # Empirical observation: capturing ACL graphs for Qwen3-0.6B uses ~0.20 GiB of NPU memory.
    # DeepSeek-V2-Lite-W8A8 uses ~0.64 GiB of NPU memory
    # a 1.3x tolerance is applied to account for runtime variance.
    if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
        baseline_capture_mem = 0.64
        capture_mem_tolerance = 1.3
    else:
        baseline_capture_mem = 0.20
        capture_mem_tolerance = 1.3
    max_capture_mem_gib = baseline_capture_mem * capture_mem_tolerance
    max_mem_expected = max_capture_mem_gib * (1024**3)
    assert mem_used_by_capture < max_mem_expected, (
        f"_capture_model used more memory than expected. "
        f"Used: {mem_used_by_capture / (1024**3):.2f} GiB, "
        f"Expected: < {max_capture_mem_gib:.2f} GiB")
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = 'spawn'
Contributor


Severity: high

The test modifies os.environ by deleting VLLM_WORKER_MULTIPROC_METHOD and then restoring it at the end. This approach is not robust. If an assertion fails before the end of the test, the environment variable will not be restored, which can lead to test isolation issues and cause other tests to fail unpredictably.

A better approach is to use pytest's monkeypatch fixture, which automatically handles setup and teardown of environment modifications, ensuring the environment is always restored to its original state.

def test_aclgraph_mem_use(model: str, max_tokens: int, monkeypatch) -> None:
    monkeypatch.delenv("VLLM_WORKER_MULTIPROC_METHOD", raising=False)
    capture_called = multiprocessing.Value("i", 0)  # int, 0 or 1
    capture_mem_before = multiprocessing.Value("q", -1)  # long long (64-bit)
    capture_mem_after = multiprocessing.Value("q", -1)  # long long

    def capture_model_wrapper(original_method):

        def wrapped(self):
            mem_before = torch.npu.mem_get_info()[0]  # free memory
            result = original_method(self)
            mem_after = torch.npu.mem_get_info()[0]
            with capture_called.get_lock():
                capture_called.value = 1
                capture_mem_before.value = mem_before
                capture_mem_after.value = mem_after
            return result

        return wrapped

    original_capture = NPUModelRunner._capture_model

    with patch.object(NPUModelRunner,
                      '_capture_model',
                      new=capture_model_wrapper(original_capture)):
        prompts = [
            "Hello, my name is", "The president of the United States is",
            "The capital of France is", "The future of AI is"
        ]
        sampling_params = SamplingParams(max_tokens=max_tokens,
                                         temperature=0.0)
        if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
            vllm_model = LLM(snapshot_download(model),
                             max_model_len=1024,
                             quantization="ascend")
        else:
            vllm_model = LLM(snapshot_download(model))
        _ = vllm_model.generate(prompts, sampling_params)

    assert capture_called.value == 1, "_capture_model was not called during test"
    assert capture_mem_before.value != -1, "capture_mem_before not set"
    assert capture_mem_after.value != -1, "capture_mem_after not set"

    print("capture_mem_before =", capture_mem_before.value)
    print("capture_mem_after =", capture_mem_after.value)

    mem_used_by_capture = capture_mem_before.value - capture_mem_after.value
    # Empirical observation: capturing ACL graphs for Qwen3-0.6B uses ~0.20 GiB of NPU memory.
    # DeepSeek-V2-Lite-W8A8 uses ~0.64 GiB of NPU memory
    # a 1.3x tolerance is applied to account for runtime variance.
    if model == "vllm-ascend/DeepSeek-V2-Lite-W8A8":
        baseline_capture_mem = 0.64
        capture_mem_tolerance = 1.3
    else:
        baseline_capture_mem = 0.20
        capture_mem_tolerance = 1.3
    max_capture_mem_gib = baseline_capture_mem * capture_mem_tolerance
    max_mem_expected = max_capture_mem_gib * (1024**3)
    assert mem_used_by_capture < max_mem_expected, (
        f"_capture_model used more memory than expected. "
        f"Used: {mem_used_by_capture / (1024**3):.2f} GiB, "
        f"Expected: < {max_capture_mem_gib:.2f} GiB")
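As a quick sanity check of the baselines and the 1.3x tolerance used above (illustrative only, not part of the suggested change), the per-model limits work out as follows:

# Allowed capture-memory budget = empirical baseline * tolerance.
for name, baseline_gib in [("DeepSeek-V2-Lite-W8A8", 0.64), ("Qwen3-0.6B", 0.20)]:
    limit_gib = baseline_gib * 1.3
    print(f"{name}: capture must stay below {limit_gib:.2f} GiB "
          f"({limit_gib * 1024**3:.0f} bytes)")

So the DeepSeek-V2-Lite-W8A8 run may use up to roughly 0.83 GiB during graph capture, and the default (Qwen3-0.6B) run up to roughly 0.26 GiB.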

@lilinsiman force-pushed the mem_v0.11.0 branch 2 times, most recently from 8c5a080 to 1e4b511 on October 30, 2025 10:16
@yiz-liu merged commit 387ce1c into vllm-project:v0.11.0-dev on Oct 31, 2025
13 checks passed