-
Couldn't load subscription status.
- Fork 519
[TEST]Add 2P1D multi node cases for nightly test #3764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jiangyunfan1 <[email protected]>
Signed-off-by: jiangyunfan1 <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables and updates a multi-node end-to-end test for the DeepSeek-R1-W8A8 model. The changes include updating benchmark configurations in the YAML file, implementing the test logic in test_multi_node.py to run performance and accuracy tests using aisbench, and a small fix in aisbench.py to handle single test cases. My review focuses on the implementation in test_multi_node.py. I've identified a potential issue with calling blocking code from an asynchronous function, which could impact performance and is against asyncio best practices. I've provided a suggestion to fix this.
| run_aisbench_cases(config.model, port, acc_cmd) | ||
| run_aisbench_cases(config.model, port, perf_cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The run_aisbench_cases function is synchronous and performs blocking I/O operations (e.g., subprocess.Popen and reading from stdout). Calling it directly from an async function like test_multi_node will block the asyncio event loop, which can lead to performance issues and defeats the purpose of using asyncio.
You should run blocking functions in a separate thread to avoid blocking the event loop. You can use asyncio.to_thread for this.
Note: You'll need to add import asyncio at the top of the file.
| run_aisbench_cases(config.model, port, acc_cmd) | |
| run_aisbench_cases(config.model, port, perf_cmd) | |
| await asyncio.to_thread(run_aisbench_cases, config.model, port, acc_cmd) | |
| await asyncio.to_thread(run_aisbench_cases, config.model, port, perf_cmd) |
Signed-off-by: jiangyunfan1 <[email protected]>
Signed-off-by: jiangyunfan1 <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
What this PR does / why we need it?
This PR adds the 2P1D multi node func/acc/perf test cases, we need test them daily
Does this PR introduce any user-facing change?
No
How was this patch tested?
by running the test