[CI] Make DBO tests non-optional #34450
LucasWilkinson wants to merge 1 commit into vllm-project:main
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Code Review
This pull request makes the DBO (Dual Batch Overlap) tests non-optional to improve bug detection for MoE (Mixture of Experts) models. This is achieved by splitting a larger, optional test step into two more specific steps: one for MoE/DBO tests which is now mandatory, and another for context parallel tests which remains optional. This is a sensible change that aligns with the goal of catching regressions earlier. I have one suggestion to improve the test coverage of the newly promoted DBO test.
num_devices: 2
commands:
  - VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model=Qwen/Qwen1.5-MoE-A2.7B -tp=1 -dp=2 --max-model-len=2048 --all2all-backend=deepep_high_throughput
  - pytest -v -s tests/v1/distributed/test_dbo.py
The test_dbo.py test is being promoted to a non-optional CI step because it's effective at catching MoE-related bugs. However, this pytest command is missing the VLLM_USE_DEEP_GEMM=1 environment variable, which is set for the other MoE test command in this same step. This means the DBO test won't cover MoE execution with DeepGEMM kernels, which is a significant gap. To ensure consistent and thorough testing, and to aid in debugging, I recommend adding both VLLM_USE_DEEP_GEMM=1 and VLLM_LOGGING_LEVEL=DEBUG.
- VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG pytest -v -s tests/v1/distributed/test_dbo.py
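The review point rests on how shell environment prefixes work. A VAR=value command prefix places VAR only in the environment of that single command, so the flag on the data_parallel.py line does not carry over to the unprefixed pytest line. The following is a minimal sketch that assumes the step's commands run in a POSIX shell and that VLLM_USE_DEEP_GEMM is not already exported by the CI agent. Here, sh -c stands in for the real python3 and pytest invocations.

```shell
#!/bin/sh
# Assumption: VLLM_USE_DEEP_GEMM is not set in the surrounding environment.
unset VLLM_USE_DEEP_GEMM

# Prefix form, as on the first command in the step: the variable is
# placed only in the environment of this one child process.
first=$(VLLM_USE_DEEP_GEMM=1 sh -c 'echo "${VLLM_USE_DEEP_GEMM:-unset}"')

# Next command with no prefix (like the original pytest line):
# the child process does not see the variable.
second=$(sh -c 'echo "${VLLM_USE_DEEP_GEMM:-unset}"')

# Prefixing the second command as well (the suggested fix) makes it visible.
fixed=$(VLLM_USE_DEEP_GEMM=1 sh -c 'echo "${VLLM_USE_DEEP_GEMM:-unset}"')

echo "first command:           $first"
echo "second command:          $second"
echo "prefixed second command: $fixed"
```

Exporting the variable once at the top of the step would also work, but only if every command in the step runs in the same shell session. The per-command prefix used in the suggestion does not depend on that.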
The DBO tests have been catching a lot of non-DBO-related MoE bugs in the nightly and daily runs lately. Make them non-optional for now so these bugs are caught sooner, until we have better distributed MoE test coverage.