Mirroring changes in test-pipeline.yaml into test-amd.yaml #27242
Conversation
Signed-off-by: Alexei V. Ivanov <[email protected]>
Code Review

This pull request mirrors changes from test-pipeline.yaml into test-amd.yaml. While many changes are valid, several NVIDIA-specific test configurations (for Blackwell and H200 GPUs) have been incorrectly copied into the AMD pipeline file. These steps will fail on AMD hardware and should be removed. There is also a CUDA-specific test dependency that should be removed.
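NVIDIA-only steps like these can also be found mechanically. The sketch below is a hypothetical helper, not part of vLLM's CI tooling: it scans a Buildkite-style pipeline text for step labels whose bodies mention NVIDIA-specific markers such as `gpu: b200`, `gpu: h200`, or `nvidia-smi`.

```python
# Hypothetical helper for flagging NVIDIA-only steps in an AMD pipeline.
# Markers chosen from the steps called out in this review; adjust as needed.
NVIDIA_MARKERS = ("gpu: b200", "gpu: h200", "nvidia-smi")

def find_nvidia_only_steps(pipeline_text: str) -> list[str]:
    flagged, current = [], None
    for line in pipeline_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- label:"):
            # Track the current step label, dropping trailing "# 21 min" comments.
            current = stripped.split(":", 1)[1].split("#")[0].strip()
        elif current and any(m in stripped for m in NVIDIA_MARKERS):
            if current not in flagged:
                flagged.append(current)
    return flagged
```

Run against test-amd.yaml, this would list exactly the Blackwell and H200 steps flagged in this review.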
- label: Blackwell Test # 21 min
  timeout_in_minutes: 30
- label: Blackwell Fusion Tests # 30 min
  timeout_in_minutes: 40
  working_dir: "/vllm-workspace/"
  gpu: b200
  source_file_dependencies:
    - csrc/quantization/fp4/
    - vllm/model_executor/layers/quantization/utils/flashinfer_utils.py
    - vllm/v1/attention/backends/flashinfer.py
    - vllm/compilation/
    # can affect pattern matching
    - vllm/model_executor/layers/layernorm.py
    - vllm/model_executor/layers/activation.py
    - vllm/model_executor/layers/quantization/input_quant_fp8.py
  commands:
    - nvidia-smi
    - pytest -v -s tests/compile/test_fusion_attn.py
    - pytest -v -s tests/compile/test_silu_mul_quant_fusion.py
    # this runner has 2 GPUs available even though num_gpus=2 is not set
    - pytest -v -s tests/compile/test_fusion_all_reduce.py
    - pytest -v -s tests/compile/test_fusions_e2e.py
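For context, source_file_dependencies gates whether a step runs at all: the step is triggered only when a changed file falls under one of the listed paths. A minimal sketch of that prefix-matching rule follows; the exact logic in vLLM's pipeline generator may differ.

```python
def step_should_run(changed_files: list[str], deps: list[str]) -> bool:
    # A step is triggered when any changed file falls under one of its
    # declared dependency paths (simple prefix match; vLLM's actual
    # pipeline generator may apply additional rules).
    return any(f.startswith(dep) for f in changed_files for dep in deps)
```

This is why copying an NVIDIA-specific dependency into the AMD file matters: any edit under that path would trigger the step on AMD runners.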
- label: Distributed Tests (H200) # optional
  gpu: h200
  optional: true
  working_dir: "/vllm-workspace/"
  num_gpus: 2
  commands:
    - pytest -v -s tests/compile/test_async_tp.py
    - pytest -v -s tests/compile/test_sequence_parallelism.py
    - pytest -v -s tests/compile/test_fusion_all_reduce.py
    - pytest -v -s tests/compile/test_fusions_e2e.py::test_tp2_attn_quant_allreduce_rmsnorm
    - pytest -v -s tests/distributed/test_context_parallel.py
    - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
source_file_dependencies:
  - csrc/
  - tests/kernels/core
  - tests/kernels/test_top_k_per_row.py
The test file tests/kernels/test_top_k_per_row.py is CUDA-specific, as indicated by @pytest.mark.skipif(not current_platform.is_cuda(), reason="This test requires CUDA") within the file. Including it as a source dependency in test-amd.yaml is incorrect; this dependency should be removed from the AMD pipeline.
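That guard follows the standard pytest skipif pattern. A minimal self-contained sketch is below; is_cuda() here is a stand-in for vLLM's current_platform.is_cuda(), and the test body is hypothetical.

```python
import pytest

def is_cuda() -> bool:
    # Stand-in for vLLM's current_platform.is_cuda(): probe for a
    # usable CUDA runtime via torch, if torch is installed at all.
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

@pytest.mark.skipif(not is_cuda(), reason="This test requires CUDA")
def test_top_k_per_row():
    # CUDA-only kernel assertions would go here.
    ...
```

Note the skip happens at collection time on the AMD runner, so the step would still be scheduled (and consume a runner slot) whenever the dependency path triggers it.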
- cd .. && VLLM_WORKER_MULTIPROC_METHOD=spawn pytest -v -s tests/models/multimodal/generation/test_whisper.py -m core_model # Otherwise, mp_method="spawn" doesn't work
- label: Multi-Modal Accuracy Eval (Small Models) # 50min
  mirror_hardwares: [amdexperimental]
Why don't we use AMD production hardware to enable this in the actual CI?
This pipeline, in its current state, is meant for monitoring only and does not gate PRs.
- label: Blackwell Test # 38 min
  timeout_in_minutes: 60
- label: Blackwell Test # 21 min
Do we still need to test Blackwell on this AMD specific pipeline?
We include that test group for completeness. For the moment it is just a comment in test-amd.yaml, not an actionable instruction, but things may change in the future.
Assuming steps marked gpu: h200 are skipped in this pipeline and do not put load on the actual H200 nodes.