Skip to content

Commit c364b16

Browse files
committed
update disa-serving tests
Signed-off-by: Xin He (SW-GPU) <[email protected]>
1 parent 43cb50f commit c364b16

File tree

4 files changed

+17
-1
lines changed

4 files changed

+17
-1
lines changed

tests/integration/defs/accuracy/references/cnn_dailymail.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
google/gemma-3-1b-it:
2-
- accuracy: 22.988
2+
- accuracy: 19.0
33
- quant_algo: FP8
44
kv_cache_quant_algo: FP8
55
accuracy: 20.699

tests/integration/defs/accuracy/test_disaggregated_serving.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -344,6 +344,7 @@ class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
344344
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
345345
MODEL_PATH = f"{llm_models_root()}/llama-3.1-model/Llama-3.1-8B-Instruct"
346346

347+
@pytest.mark.skip_less_device(2)
347348
@pytest.mark.skip_less_device_memory(32000)
348349
@pytest.mark.parametrize("disable_overlap_scheduler", [False, True])
349350
def test_auto_dtype(self, disable_overlap_scheduler):
@@ -374,6 +375,7 @@ def test_auto_dtype(self, disable_overlap_scheduler):
374375
task = GSM8K(self.MODEL_NAME)
375376
task.evaluate(llm)
376377

378+
@pytest.mark.skip_less_device(2)
377379
def test_ngram(self):
378380
speculative_decoding_config = {
379381
"decoding_type": "NGram",
@@ -421,6 +423,7 @@ def test_ngram(self):
421423
task = GSM8K(self.MODEL_NAME)
422424
task.evaluate(llm)
423425

426+
@pytest.mark.skip_less_device(2)
424427
@parametrize_with_ids("overlap_scheduler", [True, False])
425428
@parametrize_with_ids("eagle3_one_model", [True, False])
426429
def test_eagle3(self, overlap_scheduler, eagle3_one_model):
@@ -479,6 +482,7 @@ def test_eagle3(self, overlap_scheduler, eagle3_one_model):
479482
task = GSM8K(self.MODEL_NAME)
480483
task.evaluate(llm)
481484

485+
@pytest.mark.skip_less_device(2)
482486
@pytest.mark.skip_less_device_memory(32000)
483487
@pytest.mark.parametrize("backend", ["xgrammar", "llguidance"])
484488
def test_guided_decoding(self, backend: str, mocker):

tests/integration/test_lists/qa/llm_function_l20.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ accuracy/test_llm_api.py::TestMistralNemo12B::test_fp8
1919
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
2020
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_chunked_prefill[attn_backend=TRTLLM]
2121
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_llm_sampler
22+
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_beam_search[enable_cuda_graph=False-enable_padding=False-disable_overlap_scheduler=False]
2223
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True]
2324
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False]
2425
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_ngram

tests/integration/test_lists/waives.txt

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,7 @@ examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder] SKIP (http
231231
examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoderplus] SKIP (https://nvbugs/5355128)
232232
stress_test/stress_test.py::test_run_stress_test[llama-v3-8b-instruct-hf_tp1-stress_time_300s_timeout_450s-GUARANTEED_NO_EVICT-pytorch-stress-test] SKIP (https://nvbugs/5375646)
233233
full:L40S/accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5375620)
234+
full:L20/accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5375620)
234235
test_e2e.py::test_ptp_quickstart_advanced_8gpus[Llama3.1-405B-FP8-llama-3.1-model/Llama-3.1-405B-Instruct-FP8] SKIP (https://nvbugs/5380570)
235236
test_e2e.py::test_ptp_quickstart_advanced_8gpus[Nemotron-Ultra-253B-nemotron-nas/Llama-3_1-Nemotron-Ultra-253B-v1] SKIP (https://nvbugs/5380570)
236237
examples/test_multimodal.py::test_llm_multimodal_general[Phi-4-multimodal-instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (https://nvbugs/5385992)
@@ -301,6 +302,9 @@ accuracy/test_disaggregated_serving.py::TestQwen3_30B_A3B::test_mixed_ctx_gen_mo
301302
full:L40S/accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_auto_dtype[mtp_nextn=0-overlap_scheduler=False] SKIP (https://nvbugs/5347051)
302303
full:L40S/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugs/5471106)
303304
full:L40S/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[MMLU-tp2pp2] SKIP (https://nvbugs/5471108)
305+
full:L20/accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_auto_dtype[mtp_nextn=0-overlap_scheduler=False] SKIP (https://nvbugs/5347051)
306+
full:L20/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugs/5471106)
307+
full:L20/accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[MMLU-tp2pp2] SKIP (https://nvbugs/5471108)
304308
test_e2e.py::test_multi_nodes_eval[llama4-models/nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8-tp8pp2-mmlu] SKIP (https://nvbugs/5473781)
305309
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTLASS-mtp_nextn=0-tp4-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=True] SKIP (https://nvbugs/5476580)
306310
disaggregated/test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity[False-False-DeepSeek-V3-Lite-fp8/fp8] SKIP (https://nvbugs/5477404)
@@ -346,6 +350,13 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8[fp8kv=False-
346350
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8[fp8kv=True-attn_backend=FLASHINFER-torch_compile=True] SKIP (https://nvbugs/5485102)
347351
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16_4gpus[tp4-attn_backend=FLASHINFER-torch_compile=True] SKIP (https://nvbugs/5485109)
348352
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus[tp4-fp8kv=False-attn_backend=FLASHINFER-torch_compile=True] SKIP (https://nvbugs/5485116)
353+
<<<<<<< HEAD
349354
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus_online_eplb[mtp_nextn=2] SKIP (https://nvbugs/5444687)
350355
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_nvfp4_4gpus_online_eplb[fp8kv=True] SKIP (https://nvbugs/5444687)
351356
accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_4gpus[tp4-fp8kv=True-attn_backend=FLASHINFER-torch_compile=True] SKIP (https://nvbugs/5488580)
357+
=======
358+
accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugs/5488118)
359+
accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugs/5488118)
360+
accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugs/5488141)
361+
accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugs/5488118)
362+
>>>>>>> e01468e1a (update disa-serving tests)

0 commit comments

Comments
 (0)