[TRTLLM-7250][fix] Add failed cases into waives.txt #7342

xinhe-nv · 2025-08-28T09:56:10Z

waive failed cases.

Summary by CodeRabbit

Tests
- Updated accuracy reference for a model on the CNN/DailyMail dataset.
- Added conditional skips requiring at least 2 devices for several disaggregated serving tests (auto-dtype, n-gram, Eagle3, guided decoding).
Chores
- Data fixture updates and test waiver entries; no user-facing or API changes.

coderabbitai · 2025-08-28T09:56:17Z

📝 Walkthrough

Walkthrough

Updated a CNN/DailyMail accuracy reference for google/gemma-3-1b-it, added device-count skip decorators to four tests in TestLlama3_1_8BInstruct, and added three SKIP entries to the integration waives list. No production code or public API changes.

Changes

Cohort / File(s)	Summary
Accuracy reference update `tests/integration/defs/accuracy/references/cnn_dailymail.yaml`	Adjusted recorded accuracy for `google/gemma-3-1b-it` from `22.988` to `19.0`; other fields unchanged.
Test skip gating (>=2 devices) `tests/integration/defs/accuracy/test_disaggregated_serving.py`	Added `@pytest.mark.skip_less_device(2)` to `test_auto_dtype`, `test_ngram`, `test_eagle3`, and `test_guided_decoding` in `TestLlama3_1_8BInstruct`.
Integration waives list `tests/integration/test_lists/waives.txt`	Added three `SKIP` entries for failing accuracy/disaggregated-serving tests (entries for L20 paths; related to `test_auto_dtype`, `test_eagle3`, and `test_tp_pp_symmetric`).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Pytest as Test Runner
    participant TestCase as TestLlama3_1_8BInstruct
    Note over Pytest,TestCase #DDEBF7: New gating: skip if devices < 2
    Pytest->>TestCase: collect tests
    TestCase->>Pytest: test metadata (has skip_less_device(2))
    alt devices >= 2
        Pytest->>TestCase: execute test
        TestCase-->>Pytest: test result (pass/fail)
    else devices < 2
        Pytest-->>TestCase: skip test
        TestCase-->>Pytest: skipped
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

[None][fix] update skip config #6891 — Adds the same @pytest.mark.skip_less_device(2) decorators to the identical four tests; overlapping changes.
[TRTLLM-6856][feat] add disaggregated serving tests to QA list #6536 — Modifies skip/pytest decorators within TestLlama3_1_8BInstruct (e.g., test_auto_dtype); related test gating edits.
[None][chore] Add failed cases into waives.txt #6914 — Updates the integration waives list; overlaps with added SKIP entries in tests/integration/test_lists/waives.txt.

Suggested reviewers

crazydemo
LarryXFly

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4c27d65 and d07e1a7.

📒 Files selected for processing (3)

tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1 hunks)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4 hunks)
tests/integration/test_lists/waives.txt (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/integration/defs/accuracy/references/cnn_dailymail.yaml
tests/integration/defs/accuracy/test_disaggregated_serving.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-26T09:49:04.956Z

Learnt from: pengbowang-nv
PR: NVIDIA/TensorRT-LLM#7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").

Applied to files:

tests/integration/test_lists/waives.txt

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

tests/integration/test_lists/waives.txt (1)

306-308: L20 waivers correctly mirror their L40S counterparts.

Verified that TestDeepSeekV3Lite::test_auto_dtype, TestLlama3_1_8BInstruct::test_eagle3, and TestLlama3_1_8BInstruct::test_tp_pp_symmetric exist in tests/integration/defs/accuracy/test_disaggregated_serving.py with the specified parameter combinations; each L20 entry appears exactly once and matches its L40S waiver.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbit or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/integration/test_lists/waives.txt (2)

342-347: Normalize nvbugs links to the canonical host

Use the same canonical URL format used elsewhere to avoid broken links and ease grep. Apply the following:

-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugs/5471106)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugs/5486081)
-accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5451662)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugspro.nvidia.com/bug/5471106)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugspro.nvidia.com/bug/5451662)

342-347: Add TTL/owner to waivers

Consider appending a short owner tag or expiration date to prevent waivers from becoming permanent.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ae89163 and 976f4f5.

📒 Files selected for processing (1)

tests/integration/test_lists/waives.txt (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (4)

tests/integration/test_lists/waives.txt (4)

342-343: Verify both auto_dtype=True and False skips map to the same NVBUG
Currently both test_auto_dtype[False] and [True] are skipped under nvbugs/5486081; confirm they truly share the same root cause or split into separate bug references/narrower selectors.

347-347: Duplicate skip entry: narrow generic waiver
tests/integration/test_lists/waives.txt has both an L40S-scoped skip (line 233) and a generic skip (line 347) for TestGemma3_1BInstruct::test_auto_dtype. If this failure is L40S-only, remove or scope the generic skip; otherwise, keep as-is.

346-346: Waiver selector string is valid
Confirmed TestLlama3_1_8BInstruct defines def test_ngram in the source, so the waiver entry correctly targets an existing test.

344-345: Check waiver coverage for eagle3 parameter permutations
Only two of the four eagle3_one_model/overlap_scheduler combinations are waived. Confirm which permutations are actually impacted by the bug: add waivers for any additional failing combos or restrict the file to exactly the two known failures.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
1-1: Mandatory NVIDIA copyright header missing

Per repo guidelines, prepend the NVIDIA copyright header (current year) to Python sources.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
55-55: Python 3.8 incompatibility: PEP 585 generics used at runtime

list[concurrent.futures.Future[RequestOutput]] requires Python 3.9+. The project targets 3.8+, so this will raise at import time on 3.8. Use typing.List instead.
-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []

🧹 Nitpick comments (2)

tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
347-347: Skip on < 2 devices is correct; consider deduping at class level

These tests launch separate context and generation servers, each bound to a different GPU via CUDA_VISIBLE_DEVICES, so ≥2 devices are required—good to guard with skip_less_device(2). To DRY this, you can apply the mark once at the class level and drop the per-test marks.
 @pytest.mark.timeout(3600)
-class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
+@pytest.mark.skip_less_device(2)
+class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)
522-574: One more test likely needs the same device gating

test_guided_decoding_with_eagle3 also launches one context and one generation server (each consuming one GPU). Consider adding @pytest.mark.skip_less_device(2) here for consistency and to avoid single-GPU CI failures.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 976f4f5 and 4c27d65.

📒 Files selected for processing (2)

tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1 hunks)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in init
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic

Files:

tests/integration/defs/accuracy/test_disaggregated_serving.py

**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header with current year to all source files

Files:

tests/integration/defs/accuracy/test_disaggregated_serving.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (5)

tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1)

2-2: Lowered baseline recorded; please confirm provenance

Accuracy for google/gemma-3-1b-it is reduced to 19.0. Please confirm this reflects the latest CI run on August 28, 2025 and, if applicable, add/update the associated waiver in waives.txt so CI expectations remain consistent. Also consider adding a brief YAML comment noting the run/date used to derive this value.

tests/integration/defs/accuracy/test_disaggregated_serving.py (4)

378-378: LGTM: device gating aligns with server-to-GPU mapping

This test also requires two GPUs (one per server); the skip mark is appropriate.

426-426: LGTM: correct gating for dual-server setup

EAGLE3 path still launches two servers; requiring ≥2 devices avoids CI flakiness.

485-485: LGTM: guided decoding test needs ≥2 devices

Consistent with the launch_disaggregated_llm allocation; good addition.

1-4: PR objective mismatch: waives.txt not included

PR description says “Add failed cases into waives.txt”, but this changes tests and a reference YAML instead. Please confirm whether waives.txt should be part of this PR or if the description needs updating.

xinhe-nv · 2025-08-29T05:13:10Z

/bot run

tensorrt-cicd · 2025-08-29T05:20:47Z

PR_Github #16950 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-29T05:21:40Z

PR_Github #16950 [ run ] completed with state FAILURE

xinhe-nv · 2025-08-29T09:32:36Z

/bot run

tensorrt-cicd · 2025-08-29T09:37:54Z

PR_Github #16971 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-29T10:13:14Z

PR_Github #16971 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12739 completed with status: 'FAILURE'

xinhe-nv · 2025-08-30T01:01:14Z

/bot run

tensorrt-cicd · 2025-08-30T01:06:36Z

PR_Github #17031 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-30T01:44:02Z

PR_Github #17031 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12792 completed with status: 'FAILURE'

Signed-off-by: Xin He (SW-GPU) <[email protected]>

xinhe-nv · 2025-08-30T01:50:00Z

/bot run

tensorrt-cicd · 2025-08-30T01:55:02Z

PR_Github #17036 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-30T04:49:00Z

PR_Github #17036 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12796 completed with status: 'SUCCESS'

xinhe-nv requested review from crazydemo and LarryXFly August 28, 2025 09:56

coderabbitai bot reviewed Aug 28, 2025

View reviewed changes

xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch 2 times, most recently from d07e1a7 to 433479b Compare August 29, 2025 04:46

xinhe-nv marked this pull request as ready for review August 29, 2025 05:12

xinhe-nv enabled auto-merge (squash) August 29, 2025 05:12

xinhe-nv changed the title ~~[None][chore] Add failed cases into waives.txt~~ [TRTLLM-7250][fix] Add failed cases into waives.txt Aug 29, 2025

xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from 433479b to 16d0f16 Compare August 29, 2025 09:32

LarryXFly approved these changes Aug 29, 2025

View reviewed changes

xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from 16d0f16 to c364b16 Compare August 30, 2025 01:01

update disa-serving tests

7ac0255

Signed-off-by: Xin He (SW-GPU) <[email protected]>

xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from c364b16 to 7ac0255 Compare August 30, 2025 01:49

xinhe-nv merged commit 5f939b9 into NVIDIA:main Aug 30, 2025
5 checks passed

xinhe-nv deleted the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch August 30, 2025 05:00

[TRTLLM-7250][fix] Add failed cases into waives.txt #7342

[TRTLLM-7250][fix] Add failed cases into waives.txt #7342

Uh oh!

Conversation

xinhe-nv commented Aug 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

xinhe-nv commented Aug 29, 2025

Uh oh!

tensorrt-cicd commented Aug 29, 2025

Uh oh!

tensorrt-cicd commented Aug 29, 2025

Uh oh!

xinhe-nv commented Aug 29, 2025

Uh oh!

tensorrt-cicd commented Aug 29, 2025

Uh oh!

tensorrt-cicd commented Aug 29, 2025

Uh oh!

xinhe-nv commented Aug 30, 2025

Uh oh!

tensorrt-cicd commented Aug 30, 2025

Uh oh!

tensorrt-cicd commented Aug 30, 2025

Uh oh!

xinhe-nv commented Aug 30, 2025

Uh oh!

tensorrt-cicd commented Aug 30, 2025

Uh oh!

tensorrt-cicd commented Aug 30, 2025

Uh oh!

Uh oh!

Uh oh!

xinhe-nv commented Aug 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 28, 2025 •

edited

Loading