Skip to content

Conversation

xinhe-nv
Copy link
Collaborator

@xinhe-nv xinhe-nv commented Aug 28, 2025

waive failed cases.

Summary by CodeRabbit

  • Tests
    • Updated accuracy reference for a model on the CNN/DailyMail dataset.
    • Added conditional skips requiring at least 2 devices for several disaggregated serving tests (auto-dtype, n-gram, Eagle3, guided decoding).
  • Chores
    • Data fixture updates and test waiver entries; no user-facing or API changes.

Copy link
Contributor

coderabbitai bot commented Aug 28, 2025

📝 Walkthrough

Walkthrough

Updated a CNN/DailyMail accuracy reference for google/gemma-3-1b-it, added device-count skip decorators to four tests in TestLlama3_1_8BInstruct, and added three SKIP entries to the integration waives list. No production code or public API changes.

Changes

Cohort / File(s) Summary
Accuracy reference update
tests/integration/defs/accuracy/references/cnn_dailymail.yaml
Adjusted recorded accuracy for google/gemma-3-1b-it from 22.988 to 19.0; other fields unchanged.
Test skip gating (>=2 devices)
tests/integration/defs/accuracy/test_disaggregated_serving.py
Added @pytest.mark.skip_less_device(2) to test_auto_dtype, test_ngram, test_eagle3, and test_guided_decoding in TestLlama3_1_8BInstruct.
Integration waives list
tests/integration/test_lists/waives.txt
Added three SKIP entries for failing accuracy/disaggregated-serving tests (entries for L20 paths; related to test_auto_dtype, test_eagle3, and test_tp_pp_symmetric).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Pytest as Test Runner
    participant TestCase as TestLlama3_1_8BInstruct
    Note over Pytest,TestCase #DDEBF7: New gating: skip if devices < 2
    Pytest->>TestCase: collect tests
    TestCase->>Pytest: test metadata (has skip_less_device(2))
    alt devices >= 2
        Pytest->>TestCase: execute test
        TestCase-->>Pytest: test result (pass/fail)
    else devices < 2
        Pytest-->>TestCase: skip test
        TestCase-->>Pytest: skipped
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • crazydemo
  • LarryXFly

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4c27d65 and d07e1a7.

📒 Files selected for processing (3)
  • tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (4 hunks)
  • tests/integration/test_lists/waives.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/defs/accuracy/references/cnn_dailymail.yaml
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T09:49:04.956Z
Learnt from: pengbowang-nv
PR: NVIDIA/TensorRT-LLM#7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").

Applied to files:

  • tests/integration/test_lists/waives.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/integration/test_lists/waives.txt (1)

306-308: L20 waivers correctly mirror their L40S counterparts.

  • Verified that TestDeepSeekV3Lite::test_auto_dtype, TestLlama3_1_8BInstruct::test_eagle3, and TestLlama3_1_8BInstruct::test_tp_pp_symmetric exist in tests/integration/defs/accuracy/test_disaggregated_serving.py with the specified parameter combinations; each L20 entry appears exactly once and matches its L40S waiver.
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbit or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/integration/test_lists/waives.txt (2)

342-347: Normalize nvbugs links to the canonical host

Use the same canonical URL format used elsewhere to avoid broken links and ease grep. Apply the following:

-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugs/5471106)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugs/5486081)
-accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5451662)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugspro.nvidia.com/bug/5471106)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugspro.nvidia.com/bug/5451662)

342-347: Add TTL/owner to waivers

Consider appending a short owner tag or expiration date to prevent waivers from becoming permanent.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ae89163 and 976f4f5.

📒 Files selected for processing (1)
  • tests/integration/test_lists/waives.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/test_lists/waives.txt (4)

342-343: Verify both auto_dtype=True and False skips map to the same NVBUG
Currently both test_auto_dtype[False] and [True] are skipped under nvbugs/5486081; confirm they truly share the same root cause or split into separate bug references/narrower selectors.


347-347: Duplicate skip entry: narrow generic waiver
tests/integration/test_lists/waives.txt has both an L40S-scoped skip (line 233) and a generic skip (line 347) for TestGemma3_1BInstruct::test_auto_dtype. If this failure is L40S-only, remove or scope the generic skip; otherwise, keep as-is.


346-346: Waiver selector string is valid
Confirmed TestLlama3_1_8BInstruct defines def test_ngram in the source, so the waiver entry correctly targets an existing test.


344-345: Check waiver coverage for eagle3 parameter permutations
Only two of the four eagle3_one_model/overlap_scheduler combinations are waived. Confirm which permutations are actually impacted by the bug: add waivers for any additional failing combos or restrict the file to exactly the two known failures.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

1-1: Mandatory NVIDIA copyright header missing

Per repo guidelines, prepend the NVIDIA copyright header (current year) to Python sources.

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

55-55: Python 3.8 incompatibility: PEP 585 generics used at runtime

list[concurrent.futures.Future[RequestOutput]] requires Python 3.9+. The project targets 3.8+, so this will raise at import time on 3.8. Use typing.List instead.

-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

347-347: Skip on < 2 devices is correct; consider deduping at class level

These tests launch separate context and generation servers, each bound to a different GPU via CUDA_VISIBLE_DEVICES, so ≥2 devices are required—good to guard with skip_less_device(2). To DRY this, you can apply the mark once at the class level and drop the per-test marks.

 @pytest.mark.timeout(3600)
-class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
+@pytest.mark.skip_less_device(2)
+class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)

522-574: One more test likely needs the same device gating

test_guided_decoding_with_eagle3 also launches one context and one generation server (each consuming one GPU). Consider adding @pytest.mark.skip_less_device(2) here for consistency and to avoid single-GPU CI failures.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 976f4f5 and 4c27d65.

📒 Files selected for processing (2)
  • tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in init
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header with current year to all source files

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1)

2-2: Lowered baseline recorded; please confirm provenance

Accuracy for google/gemma-3-1b-it is reduced to 19.0. Please confirm this reflects the latest CI run on August 28, 2025 and, if applicable, add/update the associated waiver in waives.txt so CI expectations remain consistent. Also consider adding a brief YAML comment noting the run/date used to derive this value.

tests/integration/defs/accuracy/test_disaggregated_serving.py (4)

378-378: LGTM: device gating aligns with server-to-GPU mapping

This test also requires two GPUs (one per server); the skip mark is appropriate.


426-426: LGTM: correct gating for dual-server setup

EAGLE3 path still launches two servers; requiring ≥2 devices avoids CI flakiness.


485-485: LGTM: guided decoding test needs ≥2 devices

Consistent with the launch_disaggregated_llm allocation; good addition.


1-4: PR objective mismatch: waives.txt not included

PR description says “Add failed cases into waives.txt”, but this changes tests and a reference YAML instead. Please confirm whether waives.txt should be part of this PR or if the description needs updating.

@xinhe-nv xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch 2 times, most recently from d07e1a7 to 433479b Compare August 29, 2025 04:46
@xinhe-nv xinhe-nv marked this pull request as ready for review August 29, 2025 05:12
@xinhe-nv xinhe-nv enabled auto-merge (squash) August 29, 2025 05:12
@xinhe-nv xinhe-nv changed the title [None][chore] Add failed cases into waives.txt [TRTLLM-7250][fix] Add failed cases into waives.txt Aug 29, 2025
@xinhe-nv
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16950 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16950 [ run ] completed with state FAILURE

@xinhe-nv xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from 433479b to 16d0f16 Compare August 29, 2025 09:32
@xinhe-nv
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16971 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16971 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12739 completed with status: 'FAILURE'

@xinhe-nv xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from 16d0f16 to c364b16 Compare August 30, 2025 01:01
@xinhe-nv
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17031 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17031 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #12792 completed with status: 'FAILURE'

Signed-off-by: Xin He (SW-GPU) <[email protected]>
@xinhe-nv xinhe-nv force-pushed the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch from c364b16 to 7ac0255 Compare August 30, 2025 01:49
@xinhe-nv
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17036 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #17036 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12796 completed with status: 'SUCCESS'

@xinhe-nv xinhe-nv merged commit 5f939b9 into NVIDIA:main Aug 30, 2025
5 checks passed
@xinhe-nv xinhe-nv deleted the user/qa/post_update_waive_20250828_LLM_FUNCTION_TEST_1167 branch August 30, 2025 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants