[TRTLLM-7250][fix] Add failed cases into waives.txt #7342
📝 Walkthrough
Updated a CNN/DailyMail accuracy reference for …
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Pytest as Test Runner
participant TestCase as TestLlama3_1_8BInstruct
Note over Pytest,TestCase #DDEBF7: New gating: skip if devices < 2
Pytest->>TestCase: collect tests
TestCase->>Pytest: test metadata (has skip_less_device(2))
alt devices >= 2
Pytest->>TestCase: execute test
TestCase-->>Pytest: test result (pass/fail)
else devices < 2
Pytest-->>TestCase: skip test
TestCase-->>Pytest: skipped
end
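The gating in the diagram above could be implemented by a `conftest.py` hook along these lines. This is a minimal sketch under stated assumptions, not TensorRT-LLM's actual `skip_less_device` implementation: the helper `_visible_device_count` is hypothetical and assumes PyTorch is importable to count CUDA devices.

```python
"""Hypothetical conftest.py sketch of a skip_less_device(n) marker."""
import pytest


def _visible_device_count() -> int:
    """Return the number of visible CUDA devices, or 0 if torch is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    else:
        return torch.cuda.device_count()


def pytest_configure(config):
    # Register the marker so --strict-markers does not reject it.
    config.addinivalue_line(
        "markers", "skip_less_device(n): skip unless at least n devices are visible"
    )


def pytest_runtest_setup(item):
    # get_closest_marker also finds marks applied at class or module level.
    marker = item.get_closest_marker("skip_less_device")
    if marker is None:
        return
    required = marker.args[0]
    available = _visible_device_count()
    if available < required:
        # The "devices < 2" branch of the diagram: report the test as skipped.
        pytest.skip(f"requires {required} devices, found {available}")
    # Otherwise fall through and let pytest execute the test.
```

Because the check runs in `pytest_runtest_setup`, collection still sees the test and the report shows it as skipped rather than silently missing.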
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested reviewers
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T09:49:04.956Z
Applied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/integration/test_lists/waives.txt (2)
342-347: Normalize nvbugs links to the canonical host
Use the same canonical URL format used elsewhere to avoid broken links and make the links easier to grep. Apply the following:
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugs/5471106)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugs/5486081)
-accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugs/5486081)
-accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugs/5451662)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False] SKIP (https://nvbugspro.nvidia.com/bug/5471106)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True] SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram SKIP (https://nvbugspro.nvidia.com/bug/5486081)
+accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype SKIP (https://nvbugspro.nvidia.com/bug/5451662)
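The rewrite suggested above can be scripted rather than done by hand. A minimal sketch, assuming the only short form in the file is `https://nvbugs/<id>` and that `https://nvbugspro.nvidia.com/bug/<id>` is the canonical form the reviewer means; `normalize_file` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Short-form links such as "https://nvbugs/5486081" (group 1 is the bug id).
_SHORT_NVBUGS = re.compile(r"https://nvbugs/(\d+)")
# Long form suggested in the review comment.
_CANONICAL = r"https://nvbugspro.nvidia.com/bug/\1"


def normalize_nvbugs_links(text: str) -> str:
    """Rewrite short nvbugs links to the nvbugspro.nvidia.com/bug/<id> form."""
    return _SHORT_NVBUGS.sub(_CANONICAL, text)


def normalize_file(path: str) -> int:
    """Normalize links in a waiver file in place; return how many were rewritten."""
    file_path = Path(path)
    new_text, count = _SHORT_NVBUGS.subn(
        _CANONICAL, file_path.read_text(encoding="utf-8")
    )
    if count:
        file_path.write_text(new_text, encoding="utf-8")
    return count
```

The pattern requires a `/` right after `nvbugs`, so already-normalized `nvbugspro` links are left alone and running the script twice is harmless.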
342-347: Add TTL/owner to waivers
Consider appending a short owner tag or expiration date to prevent waivers from becoming permanent.
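One way to act on this is a periodic check for stale waivers. The sketch below assumes a hypothetical `owner=<alias> expires=YYYY-MM-DD` suffix on each waiver line; waives.txt does not necessarily support such a suffix today, so the format (and whether the existing parser tolerates it) is an assumption to verify first.

```python
import datetime
import re

# Hypothetical metadata suffix, e.g.
#   "<test id> SKIP (https://nvbugspro.nvidia.com/bug/5486081) owner=alice expires=2025-09-30"
_EXPIRES = re.compile(r"\bexpires=(\d{4}-\d{2}-\d{2})\b")


def expired_waivers(lines, today):
    """Return waiver lines whose hypothetical expires= date is before `today`."""
    stale = []
    for line in lines:
        match = _EXPIRES.search(line)
        if match is None:
            continue  # no TTL recorded; such lines could be reported separately
        if datetime.date.fromisoformat(match.group(1)) < today:
            stale.append(line)
    return stale
```

Running this in CI with `today=datetime.date.today()` would surface waivers that have outlived their intended lifetime.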
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/integration/test_lists/waives.txt (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/test_lists/waives.txt (4)
342-343: Verify that both auto_dtype=True and False skips map to the same NVBUG
Currently both test_auto_dtype[False] and [True] are skipped under nvbugs/5486081; confirm they truly share the same root cause, or split them into separate bug references or narrower selectors.
347-347: Duplicate skip entry: narrow the generic waiver
tests/integration/test_lists/waives.txt has both an L40S-scoped skip (line 233) and a generic skip (line 347) for TestGemma3_1BInstruct::test_auto_dtype. If this failure is L40S-only, remove or scope the generic skip; otherwise, keep it as-is.
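Overlaps like this can be found mechanically. A sketch, assuming GPU-scoped entries are written as `full:<GPU>/<test id>` and unscoped entries have no prefix; that scoping syntax is an assumption here and should be checked against the actual file before relying on the script.

```python
import re
from collections import defaultdict

# Assumed scoping syntax: "full:<GPU>/<test id>"; unscoped entries have no prefix.
_SCOPE = re.compile(r"^full:(?P<gpu>[^/]+)/(?P<test>.+)$")


def overlapping_waivers(lines):
    """Map test id -> sorted scopes when a test is waived both generically and per-GPU."""
    scopes = defaultdict(set)
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        selector = entry.split()[0]  # the test id precedes " SKIP (...)"
        match = _SCOPE.match(selector)
        if match:
            scopes[match.group("test")].add(match.group("gpu"))
        else:
            scopes[selector].add("*")  # "*" marks a generic (all-GPU) waiver
    return {
        test: sorted(found)
        for test, found in scopes.items()
        if "*" in found and len(found) > 1
    }
```

A non-empty result lists exactly the cases the comment describes: a generic waiver that makes a GPU-scoped one redundant, or vice versa.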
346-346: Waiver selector string is valid
Confirmed that TestLlama3_1_8BInstruct defines def test_ngram in the source, so the waiver entry correctly targets an existing test.
344-345: Check waiver coverage for eagle3 parameter permutations
Only two of the four eagle3_one_model/overlap_scheduler combinations are waived. Confirm which permutations are actually impacted by the bug: add waivers for any additional failing combos or restrict the file to exactly the two known failures.
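To see which combinations remain unwaived, the permutations can be enumerated and diffed against the waived ids. A sketch, assuming both parameters are booleans and that the ids follow the `name=value-name=value` pattern of the two waiver lines; `unwaived_permutations` is a hypothetical helper.

```python
import itertools

# Waived ids for test_eagle3, copied from the waives.txt hunk under review.
WAIVED = {
    "test_eagle3[eagle3_one_model=False-overlap_scheduler=False]",
    "test_eagle3[eagle3_one_model=True-overlap_scheduler=True]",
}


def unwaived_permutations(waived):
    """List test_eagle3 ids for (eagle3_one_model, overlap_scheduler) combos not in `waived`."""
    ids = [
        f"test_eagle3[eagle3_one_model={one_model}-overlap_scheduler={overlap}]"
        for one_model, overlap in itertools.product([False, True], repeat=2)
    ]
    return [test_id for test_id in ids if test_id not in waived]
```

With the current hunk this yields the `False-True` and `True-False` ids, which are the two combinations whose status the comment asks to confirm.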
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
1-1: Mandatory NVIDIA copyright header missing
Per repo guidelines, prepend the NVIDIA copyright header (current year) to Python sources.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
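A check for this header could run before review. A minimal sketch that treats the suggested line as the expected header; the repository's canonical header text may differ, so `HEADER_TEMPLATE` is an assumption, and `missing_header` is a hypothetical helper.

```python
import datetime

# Header text taken from the suggestion above; the repo's canonical form may differ.
HEADER_TEMPLATE = "# Copyright (c) {year}, NVIDIA CORPORATION. All rights reserved."


def missing_header(paths, year=None, max_lines=5):
    """Return the paths whose first `max_lines` lines lack the expected header."""
    expected = HEADER_TEMPLATE.format(year=year or datetime.date.today().year)
    missing = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            # readline() returns "" at EOF, so short files are handled safely.
            head = [handle.readline().rstrip("\n") for _ in range(max_lines)]
        if expected not in head:
            missing.append(path)
    return missing
```

Scanning the first few lines rather than only the first one leaves room for a shebang or encoding line above the header.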
55-55: Python 3.8 incompatibility: PEP 585 generics used at runtime
list[concurrent.futures.Future[RequestOutput]] requires Python 3.9+. The project targets 3.8+, so this will raise at import time on 3.8. Use typing.List instead.
- self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+ self.futures: List[concurrent.futures.Future[RequestOutput]] = []
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
347-347: Skip on < 2 devices is correct; consider deduping at class level
These tests launch separate context and generation servers, each bound to a different GPU via CUDA_VISIBLE_DEVICES, so ≥2 devices are required; guarding them with skip_less_device(2) is good. To keep this DRY, you can apply the mark once at the class level and drop the per-test marks.
 @pytest.mark.timeout(3600)
-class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
+@pytest.mark.skip_less_device(2)
+class TestLlama3_1_8BInstruct(LlmapiAccuracyTestHarness):
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
@@
-    @pytest.mark.skip_less_device(2)
     @pytest.mark.skip_less_device_memory(32000)
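A sketch of the class-level form on a hypothetical class. pytest stores class marks in the class's `pytestmark` list, and `item.get_closest_marker(...)` walks up from each test to its class, so every method still sees `skip_less_device(2)` while method-level marks such as `skip_less_device_memory` stay where they are. `TestDualServerExample` is a stand-in, not the real test class.

```python
import pytest


@pytest.mark.timeout(3600)
@pytest.mark.skip_less_device(2)  # applied once; every test method inherits it
class TestDualServerExample:
    """Hypothetical stand-in for TestLlama3_1_8BInstruct."""

    @pytest.mark.skip_less_device_memory(32000)  # per-test mark stays on the method
    def test_auto_dtype(self):
        pass

    def test_ngram(self):
        pass
```

Any test later added to the class is gated automatically, which also covers cases like the one raised in the next comment.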
522-574: One more test likely needs the same device gating
test_guided_decoding_with_eagle3 also launches one context and one generation server (each consuming one GPU). Consider adding @pytest.mark.skip_less_device(2) here for consistency and to avoid single-GPU CI failures.
📜 Review details
📒 Files selected for processing (2)
tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1 hunk)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in `__init__`
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
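A small illustration of several of the guidelines above: module-namespace import, Google-style docstring, a narrow `except`, and a minimal `try` body with the main logic in `else`. `load_reference` is a hypothetical helper, not code from the repository.

```python
import json  # import the module and use json.load, per the namespace rule


def load_reference(path: str) -> dict:
    """Load a JSON reference file.

    Args:
        path: Path to the JSON file.

    Returns:
        The parsed mapping, or an empty dict if the file does not exist.
    """
    try:
        handle = open(path, encoding="utf-8")
    except FileNotFoundError:  # narrow except instead of a bare except
        return {}
    else:
        # Main logic lives in the else branch so the try body stays minimal.
        with handle:
            return json.load(handle)
```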
**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header with current year to all source files
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
🔇 Additional comments (5)
tests/integration/defs/accuracy/references/cnn_dailymail.yaml (1)
2-2: Lowered baseline recorded; please confirm provenance
Accuracy for google/gemma-3-1b-it is reduced to 19.0. Please confirm this reflects the latest CI run on August 28, 2025 and, if applicable, add/update the associated waiver in waives.txt so CI expectations remain consistent. Also consider adding a brief YAML comment noting the run/date used to derive this value.
tests/integration/defs/accuracy/test_disaggregated_serving.py (4)
378-378: LGTM: device gating aligns with server-to-GPU mapping
This test also requires two GPUs (one per server); the skip mark is appropriate.
426-426: LGTM: correct gating for dual-server setup
The EAGLE3 path still launches two servers; requiring ≥2 devices avoids CI flakiness.
485-485: LGTM: guided decoding test needs ≥2 devices
Consistent with the launch_disaggregated_llm allocation; good addition.
1-4: PR objective mismatch: waives.txt not included
The PR description says “Add failed cases into waives.txt”, but this changes tests and a reference YAML instead. Please confirm whether waives.txt should be part of this PR or whether the description needs updating.
d07e1a7 to 433479b (Compare)
/bot run
PR_Github #16950 [ run ] triggered by Bot
PR_Github #16950 [ run ] completed with state
433479b to 16d0f16 (Compare)
/bot run
PR_Github #16971 [ run ] triggered by Bot
PR_Github #16971 [ run ] completed with state
16d0f16 to c364b16 (Compare)
/bot run
PR_Github #17031 [ run ] triggered by Bot
PR_Github #17031 [ run ] completed with state
Signed-off-by: Xin He (SW-GPU) <[email protected]>
c364b16 to 7ac0255 (Compare)
/bot run
PR_Github #17036 [ run ] triggered by Bot
PR_Github #17036 [ run ] completed with state
waive failed cases.
Summary by CodeRabbit