@Barry-Delaney (Collaborator) commented Aug 29, 2025

This PR enables swap-AB (swapping the A and B operands) as an optional tactic for the SM100 FP8 block-wise GEMM.

Summary by CodeRabbit

  • New Features

    • Enabled an additional tactic for FP8 GEMM that can automatically swap A/B operands when beneficial, expanding kernel choices for potential performance gains on supported GPUs. No API changes required.
  • Chores

    • Updated the DeepGEMM submodule to a new repository and branch to align with recent optimizations and ensure compatibility with the expanded FP8 GEMM tactics.
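
One way to read "when beneficial" is that a tuner measures each valid tactic for a given problem and keeps the cheapest one. The sketch below illustrates that idea only; it is not TensorRT-LLM's tuning code. `pick_fastest_tactic` and the `measure` callback are hypothetical names introduced here, and the costs in the usage lines are placeholders, not benchmark results.

```python
from typing import Callable, Iterable


def pick_fastest_tactic(tactics: Iterable[int],
                        measure: Callable[[int], float]) -> int:
    """Return the tactic with the lowest measured cost.

    `measure` is a hypothetical callback that runs the GEMM with the given
    tactic and returns its cost (for example, elapsed seconds).
    """
    best_tactic = None
    best_cost = float("inf")
    for tactic in tactics:
        cost = measure(tactic)
        # Strict "<" keeps the earlier tactic on ties, so tactic 0 (no swap)
        # stays the default unless swapping measures strictly cheaper.
        if cost < best_cost:
            best_tactic, best_cost = tactic, cost
    if best_tactic is None:
        raise ValueError("no tactics to choose from")
    return best_tactic


# Placeholder costs for illustration only; these are not measurements.
fake_costs = {0: 2.0, 1: 1.5}
chosen = pick_fastest_tactic([0, 1], fake_costs.__getitem__)
```

Passing the measurement in as a callback keeps the selection logic testable without a GPU; a real tuner would plug in an actual timing routine.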

@Barry-Delaney self-assigned this Aug 29, 2025
@Barry-Delaney requested a review from a team as a code owner August 29, 2025 03:05
@Barry-Delaney requested a review from litaotju August 29, 2025 03:05

coderabbitai bot (Contributor) commented Aug 29, 2025

Walkthrough

Updates .gitmodules to point 3rdparty/DeepGEMM at a different repository and branch, and extends fp8_swap_ab_gemm in torch_custom_ops.py to accept tactics [0,1], branching at runtime to call fp8_gemm_nt (no swap) or fp8_gemm_ntt (swap) based on the tactic.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| Submodule config update (.gitmodules) | Changed submodule 3rdparty/DeepGEMM URL from https://github.com/deepseek-ai/DeepGEMM.git to https://github.com/ruoqianguo/DeepGEMM.git; added branch dev/ruoqiang/swapab_sm100. |
| FP8 swap-AB GEMM tactic handling (tensorrt_llm/_torch/custom_ops/torch_custom_ops.py) | Expanded valid tactics from [0] to [0, 1]; forward now checks tactic and calls fp8_gemm_nt when tactic==0 (no swap) or fp8_gemm_ntt when tactic==1 (swap); introduces swap_ab derived from tactic. |
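
The tactic handling summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in torch_custom_ops.py: the class name and both `*_stub` functions are hypothetical stand-ins, the real DeepGEMM kernel signatures are not shown in this thread, and raising on an unknown tactic is an assumption.

```python
from typing import List


def fp8_gemm_nt_stub(a, b):
    # Hypothetical stand-in for the non-swapped kernel (fp8_gemm_nt).
    return ("fp8_gemm_nt", a, b)


def fp8_gemm_ntt_stub(a, b):
    # Hypothetical stand-in for the swap-AB kernel (fp8_gemm_ntt).
    return ("fp8_gemm_ntt", a, b)


class SwapABGemmRunnerSketch:
    """Illustrative runner; not the class in torch_custom_ops.py."""

    def get_valid_tactics(self) -> List[int]:
        # The PR expands the valid tactics from [0] to [0, 1].
        return [0, 1]

    def forward(self, a, b, tactic: int = 0):
        # Assumption: unknown tactics are rejected; the thread does not say
        # how the real op treats other values.
        if tactic not in self.get_valid_tactics():
            raise ValueError(f"unsupported tactic: {tactic}")
        swap_ab = tactic == 1  # swap_ab is derived from the tactic
        kernel = fp8_gemm_ntt_stub if swap_ab else fp8_gemm_nt_stub
        return kernel(a, b)


runner = SwapABGemmRunnerSketch()
no_swap = runner.forward("A", "B", tactic=0)   # routed to the nt stub
swapped = runner.forward("A", "B", tactic=1)   # routed to the ntt stub
```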

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Op as fp8_swap_ab_gemm
  participant NT as fp8_gemm_nt
  participant NTT as fp8_gemm_ntt

  Caller->>Op: forward(A, B, tactic)
  alt tactic == 1 (swap AB)
    Note right of Op #f0f4ff: swap_ab = true
    Op->>NTT: compute with AB-swapped kernel
    NTT-->>Op: result
  else tactic == 0 (no swap)
    Note right of Op #f0f4ff: swap_ab = false
    Op->>NT: compute with standard kernel
    NT-->>Op: result
  end
  Op-->>Caller: output
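
The general idea behind an operand swap rests on a transpose identity: for an NT GEMM C = A·Bᵀ, swapping the operands yields B·Aᵀ = Cᵀ. The check below uses plain Python lists with no FP8 types or scaling factors; `gemm_nt` and `transpose` are helper names introduced here. Whether fp8_gemm_ntt returns a transposed result or handles the layout internally is not stated in this thread.

```python
def gemm_nt(a, b):
    """Compute a @ b^T for row-major nested lists (a: MxK, b: NxK)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def transpose(m):
    """Return the transpose of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


a = [[1, 2, 3]]                 # M=1, K=3
b = [[1, 0, 1], [0, 1, 0]]      # N=2, K=3

c = gemm_nt(a, b)               # shape MxN
c_swapped = gemm_nt(b, a)       # operands swapped, shape NxM

# Swapping the operands produces the transpose of the original result.
assert transpose(c_swapped) == c
```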

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • litaotju
  • yizhang-nv
  • QiJune

📥 Commits

Reviewing files that changed from the base of the PR and between 250e268 and 10fc27f.

📒 Files selected for processing (2)
  • .gitmodules (1 hunks)
  • tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitmodules
  • tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@Barry-Delaney force-pushed the user/barry/dev_swap_sm100 branch from 250e268 to 10fc27f August 29, 2025 03:11

@Barry-Delaney (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #16935 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #16935 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12721 completed with status: 'FAILURE'

@Barry-Delaney marked this pull request as draft August 29, 2025 06:10
@Barry-Delaney force-pushed the user/barry/dev_swap_sm100 branch from 10fc27f to 165bf39 August 29, 2025 06:34

@Barry-Delaney (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #16953 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #16953 [ run ] completed with state FAILURE

3 participants