
[RCCL] Fix p2p_batching, revert PR 3001, disable warpSpeed on gfx942 #3254

Open
isaki001 wants to merge 3 commits into develop from users/isaki001/feb_drop


@isaki001
Contributor

Motivation

Resolve errors in p2p-batching and performance regressions introduced by PR 3001 and by building warpSpeed on gfx942.

Technical Details

  • Restore p2p-batching functionality that was broken by the NCCL sync.
  • Revert PR 3001 due to a regression on all collectives.
  • Disable building warpSpeed for gfx942 to resolve regressions. warpSpeed is still not enabled by default on MI300, but building it added non-negligible overhead.

JIRA ID

Test Plan

Evaluate correctness and performance by running 1N/2N/4N/8N/16N allReduce, allGather, reduceScatter, and alltoall on gfx942 and gfx950.
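
The test matrix above could be enumerated with a small script like the following. This is only a sketch, not the commands actually used for this PR: it assumes the rccl-tests binaries (`all_reduce_perf`, `all_gather_perf`, `reduce_scatter_perf`, `alltoall_perf`) built under `./build`, 8 GPUs per node, one MPI rank per GPU, and an illustrative message-size sweep. The helper name `build_commands` is hypothetical, and host selection for `mpirun` is left out because it depends on the cluster.

```python
from itertools import product

# Node counts and collectives listed in the test plan.
NODE_COUNTS = [1, 2, 4, 8, 16]

# Assumption: mapping from the collective names in the test plan
# to the corresponding rccl-tests binaries.
COLLECTIVES = {
    "allReduce": "all_reduce_perf",
    "allGather": "all_gather_perf",
    "reduceScatter": "reduce_scatter_perf",
    "alltoall": "alltoall_perf",
}

# Assumption: 8 GPUs per node, one MPI rank per GPU.
GPUS_PER_NODE = 8


def build_commands(min_bytes="8", max_bytes="1G"):
    """Return one mpirun command line per (node count, collective) pair."""
    commands = []
    for nodes, binary in product(NODE_COUNTS, COLLECTIVES.values()):
        ranks = nodes * GPUS_PER_NODE
        # -b/-e: min/max message size, -f: size multiplier, -g: GPUs per rank.
        commands.append(
            f"mpirun -np {ranks} ./build/{binary} "
            f"-b {min_bytes} -e {max_bytes} -f 2 -g 1"
        )
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

Running the same matrix on both gfx942 and gfx950, before and after the change, gives the per-collective comparison the test plan calls for.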

Test Result

Fixed the errors that occurred when enabling p2p-batching.
Improved performance on gfx942 and gfx950.

Submission Checklist

@isaki001 requested a review from a team as a code owner on February 12, 2026 21:37
@isaki001 changed the title from "Users/isaki001/feb drop" to "fix p2p_batching, revert PR 3001 , disable warpSpeed on gfx942" on Feb 12, 2026
@isaki001 changed the title from "fix p2p_batching, revert PR 3001 , disable warpSpeed on gfx942" to "fix p2p_batching, revert PR 3001, disable warpSpeed on gfx942" on Feb 12, 2026
@mustafabar changed the title from "fix p2p_batching, revert PR 3001, disable warpSpeed on gfx942" to "[RCCL] Fix p2p_batching, revert PR 3001, disable warpSpeed on gfx942" on Feb 12, 2026
