@romerojosh
Collaborator

This PR introduces a new environment variable option, CUDECOMP_USE_COL_MAJOR_RANK_ORDER, that lets users switch the rank assignment over pdims from the default row-major order to column-major.

This can be useful for problems that require process grids like 2 x 8 or 4 x 8 on a system with, say, 4 NVLink-connected GPUs per node (i.e. a small first dimension that does not exceed the number of NVLink-connected GPUs per node). With the default row-major rank assignment, ranks are first assigned contiguously over the second dimension. As a result, the first dimension, even though it is small enough to fit within a single NVLink-connected group, spans multiple nodes, so both the row and column communicators rely on slower internode communication. With column-major rank assignment, ranks are instead assigned first across the small first dimension, which keeps the column communicator within an NVLink-connected group and therefore fast. This new option lets users realize that benefit in this type of situation.
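To make the effect concrete, here is a small standalone sketch (not cuDecomp code) that counts how many nodes each communicator group touches under the two orderings. It assumes ranks are placed on nodes contiguously, so that rank `r` lives on node `r // gpus_per_node`; the helper names `rank_to_coords` and `nodes_spanned` are hypothetical and exist only for this illustration.

```python
def rank_to_coords(rank, pdims, col_major=False):
    """Map a linear rank to (i, j) coordinates on a pdims[0] x pdims[1] grid."""
    p0, p1 = pdims
    if col_major:
        # Column-major: the first dimension varies fastest.
        return (rank % p0, rank // p0)
    # Row-major: the second dimension varies fastest.
    return (rank // p1, rank % p1)


def nodes_spanned(pdims, gpus_per_node, col_major=False):
    """Return the worst-case number of nodes touched by a group of ranks
    that varies along the first dimension, and along the second dimension.

    Assumption (not from the PR): rank r is placed on node r // gpus_per_node.
    """
    p0, p1 = pdims
    along_dim0 = {}  # ranks sharing j, varying over the first dimension
    along_dim1 = {}  # ranks sharing i, varying over the second dimension
    for rank in range(p0 * p1):
        i, j = rank_to_coords(rank, pdims, col_major)
        node = rank // gpus_per_node
        along_dim0.setdefault(j, set()).add(node)
        along_dim1.setdefault(i, set()).add(node)
    return (
        max(len(nodes) for nodes in along_dim0.values()),
        max(len(nodes) for nodes in along_dim1.values()),
    )


if __name__ == "__main__":
    for pdims in [(2, 8), (4, 8)]:
        for col_major in (False, True):
            order = "col-major" if col_major else "row-major"
            print(pdims, order, nodes_spanned(pdims, 4, col_major))
```

Under these assumptions, a 2 x 8 grid with 4 GPUs per node gives `(2, 2)` for row-major (both directions cross nodes) and `(1, 4)` for column-major (groups along the first dimension stay on one node). The second dimension spans more nodes in the column-major case, so the trade-off pays off when keeping the first-dimension communicator on NVLink matters most.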

@romerojosh romerojosh merged commit 12c1bb8 into main Sep 2, 2025
6 checks passed
@romerojosh romerojosh deleted the col_major_rank_order branch September 16, 2025 20:49