docs/src/usage.md
Outdated
| > comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank) | ||
| > rank_loc = MPI.Comm_rank(comm_loc) | ||
| > ``` | ||
| > If using (2), one can use the default device but make sure to handle device visibility in the scheduler; for SLURM on Cray systems, this can be mostly achieved using `--gpus-per-task=1`. |
Doesn't '--gpus-per-task' for SLURM prevent the use of GPU Peer2Peer IPC mechanisms (https://cpe.ext.hpe.com/docs/24.03/mpt/mpich/intro_mpi.html) which would have a negative impact on performance?
Yeah that's what I also remember, but perhaps Nvidia has finally fixed this?
Thanks for pointing this out. I can make the text more generic
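For reference, the node-local rank approach from the snippet under review can be sketched as below. This is a minimal sketch, not the text of the PR: it assumes CUDA.jl on NVIDIA GPUs, that every rank can see all GPUs of its node, and `local_device_id` is a hypothetical helper introduced only for illustration.

```julia
using MPI
using CUDA  # assumption: NVIDIA GPUs; other backends need their own device-selection call

# Hypothetical helper: map a node-local rank to a 0-based device index,
# wrapping around if there are more ranks than devices on the node.
local_device_id(rank_loc::Integer, ndev::Integer) = rank_loc % ndev

function main()
    MPI.Init()
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)
    # Group the ranks that share a node, then take the rank within that group.
    comm_loc = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
    rank_loc = MPI.Comm_rank(comm_loc)
    # Requires that the scheduler leaves all GPUs of the node visible to each rank.
    dev_id = local_device_id(rank_loc, CUDA.ndevices())
    CUDA.device!(dev_id)
    println("rank $rank (local rank $rank_loc) uses device $dev_id")
    MPI.Barrier(comm)
end

# Run only when executed as a script, e.g. under mpiexec or srun.
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```

Selecting the device from the local rank keeps all GPUs of a node visible to every rank. Restricting visibility through the scheduler instead relies on the default device.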
docs/src/usage.md
Outdated
| Successfully running the [alltoall\_test\_cuda.jl](https://gist.github.com/luraess/0063e90cb08eb2208b7fe204bbd90ed2) | ||
| should confirm that your MPI implementation has CUDA support enabled. Moreover, successfully running the | ||
| [alltoall\_test\_cuda\_multigpu.jl](https://gist.github.com/luraess/ed93cc09ba04fe16f63b4219c1811566) should confirm |
Can we move the files into this repository?
Where should one put them?
docs/src/usage.md
Outdated
| !!! note "Preloads" | ||
| On Cray machines, you may need to ensure that the following preloads are set in the preferences: | ||
| ``` | ||
| preloads = ["libmpi_gtl_hsa.so"] | ||
| preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED" | ||
| ``` |
docs/src/usage.md
Outdated
| preloads_env_switch = "MPICH_GPU_SUPPORT_ENABLED" | ||
| ``` | ||
|
|
||
| !!! note "Multiple GPUs per node" |
| !!! note "Multiple GPUs per node" | |
| ### "Multiple GPUs per node" |
Since the text is not just about ROCm?
|
I would be happy if someone could review the naming and location of the added files, and the updated text 🙏 |
|
This looks good to me. It seems all issues were addressed. Thanks! |
|
This PR broke building the docs |
|
Yeah, this was my open question in "review added file naming and location": I suspected it might not work like the other examples, but was unsure whether to put those examples in another folder. Any suggestions, @giordano? |
Adds information to the docs as per the discussion in #924.