blog-EESSI-Cray-Slingshot11 #551
Conversation
Nice job!
```
librt.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/librt.so.1 (0x0000fffcf7c80000)
```

### Testing
I guess with this the idea is to show that it works in principle. You only show bandwidth here, what about latency? Are there numbers to compare with using the Cray toolchains? What about device-to-device?
I don't necessarily want you to write a lot here, but it is nice to show that all the things you might hope for do work.
The second test is `osu_bibw -d cuda D D`; I can add the latency test and a comparison to runs using the Cray toolchains.
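For context, a run of these benchmarks might look roughly like the sketch below. This is a minimal illustration assuming a Slurm system; the module name, node counts, and the `--mpi=pmix` launcher flag are assumptions, not taken from this thread.

```bash
#!/bin/bash
# Hypothetical sketch: two ranks on two nodes running the CUDA-enabled
# OSU micro-benchmarks; module name and launcher flags are assumptions.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

module load OSU-Micro-Benchmarks   # illustrative EESSI module name

srun --mpi=pmix osu_bibw -d cuda D D   # device-to-device bidirectional bandwidth
srun --mpi=pmix osu_latency            # point-to-point latency (host buffers)
```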
I'd rename the URL, replacing Cray with HPE, and do the same in the text.
Hmm, as a reader I wonder what I learn from this? Somehow, magically, it works to run OSU benchmarks on Slingshot-11 systems. But do I learn more?
- How can I actually implement this on my system?
- How good are the results actually? (Minor issue: how are the tests actually run? A job script or similar could be nice.)
- Why do I need to do this at all if EESSI's OpenMPI already supports libfabric?
# EESSI on Cray system with Slingshot 11
High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads.
Maybe rather claim that, while Slingshot-11 is becoming more popular (name a few example systems such as LUMI, Komondor, Olivia, ...), most systems we have had experience with are InfiniBand-based, and we were curious whether EESSI would also work well on Slingshot-11 systems.
In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI's default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup.
"present requirements" sounds like there will be a checklist, but not if there is a step-by-step guide on how to use EESSI on Slingshot-11 systems. For a reader it may be necessary to explain a little how OpenMPI is built in EESSI and then how the software/driver ecosystem for Slingshot-11 looks like; then how one could bring these together (EESSI is already built and read-only, so how can optimized software support for Slingshot-11 work?; then one could introduce host_injections as a means to customize EESSI to specifics of a system).
## The Challenge
EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 CXI requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to:
Will these "custom-built libraries" ever be included in EESSI?
I doubt it; those are custom-made to fit local needs.
I hadn't thought about this but indeed we could do this via a dev.eessi.io repo dedicated to this.
It would be a nice way of sharing experience on this topic
1. Build OpenMPI 5.x with native Slingshot 11 CXI support
Is the CXI important? Sounds very nerdy to me. If it's important, maybe a sentence on what it stands for might help the uninformed reader (like me).
- **Partition 1**: x86_64 AMD CPUs without accelerators
- **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators
For the Grace/Hopper partition we needed to enable CUDA support in libfabric.
It's quite a detail for which no context is provided. Why is libfabric suddenly mentioned?
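To make this concrete, enabling CUDA in a libfabric build could look like the sketch below. This is an illustration only: the install prefix is hypothetical, and the exact provider/CUDA flags should be checked against the libfabric version used.

```bash
# Hedged sketch: build libfabric with the CXI provider and CUDA support.
# ${EBROOTCUDA} follows the EasyBuild convention used elsewhere in this
# thread; the prefix and flag spellings are assumptions.
./configure --prefix=/cluster/installations/eessi/default/aarch64/software/libfabric \
            --enable-cxi \
            --with-cuda=${EBROOTCUDA}
make -j$(nproc) && make install
```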
## Building the Dependency Chain
Maybe a little too early? One could first explain different options, but that probably needs a bit of background information about the design choices made in EESSI, e.g., it uses OpenMPI built with support for UCX and libfabric. Because EESSI relies on RPATH linking, we cannot simply replace the whole OpenMPI installation, but rather replace specific libraries/parts by "injecting" them. Which libraries could be targeted: libmpi, others, ...? Could we use libraries provided as part of the Cray MPICH installation? Why not? ... possibly point at the new MPI ABI compatibility standard, which might help making such scenarios easier in the future.
### Building Strategy
Rather than relying on Cray-provided system packages, we opted to build all dependencies from source on top of EESSI. This approach provides several advantages:
To understand this (not relying on Cray/HPE-provided system packages), one would need to know what the dependencies actually are ... and where they come from.
## EESSI Integration via `host_injections`
EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI-compatible host MPI while maintaining compatibility with the rest of the software stack.
This could be misunderstood to mean that host_injections ensures compatibility with the rest of the software stack. Is that the case? What does host_injections actually do? And what do we have to do to ensure compatibility?
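To make the mechanism concrete: roughly speaking, one exposes the custom-built MPI under EESSI's `rpath_overrides` location in `host_injections`, as in the sketch below. The target path is the one validated next; the source prefix is taken from the configure line quoted later in this thread, and whether to symlink or copy is a site-level choice. This is a sketch, not the post's exact procedure.

```bash
# Hedged sketch: make the custom OpenMPI visible to EESSI binaries via
# the rpath_overrides path under host_injections (writable on the host).
HOST_INJ=/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace
mkdir -p ${HOST_INJ}/rpath_overrides/OpenMPI/system
ln -s /cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0/lib \
      ${HOST_INJ}/rpath_overrides/OpenMPI/system/lib
```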
**Validating `libmpi.so.40` in `host_injections` on ARM nodes**:
```
ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40
```
Unclear to me what you want to validate with the output? The output contains some dependencies provided by the compat layer. Are these important here? Then there are some pointing to the software layer, some to host_injections, and some to a local file system. Without explanation it doesn't say much. It is also not clear why one has to run ldd on this path (it seems to be quite an unusual path, with rpath_overrides).
You mean Cray --> HPE/Cray?
We need to be mindful here that this is just a blog post. I think there is a paper there in terms of content and impact, but that is going to take a lot longer to pull together. It is good to get something out the door now and spike interest a bit. Perhaps another interesting topic to debate a bit in the EESSI Happy Hour? Let's change the tone perhaps and present this as a first look, and promise more is to come. I believe this would make a great paper for EuroHPC User Days, with LUMI as the ultimate target.
Agree. If it were named something like "First (promising) results running EESSI on Slingshot-11" or similar, one could focus on the challenge and the results ... hinting at more detailed information to come.
```
./configure --prefix=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0 --with-cuda=${EBROOTCUDA} --with-cuda-libdir=${EBROOTCUDA}/lib64 --with-slurm --enable-mpi-ext=cuda --with-libfabric=${EBROOTLIBFABRIC} --with-ucx=${EBROOTUCX} --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0 --with-libevent=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0 --with-pmix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0 --with-ucc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0 --with-prrte=internal
```
```
ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40
```
@TopRichard I made a number of updates directly to the PR, so please take a look.
I didn't touch this, but I think this is not exactly the right thing to show here, at least not on its own. For one, you are definitely not using the ldd from the EESSI compat layer (which would give slightly different results). I think you want to follow this up with `lddtree $(which osu_latency)`, which shows where the OSU binaries are picking up their MPI libraries.
`lddtree` is nice (and available in the compat layer) since it shows you which `.so` requires what.
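For illustration, such a check might look like this; the exact location of `lddtree` in the compat layer is an assumption inferred from the compat-layer paths quoted earlier.

```bash
# Hedged sketch: trace which libmpi.so.40 an OSU binary resolves at runtime,
# using lddtree from the EESSI compat layer (path is an assumption).
/cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/bin/lddtree $(which osu_latency)
```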
## The Challenge
EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot-11 can sometimes require custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to:
Would we ever want to ship these custom libraries in EESSI? Maybe rather spin the story around that EESSI cannot ship all kinds of custom-built libraries, but it provides the means to customise EESSI to use optimised libraries.
Slingshot-11 promises a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads ... so this should be worth the effort!
In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md)
Maybe this could be rephrased to illustrate that this is a bit tricky.
Suggested change:

In this blog post, we briefly present how we enabled using Slingshot-11 optimised MPI with EESSI/2023.06 which builds on OpenMPI 4.1.x. While EESSI provides a mechanism to _inject_ custom-built MPI libraries, we cannot simply take MPI libraries one typically finds on a HPE/Cray because they are not ABI-compatible with OpenMPI. However, the recent open sourcing of CXI provided a possibility to reach our goal. _NEED SOMETHING THAT EXPLAINS WHY WE NEED OpenMPI 5.x__ We will list the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md)
3. Place the libraries somewhere where EESSI automatically picks them up
4. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators
The main task is to build the required dependencies on top of EESSI, since many of the libraries needed for libfabric with CXI support are not yet available in the current EESSI stack.
This "build ... on top of EESSI" could be misunderstood. What you actually do is building libraries compatible with EESSI (on top of the compat layer plus possibly libraries from the software layer you want to reuse).
libfabric comes a bit out of the blue here. Maybe adjust the first item in the list above to mention that this requires libfabric with CXI support?
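For readers wondering what that dependency chain looks like, a rough outline follows. Component names refer to HPE's open-sourced Slingshot host software; the install prefix, build commands, and flags are assumptions to be verified, not the post's exact recipe.

```bash
# Hedged outline of a possible build order for the CXI dependency chain:
#   cassini/cxi headers -> libcxi -> libfabric (CXI provider) -> OpenMPI 5.x
PREFIX=/cluster/installations/eessi/deps   # illustrative install prefix

# 1. user-space CXI library (needs the Cassini/CXI driver headers)
(cd libcxi && ./configure --prefix=${PREFIX} && make -j$(nproc) install)

# 2. libfabric with the CXI provider; CUDA enabled for the Grace/Hopper partition
(cd libfabric && ./configure --prefix=${PREFIX} --enable-cxi --with-cuda=${EBROOTCUDA} && make -j$(nproc) install)

# 3. OpenMPI 5.x pointed at this libfabric (full configure line shown above)
```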