
Conversation

@TopRichard (Collaborator):

No description provided.

@TopRichard marked this pull request as draft on September 10, 2025, 14:27.

@ocaisa (Member) left a comment:

Nice job!

```
librt.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/librt.so.1 (0x0000fffcf7c80000)
```

### Testing
Member:

I guess with this the idea is to show that it works in principle. You only show bandwidth here; what about latency? Are there numbers to compare with using the Cray toolchains? What about device-to-device?

I don't necessarily want you to write a lot here, but it is nice to show that all the things you might hope for do work.

@TopRichard (Collaborator, Author):

The second test is `osu_bibw -d cuda D D`; I can add the latency test and a comparison to runs using the Cray toolchains.
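(For context on how such tests could be launched, here is a minimal sketch of a Slurm job script; node counts, module name, and options are illustrative assumptions, not the exact setup used for these results.)

```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1
#SBATCH --time=00:10:00

# Initialise the EESSI environment and load the OSU micro-benchmarks
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks

# Host-to-host point-to-point latency between the two nodes
srun osu_latency

# Bidirectional bandwidth, device-to-device (GPU buffers on both sides)
srun osu_bibw -d cuda D D
```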

@trz42 (Collaborator) commented Sep 10, 2025:

I'd rename the URL replacing Cray with HPE and do the same in the text.

@trz42 (Collaborator) left a comment:

Hmm, as a reader I wonder what I learn from this? Somehow, magically, it works to run OSU benchmarks on Slingshot-11 systems. But do I learn more?

- How can I actually implement this on my system?
- How good are the results actually? (Minor issue: showing how the tests are actually run, e.g., a job script, would be nice.)
- Why do I need to do this at all if EESSI's OpenMPI already supports libfabric?


# EESSI on Cray systems with Slingshot 11

High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads.
Collaborator:

Maybe rather claim that, while Slingshot-11 is becoming more popular (name a few example systems such as LUMI, Komondor, Olivia, ...), most systems we have had experience with are InfiniBand-based, and we were curious whether EESSI would also work well on Slingshot-11 systems.


High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads.

In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup.
Collaborator:

"present requirements" sounds like there will be a checklist, but not if there is a step-by-step guide on how to use EESSI on Slingshot-11 systems. For a reader it may be necessary to explain a little how OpenMPI is built in EESSI and then how the software/driver ecosystem for Slingshot-11 looks like; then how one could bring these together (EESSI is already built and read-only, so how can optimized software support for Slingshot-11 work?; then one could introduce host_injections as a means to customize EESSI to specifics of a system).


## The Challenge

EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 CXI requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to:
Collaborator:

Will these "custom-built libraries" ever be included in EESSI?

@TopRichard (Collaborator, Author):

I doubt it; those are custom-made to fit local needs.

Member:

I hadn't thought about this, but indeed we could do this via a dev.eessi.io repo dedicated to this.

Member:

It would be a nice way of sharing experience on this topic


EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 CXI requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to:

1. Build OpenMPI 5.x with native Slingshot 11 CXI support
Collaborator:

Is the CXI important? Sounds very nerdy to me. If it's important, maybe a sentence on what it stands for might help the uninformed reader (like me).

- **Partition 1**: x86_64 AMD CPUs without accelerators
- **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators

For the Grace/Hopper partition we needed to enable CUDA support in libfabric.
Collaborator:

It's quite a detail for which no context is provided. Why is libfabric suddenly mentioned?

For the Grace/Hopper partition we needed to enable CUDA support in libfabric.
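(For illustration: CUDA awareness in libfabric is a configure-time option. A hypothetical invocation could look like the following; the `--enable-cxi` and `--with-cuda` flags follow upstream libfabric's configure conventions, and the install prefix is an assumption, not the exact command used here.)

```
# Build libfabric with the CXI provider and CUDA support enabled (illustrative)
./configure --prefix=${PREFIX}/libfabric \
    --enable-cxi \
    --with-cuda=${EBROOTCUDA}
make -j $(nproc) && make install
```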

## Building the Dependency Chain

@trz42 (Collaborator), Sep 10, 2025:

Maybe a little too early? One could first explain the different options, but that probably needs a bit of background information about the design choices made in EESSI, e.g., that it uses OpenMPI built with support for UCX and libfabric. Because EESSI relies on RPATH linking, we cannot simply replace the whole OpenMPI installation, but rather replace specific libraries/parts by "injecting" them. Which libraries could be targeted: libmpi, others, ...? Could we use libraries provided as part of the Cray MPICH installation? Why not? ... possibly point at the new MPI ABI compatibility standard, which might help make such scenarios easier in the future.


### Building Strategy

Rather than relying on Cray-provided system packages, we opted to build all dependencies from source on top of EESSI. This approach provides several advantages:
Collaborator:

To understand this (not relying on Cray/HPE-provided system packages), one would need to understand what the dependencies actually are ... and where they come from.


## EESSI Integration via `host_injections`

EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI-compatible host MPI while maintaining compatibility with the rest of the software stack.
Collaborator:

This could be misunderstood to mean that host_injections ensures compatibility with the rest of the software stack. Is that the case? What does host_injections actually do? And what do we have to do to ensure compatibility?
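(To make the mechanism concrete: a minimal sketch of the injection step, assuming the custom OpenMPI was installed under the `/cluster/installations/...` prefix from the configure command below. Only `libmpi.so.40` is shown; other OpenMPI libraries may need the same treatment.)

```
# Directory EESSI's RPATH searches before its own OpenMPI (per CPU target);
# host_injections is a CVMFS variant symlink pointing to a writable local path
TARGET=/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib

mkdir -p "${TARGET}"
# Symlink the ABI-compatible, Slingshot-optimized libmpi into place
ln -sf /cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0/lib/libmpi.so.40 \
    "${TARGET}/libmpi.so.40"
```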

Comment on lines 63 to 65
**Validating `libmpi.so.40` in `host_injections` on ARM nodes**:
```
ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40
```
Collaborator:

Unclear to me what you want to validate with the output? The output contains some dependencies provided by the compat layer. Are these important here? Then there are some pointing to the software layer, some to host_injections, and some to a local file system. Without explanation it doesn't say much. It's also not clear why one has to run ldd on this path (it seems to be quite an unusual path, with rpath_overrides).

@TopRichard (Collaborator, Author):

> I'd rename the URL replacing Cray with HPE and do the same in the text.

You mean Cray --> HPE/Cray?

@ocaisa (Member) commented Sep 11, 2025:

We need to be mindful here that this is just a blog post. I think there is a paper there in terms of content and impact, but that is going to take a lot longer to pull together. It is good to get something out the door now and pique interest a bit. Perhaps another interesting topic to debate a bit in the EESSI Happy Hour?

Let's change the tone perhaps and present this as a first look, and promise more is to come. I believe this would make a great paper for EuroHPC User Days, with LUMI as the ultimate target.

@trz42 (Collaborator) commented Sep 11, 2025:

> We need to be mindful here that this is just a blog post. I think there is a paper there in terms of content and impact, but that is going to take a lot longer to pull together. It is good to get something out the door now and pique interest a bit. Perhaps another interesting topic to debate a bit in the EESSI Happy Hour?
>
> Let's change the tone perhaps and present this as a first look, and promise more is to come. I believe this would make a great paper for EuroHPC User Days, with LUMI as the ultimate target.

Agree. If it were named something like "First (promising) results running EESSI on Slingshot-11" or similar, one could focus on the challenge and the results ... hinting at more detailed information to come.

```
./configure --prefix=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0 \
    --with-cuda=${EBROOTCUDA} --with-cuda-libdir=${EBROOTCUDA}/lib64 \
    --with-slurm --enable-mpi-ext=cuda \
    --with-libfabric=${EBROOTLIBFABRIC} --with-ucx=${EBROOTUCX} \
    --enable-mpirun-prefix-by-default --enable-shared \
    --with-hwloc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0 \
    --with-libevent=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0 \
    --with-pmix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0 \
    --with-ucc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0 \
    --with-prrte=internal
```
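(Side note: the `EBROOT*` variables above are the root-directory variables that EasyBuild-generated modules set when loaded. A plausible, illustrative preamble follows; the EESSI module names match the paths in the configure line, but the local module path for the custom libfabric is an assumption.)

```
# Initialise EESSI and load build dependencies from its software layer
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load GCC/12.3.0 CUDA UCX

# Assumption: the custom, CXI-enabled libfabric was installed locally with
# EasyBuild, so loading its module defines EBROOTLIBFABRIC
module use /cluster/installations/eessi/default/aarch64/modules/all
module load libfabric
```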
```
ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40
```
Member:

@TopRichard I made a number of updates directly to the PR, so please take a look.

I didn't touch this, but I think this is not exactly the right thing to show here, at least not on its own. For one, you are definitely not using the ldd from the EESSI compat layer (which would give slightly different results). I think you want to follow this up with

`lddtree $(which osu_latency)`

which shows where the OSU binaries are picking up their MPI libraries.

Member:

lddtree is nice (and available in the compat layer) since it shows you which `.so` requires what.


## The Challenge

EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot-11 can sometimes require custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to:
Collaborator:

Would we want to ever ship these custom libraries in EESSI? Maybe rather spin the story around that EESSI cannot ship all kinds of custom-built libraries, but it provides means to customise EESSI to use optimised libraries.

Slingshot-11 promises a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads ... so this should be worth the effort!

In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md)
Collaborator:

Maybe this could be rephrased to illustrate that this is a bit tricky.

Suggested change —

Original: In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md)

Suggested: In this blog post, we briefly present how we enabled using Slingshot-11 optimised MPI with EESSI/2023.06, which builds on OpenMPI 4.1.x. While EESSI provides a mechanism to _inject_ custom-built MPI libraries, we cannot simply take the MPI libraries one typically finds on an HPE/Cray system because they are not ABI-compatible with OpenMPI. However, the recent open sourcing of CXI provided a possibility to reach our goal. _NEED SOMETHING THAT EXPLAINS WHY WE NEED OpenMPI 5.x_ We will list the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md)

3. Place the libraries somewhere where EESSI automatically picks them up
4. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators

The main task is to build the required dependencies on top of EESSI, since many of the libraries needed for libfabric with CXI support are not yet available in the current EESSI stack.
Collaborator:

This "build ... on top of EESSI" could be misunderstood. What you actually do is building libraries compatible with EESSI (on top of the compat layer plus possibly libraries from the software layer you want to reuse).

libfabric comes a bit out of the blue here. Maybe adjust the first item in the list above to mention that this requires libfabric with with CXI support?
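(To make the dependency chain mentioned above concrete, here is a rough outline of a possible build order. The component names refer to HPE's open-sourced Slingshot host software and, like the prefix, are assumptions rather than a verified recipe.)

```
# Illustrative build order; each step installs into ${PREFIX} so the next can find it
PREFIX=/cluster/installations/eessi/default/aarch64

# 1. CXI userspace pieces from HPE's open-sourced Slingshot host software:
#    Cassini headers, CXI driver headers, and libcxi
# 2. libfabric, configured with the CXI provider (--enable-cxi) and,
#    for the Grace/Hopper partition, CUDA support (--with-cuda)
# 3. OpenMPI 5.x, configured with --with-libfabric pointing at step 2
#    (full configure line shown earlier)
```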
