From 0fa9422d406086da669e17b85ef0238c56205eb2 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 10 Sep 2025 16:25:21 +0200 Subject: [PATCH 01/19] blog-EESSI-Cray-Slingshot11 --- .../posts/2025/09/eessi-cray-slingshot11.md | 231 ++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 docs/blog/posts/2025/09/eessi-cray-slingshot11.md diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md new file mode 100644 index 0000000000..723e870890 --- /dev/null +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -0,0 +1,231 @@ +--- +author: [Richard] +date: 2025-09-10 +slug: EESSI-on-Cray-Slingshot +--- + +# EESSI on Cray system with Slingshot 11 + +High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) represents a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. + +In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using host injections. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. + + + +## The Challenge + +EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 CXI requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: + +1. Build OpenMPI 5.x with native Slingshot 11 CXI support +2. Create ABI-compatible replacements for EESSI's MPI libraries +3. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators +4. Avoid dependency on system packages where possible + +The main technical challenge is building the complete dependency chain on top of EESSI, as many of the required libraries for CXI support don't exist in the current EESSI stack. + +## System Architecture + +Our target system consists of two distinct partitions: + +- **Partition 1**: x86_64 AMD CPUs without accelerators +- **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators + +For the Grace/Hopper partition we needed to enable CUDA support in libfabric. + +## Building the Dependency Chain + +### Building Strategy + +Rather than relying on Cray-provided system packages, we opted to build all dependencies from source on top of EESSI. This approach provides several advantages: + +- **Consistency**: All libraries built with the same compiler toolchain +- **Compatibility**: Ensures ABI compatibility with EESSI libraries +- **Control**: Full control over build configurations and optimizations + +### Required Dependencies + +To build OpenMPI 5.x with CXI support, we needed the following missing dependencies: + +1. **libuv** - Asynchronous I/O library +2. **libnl** - Netlink library for network configuration +3. **libconfig** - Library designed for processing structured configuration files +4. **libfuse** - Filesystem in Userspace library +5. **libpdap** - Performance Data Access Protocol library +6. **shs-libcxi** - Slingshot CXI library +7. **lm-sensors** - Monitoring tools and drivers +8. **libfabric 2.x** - OpenFabrics Interfaces library with CXI provider +9. 
**OpenMPI 5.x** - The final MPI implementation + +## EESSI Integration via `host_injections` + +EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. + +*Validating `libmpi.so.40` in `host_injections` on the ARM nodes*: +``` +ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40 + + linux-vdso.so.1 (0x0000fffcfd1d0000) + libucc.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0/lib64/libucc.so.1 (0x0000fffcfce50000) + libucs.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucs.so.0 (0x0000fffcfcde0000) + libnuma.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/numactl/2.0.16-GCCcore-12.3.0/lib64/libnuma.so.1 (0x0000fffcfcdb0000) + libucm.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucm.so.0 (0x0000fffcfcd70000) + libopen-pal.so.80 => /cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0/lib/libopen-pal.so.80 (0x0000fffcfcc40000) + libfabric.so.1 => /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libfabric.so.1 (0x0000fffcfca50000) + librdmacm.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/librdmacm.so.1 (0x0000fffcfca10000) + libefa.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libefa.so.1 (0x0000fffcfc9e0000) + libibverbs.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libibverbs.so.1 (0x0000fffcfc9a0000) + libcxi.so.1 => /cluster/installations/eessi/default/aarch64/software/shs-libcxi/1.7.0-GCCcore-12.3.0/lib64/libcxi.so.1 (0x0000fffcfc960000) + libcurl.so.4 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcurl.so.4 (0x0000fffcfc8a0000) + libjson-c.so.5 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/json-c/0.16-GCCcore-12.3.0/lib64/libjson-c.so.5 (0x0000fffcfc870000) + libatomic.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib64/libatomic.so.1 (0x0000fffcfc840000) + libcudart.so.12 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software/CUDA/12.1.1/lib64/libcudart.so.12 (0x0000fffcfc780000) + libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x0000fffcf97d0000) + libnvidia-ml.so.1 => /usr/lib64/libnvidia-ml.so.1 (0x0000fffcf8980000) + libnl-route-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-route-3.so.200 (0x0000fffcf88d0000) + libnl-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-3.so.200 (0x0000fffcf8890000) + libpmix.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0/lib64/libpmix.so.2 (0x0000fffcf8690000) + libevent_core-2.1.so.7 => 
/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_core-2.1.so.7 (0x0000fffcf8630000) + libevent_pthreads-2.1.so.7 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_pthreads-2.1.so.7 (0x0000fffcf8600000) + libhwloc.so.15 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0/lib64/libhwloc.so.15 (0x0000fffcf8580000) + libpciaccess.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libpciaccess/0.17-GCCcore-12.3.0/lib64/libpciaccess.so.0 (0x0000fffcf8550000) + libxml2.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libxml2/2.11.4-GCCcore-12.3.0/lib64/libxml2.so.2 (0x0000fffcf83e0000) + libz.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libz.so.1 (0x0000fffcf83a0000) + liblzma.so.5 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/liblzma.so.5 (0x0000fffcf8330000) + libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libm.so.6 (0x0000fffcf8280000) + libc.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libc.so.6 (0x0000fffcf80e0000) + /lib/ld-linux-aarch64.so.1 (0x0000fffcfd1e0000) + libcares.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcares.so.2 (0x0000fffcf80a0000) + libnghttp2.so.14 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libnghttp2.so.14 (0x0000fffcf8050000) + libssl.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libssl.so.1.1 (0x0000fffcf7fb0000) + libcrypto.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libcrypto.so.1.1 (0x0000fffcf7d10000) + libdl.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libdl.so.2 (0x0000fffcf7ce0000) + libpthread.so.0 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libpthread.so.0 (0x0000fffcf7cb0000) + librt.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/librt.so.1 (0x0000fffcf7c80000) +``` + +### Testing + +**1- Test using OSU-Micro-Benchmarks on 2-nodes (x86_64 AMD-CPUs)**: +``` +Environment set up to use EESSI (2023.06), have fun! +hostname: +x1001c6s2b0n1 +x1001c6s3b0n0 + +CPU info: +Vendor ID: AuthenticAMD +Model name: AMD EPYC 9745 128-Core Processor +Virtualization: AMD-V + +Currently Loaded Modules: + 1) GCCcore/12.3.0 + 2) GCC/12.3.0 + 3) numactl/2.0.16-GCCcore-12.3.0 + 4) libxml2/2.11.4-GCCcore-12.3.0 + 5) libpciaccess/0.17-GCCcore-12.3.0 + 6) hwloc/2.9.1-GCCcore-12.3.0 + 7) OpenSSL/1.1 + 8) libevent/2.1.12-GCCcore-12.3.0 + 9) UCX/1.14.1-GCCcore-12.3.0 + 10) libfabric/1.18.0-GCCcore-12.3.0 + 11) PMIx/4.2.4-GCCcore-12.3.0 + 12) UCC/1.2.0-GCCcore-12.3.0 + 13) OpenMPI/4.1.5-GCC-12.3.0 + 14) gompi/2023a + 15) OSU-Micro-Benchmarks/7.1-1-gompi-2023a + +# OSU MPI Bi-Directional Bandwidth Test v7.1 +# Size Bandwidth (MB/s) +# Datatype: MPI_CHAR. 
+1 2.87 +2 5.77 +4 11.55 +8 23.18 +16 46.27 +32 92.64 +64 185.21 +128 369.03 +256 743.08 +512 1487.21 +1024 2975.75 +2048 5928.14 +4096 11809.66 +8192 23097.44 +16384 31009.54 +32768 36493.20 +65536 40164.63 +131072 43150.62 +262144 45075.57 +524288 45918.07 +1048576 46313.37 +2097152 46507.25 +4194304 46609.10 +``` + +**2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 on 2-nodes (Grace/Hopper GPUs)**: +``` +Environment set up to use EESSI (2023.06), have fun! + +hostname: +x1000c4s4b1n0 +x1000c5s3b0n0 + +CPU info: +Vendor ID: ARM + +Currently Loaded Modules: + 1) GCCcore/13.2.0 + 2) GCC/13.2.0 + 3) numactl/2.0.16-GCCcore-13.2.0 + 4) libxml2/2.11.5-GCCcore-13.2.0 + 5) libpciaccess/0.17-GCCcore-13.2.0 + 6) hwloc/2.9.2-GCCcore-13.2.0 + 7) OpenSSL/1.1 + 8) libevent/2.1.12-GCCcore-13.2.0 + 9) UCX/1.15.0-GCCcore-13.2.0 + 10) libfabric/1.19.0-GCCcore-13.2.0 + 11) PMIx/4.2.6-GCCcore-13.2.0 + 12) UCC/1.2.0-GCCcore-13.2.0 + 13) OpenMPI/4.1.6-GCC-13.2.0 + 14) gompi/2023b + 15) GDRCopy/2.4-GCCcore-13.2.0 + 16) UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0 (g) + 17) NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0 (g) + 18) UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0 (g) + 19) OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 (g) + + Where: + g: built for GPU + +# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5 +# Datatype: MPI_CHAR. +# Size Bandwidth (MB/s) +1 0.18 +2 0.37 +4 0.75 +8 1.49 +16 2.99 +32 5.93 +64 11.88 +128 23.76 +256 72.78 +512 145.45 +1024 282.03 +2048 535.46 +4096 1020.24 +8192 16477.70 +16384 25982.96 +32768 30728.30 +65536 37637.46 +131072 41808.92 +262144 44316.19 +524288 43693.89 +1048576 43759.66 +2097152 43593.38 +4194304 43436.60 +``` +## Conclusion + +The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! + + From d619629e6229d56e2ce5df41863ab1120bdd6b23 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 10 Sep 2025 16:29:22 +0200 Subject: [PATCH 02/19] fix_typo --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 723e870890..c22cd28008 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -8,7 +8,7 @@ slug: EESSI-on-Cray-Slingshot High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) represents a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. -In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using host injections. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. +In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. 
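The `host_injections` override mentioned above ultimately comes down to populating the `rpath_overrides` directory that EESSI's own binaries already carry in their RPATH. As a minimal sketch — the override directory and build prefix are the ones appearing in the `ldd` output and configure command elsewhere in this series, while the symlink approach itself is only illustrative and assumes the `host_injections` variant symlink already points at a site-writable location:

```
# RPATH override location for the aarch64/nvidia/grace target, as seen in the ldd output above
OVERRIDE_DIR=/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system
# Site-local, Slingshot-aware OpenMPI build (prefix taken from the configure command shown later in this series)
CUSTOM_OMPI=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0

mkdir -p "${OVERRIDE_DIR}/lib"
# Expose the custom MPI libraries where EESSI binaries look first; the ldd output also shows
# libfabric.so.1 being resolved from this same directory, so it can be dropped in alongside.
ln -sf "${CUSTOM_OMPI}"/lib/lib*.so* "${OVERRIDE_DIR}/lib/"

# Sanity check: the injected libmpi should resolve against the Slingshot-aware libfabric/libcxi
ldd "${OVERRIDE_DIR}/lib/libmpi.so.40" | grep -E 'libfabric|libcxi|libopen-pal'
```

Because the override is a pure runtime substitution, the EESSI modules themselves stay untouched.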
From d985401c3002d38414b843dcefa22918ad71d2c5 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 10 Sep 2025 16:30:36 +0200 Subject: [PATCH 03/19] fix_typo --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index c22cd28008..fb776db9a2 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -60,7 +60,7 @@ To build OpenMPI 5.x with CXI support, we needed the following missing dependenc EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. -*Validating `libmpi.so.40` in `host_injections` on the ARM nodes*: +**Validating `libmpi.so.40` in `host_injections` on ARM nodes**: ``` ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40 From 1c6637571279a58d8a6dd9c7c9b83e7eb3ddd047 Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Wed, 10 Sep 2025 17:28:43 +0200 Subject: [PATCH 04/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md Co-authored-by: ocaisa --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index fb776db9a2..303d84e1f2 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -6,7 +6,7 @@ slug: EESSI-on-Cray-Slingshot # EESSI on Cray system with Slingshot 11 -High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) represents a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. +High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. 
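Since the CXI libfabric provider is the component that actually drives Slingshot-11, a rough sketch of how it could be built on top of EESSI may be useful. This is illustrative only: the exact recipe is not part of these patches, the install prefix and libfabric version below are placeholders, and the configure flags should be checked against `./configure --help` for the release in use (the CXI provider may additionally need Cray's cassini/cxi driver headers):

```
# Run in an EESSI shell with the GCC 12.3.0 toolchain loaded, after installing shs-libcxi
LIBCXI_ROOT=/cluster/installations/eessi/default/aarch64/software/shs-libcxi/1.7.0-GCCcore-12.3.0
# Placeholder prefix/version, following the layout used for the other dependencies
PREFIX=/cluster/installations/eessi/default/aarch64/software/libfabric/2.1.0-GCCcore-12.3.0

# Let configure discover libcxi
export PKG_CONFIG_PATH="${LIBCXI_ROOT}/lib64/pkgconfig:${PKG_CONFIG_PATH}"

# --enable-cxi requests the Slingshot-11 provider; --with-cuda is only relevant on the
# Grace/Hopper partition (EBROOTCUDA is set by a loaded CUDA module) for GPU-aware transfers.
./configure --prefix="${PREFIX}" --enable-cxi --with-cuda="${EBROOTCUDA}"
make -j "$(nproc)"
make install
```

In the patches the resulting libfabric is then handed to OpenMPI's configure via `${EBROOTLIBFABRIC}`, which suggests the dependency builds were wrapped as modules; a manual build would simply pass the install prefix directly.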
From 895117f848427fbcf5e32f80ddec00d198d40a64 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 10 Sep 2025 17:44:17 +0200 Subject: [PATCH 05/19] added latency test --- .../posts/2025/09/eessi-cray-slingshot11.md | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 303d84e1f2..11aeff3574 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -160,6 +160,33 @@ Currently Loaded Modules: 1048576 46313.37 2097152 46507.25 4194304 46609.10 + + +# OSU MPI Latency Test v7.1 +# Size Latency (us) +# Datatype: MPI_CHAR. +1 1.66 +2 1.65 +4 1.65 +8 1.65 +16 1.65 +32 1.65 +64 1.65 +128 2.13 +256 2.20 +512 2.23 +1024 2.31 +2048 2.46 +4096 2.61 +8192 2.87 +16384 3.24 +32768 5.24 +65536 6.60 +131072 9.29 +262144 14.69 +524288 26.21 +1048576 47.32 +2097152 90.79 ``` **2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 on 2-nodes (Grace/Hopper GPUs)**: @@ -223,6 +250,34 @@ Currently Loaded Modules: 1048576 43759.66 2097152 43593.38 4194304 43436.60 + + +# OSU MPI-CUDA Latency Test v7.5 +# Datatype: MPI_CHAR. +# Size Avg Latency(us) +1 11.71 +2 11.66 +4 11.66 +8 11.71 +16 11.67 +32 11.68 +64 11.66 +128 12.45 +256 3.76 +512 3.82 +1024 3.91 +2048 4.08 +4096 4.25 +8192 4.49 +16384 5.09 +32768 8.02 +65536 9.56 +131072 13.52 +262144 17.96 +524288 28.94 +1048576 50.50 +2097152 93.98 +4194304 180.14 ``` ## Conclusion From 160c7bb7fa73efaaeb3cf11367428d1153707fd1 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 10 Sep 2025 23:29:58 +0200 Subject: [PATCH 06/19] added system name and OpenMPI build info --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 11aeff3574..49f3099aa1 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -25,7 +25,8 @@ The main technical challenge is building the complete dependency chain on top of ## System Architecture -Our target system consists of two distinct partitions: +Our target system [Olivia](https://documentation.sigma2.no/olivia_pilot_period_docs/olivia_pilot_main.html) is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE ClusterStor for global storage, all connected via HPE Slingshot high-speed interconnect. +It consists of two main distinct partitions: - **Partition 1**: x86_64 AMD CPUs without accelerators - **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators @@ -60,7 +61,10 @@ To build OpenMPI 5.x with CXI support, we needed the following missing dependenc EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. 
-**Validating `libmpi.so.40` in `host_injections` on ARM nodes**: +**Validating the `libmpi.so.40` in `host_injections` from OpenMPI/5.0.7 on ARM nodes built with:** +``` +./configure --prefix=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0 --with-cuda=${EBROOTCUDA} --with-cuda-libdir=${EBROOTCUDA}/lib64 --with-slurm --enable-mpi-ext=cuda --with-libfabric=${EBROOTLIBFABRIC} --with-ucx=${EBROOTUCX} --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0 --with-libevent=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0 --with-pmix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0 --with-ucc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0 --with-prrte=internal +``` ``` ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40 From 3c176e90e46102f6a8b3ca05f28e025246dc10ce Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Thu, 11 Sep 2025 08:11:06 +0200 Subject: [PATCH 07/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Thomas Röblitz --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 49f3099aa1..e110fc34fb 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -4,7 +4,7 @@ date: 2025-09-10 slug: EESSI-on-Cray-Slingshot --- -# EESSI on Cray system with Slingshot 11 +# MPI at Warp Speed: EESSI Meets Slingshot-11 High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. From c34cc2fff9c2c4ed1777b09a81f01c22f7971755 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Thu, 11 Sep 2025 10:56:33 +0200 Subject: [PATCH 08/19] text modification as per suggestions --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index e110fc34fb..e6271cf26e 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -6,22 +6,22 @@ slug: EESSI-on-Cray-Slingshot # MPI at Warp Speed: EESSI Meets Slingshot-11 -High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE Cray Slingshot 11 with CXI (Cassini eXascale Interconnect) promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. 
+High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE/Cray supporting Slingshot 11 via CXI libfabric promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. -In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 CXI support on Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. +In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 support on HPE/Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. ## The Challenge -EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 CXI requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: +EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: -1. Build OpenMPI 5.x with native Slingshot 11 CXI support -2. Create ABI-compatible replacements for EESSI's MPI libraries +1. Build OpenMPI 5.x with native Slingshot 11 support +2. Create ABI-compatible replacements for EESSI's OpenMPI libraries 3. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators 4. Avoid dependency on system packages where possible -The main technical challenge is building the complete dependency chain on top of EESSI, as many of the required libraries for CXI support don't exist in the current EESSI stack. +The main task is to build the required dependencies on top of EESSI, since many of the libraries needed for libfabric with CXI support are not yet available in the current EESSI stack. ## System Architecture @@ -45,7 +45,7 @@ Rather than relying on Cray-provided system packages, we opted to build all depe ### Required Dependencies -To build OpenMPI 5.x with CXI support, we needed the following missing dependencies: +To build OpenMPI 5.x with libfabric and CXI support, we needed the following missing dependencies: 1. **libuv** - Asynchronous I/O library 2. **libnl** - Netlink library for network configuration From 886a8a05750eb298780eba93ef76b0e21bbed567 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Thu, 11 Sep 2025 13:45:33 +0200 Subject: [PATCH 09/19] fixed typo and link --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index e6271cf26e..0839fd9566 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -6,17 +6,17 @@ slug: EESSI-on-Cray-Slingshot # MPI at Warp Speed: EESSI Meets Slingshot-11 -High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. 
HPE/Cray supporting Slingshot 11 via CXI libfabric promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. +High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE/Cray supporting Slingshot-11 via CXI libfabric promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. -In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot 11 support on HPE/Cray systems and its integration with EESSI using `host_injections`. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. +In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using [host_injections](../../../../site_specific_config/host_injections.md). This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. ## The Challenge -EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot 11 requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: +EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot-11 requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: -1. Build OpenMPI 5.x with native Slingshot 11 support +1. Build OpenMPI 5.x with native Slingshot-11 support 2. Create ABI-compatible replacements for EESSI's OpenMPI libraries 3. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators 4. Avoid dependency on system packages where possible @@ -59,7 +59,7 @@ To build OpenMPI 5.x with libfabric and CXI support, we needed the following mis ## EESSI Integration via `host_injections` -EESSI's [host_injections](../../../../site_specific_config/host_injections.md) mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. +EESSI's `host_injections` mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. 
**Validating the `libmpi.so.40` in `host_injections` from OpenMPI/5.0.7 on ARM nodes built with:** ``` From adaf4150c9a10b5a1ba18dae51ec77afd669a462 Mon Sep 17 00:00:00 2001 From: ocaisa Date: Wed, 17 Sep 2025 15:42:51 +0200 Subject: [PATCH 10/19] Update eessi-cray-slingshot11.md --- .../posts/2025/09/eessi-cray-slingshot11.md | 38 ++++++++++++------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 0839fd9566..5374297bef 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -6,38 +6,45 @@ slug: EESSI-on-Cray-Slingshot # MPI at Warp Speed: EESSI Meets Slingshot-11 -High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. HPE/Cray supporting Slingshot-11 via CXI libfabric promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads. +High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. The available interconnects, and the libraries +that optimise their usage, is constantly shifting. EESSI can't constantly rebuild old software so how do we take advantage of new technological developments? -In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using [host_injections](../../../../site_specific_config/host_injections.md). This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version. The post concludes with test results validating this setup. +Speficially in this blog post we look at HPE/Cray supporting Slingshot-11 via the CXI [libfabric](https://ofiwg.github.io/libfabric/) provider. That's a lot of technical jargon, but it basically comes down to making a +software stack that can fully leverage the +capabilities of the interconnect hardware. Slingshot-11 promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads...so +this should be worth the effort! +In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md) +mechanism of EESSI to inject the custom-built OpenMPI libraries. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version which should give us optimal performance. ## The Challenge -EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot-11 requires custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: +EESSI provides a comprehensive software stack, but specialized interconnect support like Slingshot-11 can sometimes require custom-built libraries that aren't yet available in the standard EESSI distribution. Our goal is to: 1. Build OpenMPI 5.x with native Slingshot-11 support 2. Create ABI-compatible replacements for EESSI's OpenMPI libraries -3. 
Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators -4. Avoid dependency on system packages where possible +3. Place the libraries somewhere where EESSI automatically picks them up +4. Support both x86_64 AMD CPU partitions and NVIDIA Grace CPU partitions with Hopper accelerators The main task is to build the required dependencies on top of EESSI, since many of the libraries needed for libfabric with CXI support are not yet available in the current EESSI stack. -## System Architecture +### System Architecture -Our target system [Olivia](https://documentation.sigma2.no/olivia_pilot_period_docs/olivia_pilot_main.html) is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE ClusterStor for global storage, all connected via HPE Slingshot high-speed interconnect. +Our target system is [Olivia](https://documentation.sigma2.no/olivia_pilot_period_docs/olivia_pilot_main.html) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all +connected via HPE Slingshot high-speed interconnect. It consists of two main distinct partitions: - **Partition 1**: x86_64 AMD CPUs without accelerators - **Partition 2**: NVIDIA Grace CPUs with Hopper accelerators -For the Grace/Hopper partition we needed to enable CUDA support in libfabric. +For the Grace/Hopper partition we also need to enable CUDA support in libfabric. ## Building the Dependency Chain ### Building Strategy -Rather than relying on Cray-provided system packages, we opted to build all dependencies from source on top of EESSI. This approach provides several advantages: +Rather than relying on Cray-provided system packages, we opted to build all dependencies from source [on top of EESSI](../../../../using_eessi/building_on_eessi.md). This approach provides several advantages: - **Consistency**: All libraries built with the same compiler toolchain - **Compatibility**: Ensures ABI compatibility with EESSI libraries @@ -59,7 +66,11 @@ To build OpenMPI 5.x with libfabric and CXI support, we needed the following mis ## EESSI Integration via `host_injections` -EESSI's `host_injections` mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. +EESSI's `host_injections` mechanism allows us to override EESSI's MPI library with an ABI compatible host MPI while maintaining compatibility with the rest of the software stack. We just need to make sure that the libraries are in the right +location to be automatically picked up by the software shipped with EESSI. This location is EESSI-version specific, for `2023.06`, with the NVIDIA Grace architecture, that location is: +``` +/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib +``` **Validating the `libmpi.so.40` in `host_injections` from OpenMPI/5.0.7 on ARM nodes built with:** ``` @@ -109,6 +120,9 @@ ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvid ### Testing +We plan to provide more comprehensive test results in the future. In this blog post we want to report that the approach works in principle, and that the EESSI stack can pick up and use the custom OpenMPI build and extract +performance from the host interconnect **without the need to rebuild any software packages**. 
+ **1- Test using OSU-Micro-Benchmarks on 2-nodes (x86_64 AMD-CPUs)**: ``` Environment set up to use EESSI (2023.06), have fun! @@ -285,6 +299,4 @@ Currently Loaded Modules: ``` ## Conclusion -The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! - - +The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! There is plenty of more testing to do, but the signs at this stage are very good! From 389c10da34c9724d7a0784220481bbde0b8b8a9b Mon Sep 17 00:00:00 2001 From: ocaisa Date: Wed, 17 Sep 2025 15:51:22 +0200 Subject: [PATCH 11/19] Update eessi-cray-slingshot11.md --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 5374297bef..871bec9d73 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -9,7 +9,7 @@ slug: EESSI-on-Cray-Slingshot High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. The available interconnects, and the libraries that optimise their usage, is constantly shifting. EESSI can't constantly rebuild old software so how do we take advantage of new technological developments? -Speficially in this blog post we look at HPE/Cray supporting Slingshot-11 via the CXI [libfabric](https://ofiwg.github.io/libfabric/) provider. That's a lot of technical jargon, but it basically comes down to making a +Specifically in this blog post we look at HPE/Cray supporting Slingshot-11 via the CXI [libfabric](https://ofiwg.github.io/libfabric/) provider. That's a lot of technical jargon, but it basically comes down to making a software stack that can fully leverage the capabilities of the interconnect hardware. Slingshot-11 promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads...so this should be worth the effort! From bad82cf0540bea0503c395278d15ee889a9275e7 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Tue, 14 Oct 2025 19:27:11 +0200 Subject: [PATCH 12/19] Added Cray tests --- .../posts/2025/09/eessi-cray-slingshot11.md | 124 +++++++++++------- 1 file changed, 78 insertions(+), 46 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 871bec9d73..29d680fbed 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -72,60 +72,19 @@ location to be automatically picked up by the software shipped with EESSI. 
This /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib ``` -**Validating the `libmpi.so.40` in `host_injections` from OpenMPI/5.0.7 on ARM nodes built with:** +**OpenMPI/5.0.7 on ARM nodes built with:** ``` ./configure --prefix=/cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0 --with-cuda=${EBROOTCUDA} --with-cuda-libdir=${EBROOTCUDA}/lib64 --with-slurm --enable-mpi-ext=cuda --with-libfabric=${EBROOTLIBFABRIC} --with-ucx=${EBROOTUCX} --enable-mpirun-prefix-by-default --enable-shared --with-hwloc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0 --with-libevent=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0 --with-pmix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0 --with-ucc=/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0 --with-prrte=internal ``` -``` -ldd /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libmpi.so.40 - - linux-vdso.so.1 (0x0000fffcfd1d0000) - libucc.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCC/1.2.0-GCCcore-12.3.0/lib64/libucc.so.1 (0x0000fffcfce50000) - libucs.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucs.so.0 (0x0000fffcfcde0000) - libnuma.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/numactl/2.0.16-GCCcore-12.3.0/lib64/libnuma.so.1 (0x0000fffcfcdb0000) - libucm.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/UCX/1.14.1-GCCcore-12.3.0/lib64/libucm.so.0 (0x0000fffcfcd70000) - libopen-pal.so.80 => /cluster/installations/eessi/default/aarch64/software/OpenMPI/5.0.7-GCC-12.3.0/lib/libopen-pal.so.80 (0x0000fffcfcc40000) - libfabric.so.1 => /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/aarch64/nvidia/grace/rpath_overrides/OpenMPI/system/lib/libfabric.so.1 (0x0000fffcfca50000) - librdmacm.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/librdmacm.so.1 (0x0000fffcfca10000) - libefa.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libefa.so.1 (0x0000fffcfc9e0000) - libibverbs.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libibverbs.so.1 (0x0000fffcfc9a0000) - libcxi.so.1 => /cluster/installations/eessi/default/aarch64/software/shs-libcxi/1.7.0-GCCcore-12.3.0/lib64/libcxi.so.1 (0x0000fffcfc960000) - libcurl.so.4 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcurl.so.4 (0x0000fffcfc8a0000) - libjson-c.so.5 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/json-c/0.16-GCCcore-12.3.0/lib64/libjson-c.so.5 (0x0000fffcfc870000) - libatomic.so.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib64/libatomic.so.1 (0x0000fffcfc840000) - libcudart.so.12 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software/CUDA/12.1.1/lib64/libcudart.so.12 (0x0000fffcfc780000) - libcuda.so.1 => /usr/lib64/libcuda.so.1 
(0x0000fffcf97d0000) - libnvidia-ml.so.1 => /usr/lib64/libnvidia-ml.so.1 (0x0000fffcf8980000) - libnl-route-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-route-3.so.200 (0x0000fffcf88d0000) - libnl-3.so.200 => /cluster/installations/eessi/default/aarch64/software/libnl/3.11.0-GCCcore-12.3.0/lib64/libnl-3.so.200 (0x0000fffcf8890000) - libpmix.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/PMIx/4.2.4-GCCcore-12.3.0/lib64/libpmix.so.2 (0x0000fffcf8690000) - libevent_core-2.1.so.7 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_core-2.1.so.7 (0x0000fffcf8630000) - libevent_pthreads-2.1.so.7 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libevent/2.1.12-GCCcore-12.3.0/lib64/libevent_pthreads-2.1.so.7 (0x0000fffcf8600000) - libhwloc.so.15 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/hwloc/2.9.1-GCCcore-12.3.0/lib64/libhwloc.so.15 (0x0000fffcf8580000) - libpciaccess.so.0 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libpciaccess/0.17-GCCcore-12.3.0/lib64/libpciaccess.so.0 (0x0000fffcf8550000) - libxml2.so.2 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/libxml2/2.11.4-GCCcore-12.3.0/lib64/libxml2.so.2 (0x0000fffcf83e0000) - libz.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libz.so.1 (0x0000fffcf83a0000) - liblzma.so.5 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/liblzma.so.5 (0x0000fffcf8330000) - libm.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libm.so.6 (0x0000fffcf8280000) - libc.so.6 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libc.so.6 (0x0000fffcf80e0000) - /lib/ld-linux-aarch64.so.1 (0x0000fffcfd1e0000) - libcares.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libcares.so.2 (0x0000fffcf80a0000) - libnghttp2.so.14 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/usr/lib/../lib64/libnghttp2.so.14 (0x0000fffcf8050000) - libssl.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libssl.so.1.1 (0x0000fffcf7fb0000) - libcrypto.so.1.1 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/OpenSSL/1.1/lib64/libcrypto.so.1.1 (0x0000fffcf7d10000) - libdl.so.2 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libdl.so.2 (0x0000fffcf7ce0000) - libpthread.so.0 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/libpthread.so.0 (0x0000fffcf7cb0000) - librt.so.1 => /cvmfs/software.eessi.io/versions/2023.06/compat/linux/aarch64/lib/../lib64/librt.so.1 (0x0000fffcf7c80000) -``` - ### Testing We plan to provide more comprehensive test results in the future. In this blog post we want to report that the approach works in principle, and that the EESSI stack can pick up and use the custom OpenMPI build and extract performance from the host interconnect **without the need to rebuild any software packages**. 
-**1- Test using OSU-Micro-Benchmarks on 2-nodes (x86_64 AMD-CPUs)**: +**1- Test using OSU-Micro-Benchmarks from EESSI on 2-nodes (x86_64 AMD-CPUs)**: ``` Environment set up to use EESSI (2023.06), have fun! + hostname: x1001c6s2b0n1 x1001c6s3b0n0 @@ -207,7 +166,7 @@ Currently Loaded Modules: 2097152 90.79 ``` -**2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 on 2-nodes (Grace/Hopper GPUs)**: +**2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 from EESSI on 2-nodes/2-GPUs (Grace/Hopper GPUs)**: ``` Environment set up to use EESSI (2023.06), have fun! @@ -297,6 +256,79 @@ Currently Loaded Modules: 2097152 93.98 4194304 180.14 ``` -## Conclusion +**3- Test using OSU-Micro-Benchmarks/7.5 with PrgEnv-cray on 2-nodes/2-GPUs (Grace/Hopper GPUs)**: +``` + +hostname: +x1000c4s4b1n0 +x1000c5s3b0n0 + +CPU info: +Vendor ID: ARM + +Currently Loaded Modules: + 1) craype-arm-grace 8) craype/2.7.34 + 2) libfabric/1.22.0 9) cray-dsmml/0.3.1 + 3) craype-network-ofi 10) cray-mpich/8.1.32 + 4) perftools-base/25.03.0 11) cray-libsci/25.03.0 + 5) xpmem/2.11.3-1.3_gdbda01a1eb3d 12) PrgEnv-cray/8.6.0 + 6) cce/19.0.0 13) cudatoolkit/24.11_12.6 + +# OSU MPI-CUDA Bi-Directional Bandwidth Test v7.5 +# Datatype: MPI_CHAR. +# Size Bandwidth (MB/s) +1 1.06 +2 2.17 +4 4.40 +8 8.80 +16 17.64 +32 35.17 +64 70.55 +128 140.91 +256 281.22 +512 559.04 +1024 1114.45 +2048 2081.25 +4096 4068.64 +8192 1852.11 +16384 18564.47 +32768 22647.40 +65536 33108.03 +131072 39553.95 +262144 43140.01 +524288 44853.40 +1048576 45761.69 +2097152 46228.10 +4194304 46470.29 + +# OSU MPI-CUDA Latency Test v7.5 +# Datatype: MPI_CHAR. +# Size Avg Latency(us) +1 2.76 +2 2.72 +4 2.90 +8 2.86 +16 2.85 +32 2.73 +64 2.60 +128 3.41 +256 4.17 +512 4.19 +1024 4.29 +2048 4.44 +4096 4.66 +8192 7.59 +16384 8.17 +32768 8.44 +65536 9.92 +131072 12.59 +262144 18.07 +524288 29.00 +1048576 50.64 +2097152 94.06 +4194304 180.44 +``` + +## Conclusion The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! There is plenty of more testing to do, but the signs at this stage are very good! From ae602840d8264cb1d8c7962b275ac8f061734a40 Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Wed, 15 Oct 2025 08:58:34 +0200 Subject: [PATCH 13/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Thomas Röblitz --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 29d680fbed..0ad4eb0e56 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -6,8 +6,7 @@ slug: EESSI-on-Cray-Slingshot # MPI at Warp Speed: EESSI Meets Slingshot-11 -High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximizing application performance. The available interconnects, and the libraries -that optimise their usage, is constantly shifting. EESSI can't constantly rebuild old software so how do we take advantage of new technological developments? 
+High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximising application performance. However, we cannot rebuild all the software in EESSI that depends on improvements to communication libraries. So how do we take advantage of new technological developments? Specifically in this blog post we look at HPE/Cray supporting Slingshot-11 via the CXI [libfabric](https://ofiwg.github.io/libfabric/) provider. That's a lot of technical jargon, but it basically comes down to making a software stack that can fully leverage the From c500c862bd137577f8f4ec91b31fda2191ef6ff8 Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Wed, 15 Oct 2025 08:59:14 +0200 Subject: [PATCH 14/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Thomas Röblitz --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 0ad4eb0e56..f813374ff1 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -8,7 +8,7 @@ slug: EESSI-on-Cray-Slingshot High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximising application performance. However, we cannot rebuild all the software in EESSI that depends on improvements to communication libraries. So how do we take advantage of new technological developments? -Specifically in this blog post we look at HPE/Cray supporting Slingshot-11 via the CXI [libfabric](https://ofiwg.github.io/libfabric/) provider. That's a lot of technical jargon, but it basically comes down to making a +Specifically we look at taking benefit of the HPE/Cray Slingshot-11. software stack that can fully leverage the capabilities of the interconnect hardware. Slingshot-11 promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads...so this should be worth the effort! From c4bef6a0cb39bba26d925ceafbf9eb78eb537718 Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Wed, 15 Oct 2025 08:59:42 +0200 Subject: [PATCH 15/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Thomas Röblitz --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index f813374ff1..3102b82519 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -14,7 +14,7 @@ capabilities of the interconnect hardware. Slingshot-11 promises to offer a sign this should be worth the effort! In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md) -mechanism of EESSI to inject the custom-built OpenMPI libraries. 
This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version which should give us optimal performance. +mechanism of EESSI to inject custom-built OpenMPI libraries. This approach enables overriding EESSI’s default MPI library with an ABI-compatible, Slingshot-optimized version which should give us optimal performance. ## The Challenge From cff5e9775b95369a9847dfe4920e879531624e9f Mon Sep 17 00:00:00 2001 From: TopRichard <121792457+TopRichard@users.noreply.github.com> Date: Wed, 15 Oct 2025 09:00:43 +0200 Subject: [PATCH 16/19] Update docs/blog/posts/2025/09/eessi-cray-slingshot11.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Thomas Röblitz --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 3102b82519..13ee63b31c 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -9,8 +9,7 @@ slug: EESSI-on-Cray-Slingshot High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximising application performance. However, we cannot rebuild all the software in EESSI that depends on improvements to communication libraries. So how do we take advantage of new technological developments? Specifically we look at taking benefit of the HPE/Cray Slingshot-11. -software stack that can fully leverage the -capabilities of the interconnect hardware. Slingshot-11 promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads...so +Slingshot-11 promises to offer a significant advancement in HPC networking, offering improved bandwidth, lower latency, and better scalability for exascale computing workloads ... so this should be worth the effort! In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems and its integration with EESSI using the [host_injections](../../../../site_specific_config/host_injections.md) From a29cd6f525cb68bd8f645baabcb709197bf6aea8 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 15 Oct 2025 12:23:35 +0200 Subject: [PATCH 17/19] more modifications --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 13ee63b31c..8ebcc99b6d 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -29,7 +29,7 @@ The main task is to build the required dependencies on top of EESSI, since many ### System Architecture -Our target system is [Olivia](https://documentation.sigma2.no/olivia_pilot_period_docs/olivia_pilot_main.html) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all +Our target system is [Olivia](https://https://documentation.sigma2.no/hpc_machines/olivia.html#olivia) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all connected via HPE Slingshot high-speed interconnect. 
It consists of two main distinct partitions: @@ -162,6 +162,7 @@ Currently Loaded Modules: 524288 26.21 1048576 47.32 2097152 90.79 +4194304 182.30 ``` **2- Test using OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0 from EESSI on 2-nodes/2-GPUs (Grace/Hopper GPUs)**: From 3a26e29e00ae16a16ef251b442f3d1f0d94148f4 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Wed, 15 Oct 2025 12:26:26 +0200 Subject: [PATCH 18/19] fixed typo --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index 8ebcc99b6d..e6c96aff62 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -29,7 +29,7 @@ The main task is to build the required dependencies on top of EESSI, since many ### System Architecture -Our target system is [Olivia](https://https://documentation.sigma2.no/hpc_machines/olivia.html#olivia) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all +Our target system is [Olivia](https://documentation.sigma2.no/hpc_machines/olivia.html#olivia) which is based on HPE Cray EX platforms for compute and accelerator nodes, and HPE Cray ClusterStor for global storage, all connected via HPE Slingshot high-speed interconnect. It consists of two main distinct partitions: From 285375a7892932ae9e79434bfa975dfbffe092e3 Mon Sep 17 00:00:00 2001 From: Richard Top Date: Thu, 16 Oct 2025 20:38:17 +0200 Subject: [PATCH 19/19] Added charts --- docs/blog/posts/2025/09/eessi-cray-slingshot11.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md index e6c96aff62..db7bf6567e 100644 --- a/docs/blog/posts/2025/09/eessi-cray-slingshot11.md +++ b/docs/blog/posts/2025/09/eessi-cray-slingshot11.md @@ -329,5 +329,7 @@ Currently Loaded Modules: 4194304 180.44 ``` +![OSU CUDA Bandwidth](osu_cuda_bandwidth.png) ![OSU CUDA Latency](osu_cuda_latency.png) + ## Conclusion The approach demonstrates EESSI's flexibility in accommodating specialized hardware requirements while preserving the benefits of a standardized software stack! There is plenty of more testing to do, but the signs at this stage are very good!
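For readers who want to reproduce the numbers behind these charts: the runs only require initialising EESSI and loading the module combinations shown in the listings above, since the injected OpenMPI/libfabric is picked up automatically through the `rpath_overrides` mechanism. The launcher invocations below are illustrative (the actual Slurm job scripts are not part of this series):

```
# Initialise EESSI 2023.06 and load the CPU benchmark used in the first test
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load OSU-Micro-Benchmarks/7.1-1-gompi-2023a

# Two nodes, one rank per node (osu_bibw/osu_latency expect exactly two ranks)
srun --nodes=2 --ntasks-per-node=1 osu_bibw
srun --nodes=2 --ntasks-per-node=1 osu_latency

# For the Grace/Hopper numbers, in a separate job on the GPU partition, load the CUDA-enabled
# build and keep the message buffers on the device ('-d cuda D D')
module load OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
srun --nodes=2 --ntasks-per-node=1 osu_bibw -d cuda D D
srun --nodes=2 --ntasks-per-node=1 osu_latency -d cuda D D
```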