Add GPU & large ARM runners, and setup-palace-ci action #578

Sbozzolo · 2025-12-05T21:56:50Z

This commit runs the spack workflow on larger/GPU-accelerated runners.

OLD description below; the updated description is in #578 (comment)

I added a full test matrix, but I could not get everything to work (despite fixing several bugs). Some of this work conflicts with #550, so I will address more of that there. I tried quick fixes, but I could not get the builds to work, so I am leaving this for future work.

To be noted:

- libxsmm does not support flang (libxsmm/libxsmm#996), so I am using clang+gfortran
- I ran into issues with waveports at high core count (see also #565, "Regression test failure with high core count")
- There's one case that compiles but fails in the cylinder/floquet (periodic) test (maybe OpenMP-related)
- The current cache has a problem with the prefix being too short. I temporarily switched to a new cache, but we should just reset the old one (palace-develop)
- CPW (lumped ports, adaptive) takes 12 minutes on GPU. I don't know if this is another instance of #375 ("Improve GPU performance for adaptive sweep wave ports")
- I added a new local action to set the worker up

There is an annoying bug with oneAPI and spack where the version of ifx is incorrectly reported, breaking everything. I asked about this on Slack, and I was told that I should manually merge the two compiler entries in the package definition (as a workaround). I added a step to take care of this, but I hope we can remove it soon.

The good news is that, after a week of fixing bugs, lots of variants are working, including OpenMP cases, cases with mixed compilers, GPU+LLVM, newer dependencies (e.g., eigen 5), et cetera.

The progression I see is:
(#576 at any time)

Sbozzolo · 2025-12-05T21:58:50Z

test/unit/test-libceed.cpp

 }

-TEST_CASE("2D libCEED Interpolators", "[libCEED][Interpolator][Serial][Parallel][GPU]")
+TEST_CASE("2D libCEED Interpolators", "[libCEED][Interpolator][Serial][Parallel]")


Removed because of #516

Sbozzolo · 2025-12-05T21:59:52Z

test/examples/runtests.jl

 end

-abstol = 2.0e-12
+abstol = 1.0e-11


I don't remember which case led me to change this, but I found it necessary

Sbozzolo · 2025-12-05T22:24:58Z

spack_repo/local/packages/palace/package.py

+            strumpack_packages = ["ParMETIS", "METIS", "LAPACK", "BLAS", "MPI", "MPI_Fortran"]
+            if self.spec.satisfies("+openmp"):
+               strumpack_packages.append("OpenMP")
+            args.append(self.define("STRUMPACK_REQUIRED_PACKAGES", ";".join(strumpack_packages)))
+            scalapack_libs = self.spec["scalapack"].libs
+            fortran_libs = ""
+            if "gfortran" in self.compiler.fc:
+                fortran_libs = "gfortran"
+            elif "ifort" in self.compiler.fc or "ifx" in self.compiler.fc:
+                fortran_libs = "ifport;ifcore"
+            # For other compilers (flang, etc.), don't add extra libs
+            strumpack_libs = str(scalapack_libs).replace(" ", ";")
+            if fortran_libs:
+                strumpack_libs += ";" + fortran_libs


I think this entire section might go away with #550
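
For reference, the library-selection logic in the hunk above can be exercised outside of Spack with a minimal sketch like the one below. The helper names `fortran_runtime_libs` and `cmake_list` are hypothetical and not part of the recipe. One deliberate difference is labeled in the code: the sketch matches on the compiler's basename rather than on the full path, which is an assumption about what is intended.

```python
def fortran_runtime_libs(fc_path: str) -> list[str]:
    """Guess the Fortran runtime libraries from the compiler executable name.

    Hypothetical helper. Unlike the hunk above, it inspects only the basename,
    so a directory that happens to contain "gfortran" does not trigger a match.
    """
    name = fc_path.rsplit("/", 1)[-1]
    if "gfortran" in name:
        return ["gfortran"]
    if "ifort" in name or "ifx" in name:
        return ["ifport", "ifcore"]
    # Other compilers (flang, etc.): no extra libraries
    return []


def cmake_list(scalapack_libs: str, fc_path: str) -> str:
    """Join space-separated library paths and runtime libs into a CMake list."""
    items = scalapack_libs.split() + fortran_runtime_libs(fc_path)
    return ";".join(items)


# Example paths are made up for illustration.
print(cmake_list("/opt/lib/libscalapack.so /opt/lib/libblas.so", "/usr/bin/gfortran"))
# prints: /opt/lib/libscalapack.so;/opt/lib/libblas.so;gfortran
```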

hughcars

One small question about usage of the cache, but generally LGTM.

.github/workflows/spack.yml

spack_repo/patches/pr2580.patch

.github/actions/setup-palace-ci/action.yml

Sbozzolo · 2025-12-29T16:39:39Z

I updated the PR with the following changes:

- The PR is now meant to be merged after #498 ("Update hypre to 3.0.0+ to fix compatibility with CUDA 13"). To facilitate review, I changed the base branch to gbozzola/hypre3.
- The PR fixes more spack compilation issues, mostly due to arguments that should have been passed to ExternalMFEM but were not. It is likely that I missed something there, but I think that all of these will become obsolete after #550 ("Compile MFEM externally"), because we won't need ExternalMFEM anymore.
- I found that I had introduced a new bug in the libCEED spack package. I fixed this upstream (spack/spack-packages#2892, "libceed: Fix libceed +cuda, +openmp, +shared for > 0.12") and added a patch so that it is also fixed here.

More importantly, I changed from an allpairs type of testing to a more curated test matrix. allpairs was generating combinations that were valid but unnatural. Instead, here I test:

- x64, with GCC/LLVM and OpenBlas, and with the Intel suite
- arm64, with GCC/LLVM and OpenBlas, and with GCC and ARMPL
- gpu with GCC
- a combination where I turn unusual options on (static builds, openmp, int64)
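
To illustrate the difference in scale, here is a minimal Python sketch that contrasts the full space of combinations with a hand-picked list. It does not implement allpairs, and it is not the actual workflow definition. The axis values, and in particular the `mkl` label for the Intel suite, are assumptions made for the example; the GPU and unusual-options entries are left out for brevity.

```python
from itertools import product

# Hypothetical axes, loosely based on the list above.
ARCHS = ["x64", "arm64"]
COMPILERS = ["gcc", "llvm", "intel"]
MATH_LIBS = ["openblas", "mkl", "armpl"]

# Every combination, including ones that are valid but unnatural.
full = list(product(ARCHS, COMPILERS, MATH_LIBS))

# A curated selection: each entry is a configuration a user would plausibly build.
curated = [
    ("x64", "gcc", "openblas"),
    ("x64", "llvm", "openblas"),
    ("x64", "intel", "mkl"),
    ("arm64", "gcc", "openblas"),
    ("arm64", "llvm", "openblas"),
    ("arm64", "gcc", "armpl"),
]

print(f"{len(curated)} curated configurations out of {len(full)} possible")
```

The curated list trades exhaustive coverage for configurations that are cheaper to keep green and easier to reason about when one of them fails.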

For some builds, CI is now split into two steps: build (e.g., build-x64-gcc) and test (test-x64-gcc). The test step is a matrix run that reuses the build artifacts generated in the build step and runs on several smaller runners (since we cannot take advantage of large runners in our regression suite, because of the small size of the problems). The build step compiles Palace with everything turned on (so that we can reuse the artifact). Currently, the test matrix is parametrized on the linear solver, but we can easily extend that.

To allow this workflow, I extended the runtests script to take command line arguments and override the config files with the passed arguments. Environment variables are still accepted (but are overridden if the corresponding command line argument is passed). I updated the documentation for this.
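
The precedence described here (command line argument, then environment variable, then default) can be sketched in Python as follows. This is only an illustration: the real runner is the Julia script `runtests.jl`, and the function `resolve_options` is hypothetical. The flag and variable names are taken from the documentation hunk in this PR.

```python
import argparse
import os


def resolve_options(argv, environ):
    """Resolve options with precedence: CLI argument > environment > default."""
    parser = argparse.ArgumentParser(description="Hypothetical test-runner options")
    parser.add_argument("--palace-test", default=None)
    parser.add_argument("--num-proc-test", type=int, default=None)
    args = parser.parse_args(argv)

    # A value passed on the command line wins over the environment variable.
    palace = args.palace_test or environ.get("PALACE_TEST", "palace")
    if args.num_proc_test is not None:
        num_proc = args.num_proc_test
    else:
        # Assumption: os.cpu_count() counts logical cores, whereas the
        # documented default is the number of physical cores.
        num_proc = int(environ.get("NUM_PROC_TEST", os.cpu_count() or 1))
    return {"palace_test": palace, "num_proc_test": num_proc}


env = {"PALACE_TEST": "palace-env", "NUM_PROC_TEST": "4"}
print(resolve_options(["--num-proc-test", "2"], env))
# prints: {'palace_test': 'palace-env', 'num_proc_test': 2}
```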

I also reorganized the workflows so that they are in repo-local actions. One of the undesired side effects of this is that the GitHub action log is not as neatly broken down into steps that show the individual time (which is very useful to optimize the runtime). I checked if there was a way around this, and could not find one. In the future, I will add back timing information.

I also tried adding macOS in the matrix, but found lots of problems:

- Spack seems unable to handle armpl correctly (partially because Spack's armpl has a constraint on gcc for some reason). Adding armpl to the mix prevents spack from concretizing
- Python has to be compiled with apple-clang and cannot be compiled with gcc
- OpenBlas 0.3.30 does not compile on Apple silicon
- Strumpack forces OpenBlas with OpenMP, but apple-clang does not support OpenMP
- Even when mostly using GCC, Spack often cannot concretize the environment

This commit runs the spack workflow on larger/GPU-accelerated runners. I added a full test matrix, but I could not get everything to work. In particular, the static builds have several linking errors, which indicates some issue in the build system. I tried quick fixes, but I could not get the builds to work, so I am leaving this for future work.

At the end, we pay the cost of compiling them only once (with the buildcache)

hughcars

One small question about how to disable failing tests in future if using matrices, i.e. one entry in a 3x3 matrix for example. But generally this looks good to me.

.github/actions/palace-ci/action.yml

hughcars · 2026-01-07T17:09:01Z

.github/actions/setup-runner/action.yml

+        # Intel
+        echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
+        wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
+        | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null


Can these be placed within the Setup actions straight after? Or is there value in having this broken out? I imagine they're fairly quick so the granularity is probably helpful even if it adds them on runs without these enabled.

I added this separately so that there is only one apt update. apt update pulls from the internet and takes O(10) seconds. I moved it out for etiquette (I don't want to make too many server calls if I can avoid it) and to shave off that tiny amount of time. This is a micro-optimization and I can move this to the steps below if you think that would be better.

hughcars · 2026-01-07T17:10:13Z

.github/actions/setup-runner/action.yml

+    - name: Install Spack
+      uses: actions/checkout@v6
+      with:
+        repository: spack/spack
+        path: spack
+        ref: ${{ inputs.spack-version }}
+
+    - name: Setup Spack environment
+      shell: bash
+      run: |
+        echo "$(realpath spack/bin)" >> "$GITHUB_PATH"
+        echo "SPACK_DISABLE_LOCAL_CONFIG=1" >> "$GITHUB_ENV"


Is this done for the cmake builds too? Again, probably not that big a deal as I bet it's very fast.

Currently, this file is not used for the cmake builds. We might need a couple of small tweaks for that.

hughcars · 2026-01-07T17:13:04Z

.github/workflows/spack.yml

+  test-x64-gcc:
+    needs: [filter, build-x64-gcc]
+    if: success()
+    strategy:
+      fail-fast: false
+      matrix:
+        solver: [MUMPS, SuperLU, STRUMPACK, Default]
+    runs-on: [self-hosted, x64, 4xlarge]
+    steps:
+      - uses: actions/checkout@v6
+      - uses: ./.github/actions/run-regression-tests
+        with:
+          toolchain: gcc
+          solver: ${{ matrix.solver }}


With this matrix approach, if we have a failing test for one entry in the tensor product that we want to disable, how would we do that? I.e. only the STRUMPACK run on 4xlarge.

The action defines three separate build-xxxxxx jobs and three corresponding test-xxxxxx jobs. The test jobs each define a matrix (all of whose entries run on 4xlarge). If you want to disable a specific combination, you can simply comment it out. For example, to disable STRUMPACK for openmp, we would just have

  test-x64-gcc-openmp:
    needs: [filter, build-x64-gcc-openmp]
    if: success()
    strategy:
      fail-fast: false
      matrix:
        include:
          - solver: MUMPS
          - solver: SuperLU
          # - solver: STRUMPACK
          - solver: Default
    runs-on: [self-hosted, x64, 4xlarge]
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/run-regression-tests
        with:
          toolchain: gcc
          variant: "+openmp+int64~shared"
          solver: ${{ matrix.solver }}
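
If a job ever used a true multi-axis matrix, the same idea of dropping one entry of the tensor product could be modeled as below. This is a minimal Python sketch, not how GitHub Actions is implemented; `expand_matrix` and the `variant` axis values are hypothetical. GitHub Actions itself documents an `exclude` key under `strategy.matrix` that behaves along these lines.

```python
from itertools import product


def expand_matrix(axes, exclude=()):
    """Expand axes into job dicts, skipping jobs that match any exclude rule.

    A rule matches a job when every key/value pair in the rule is present
    in the job, so a rule may name only some of the axes.
    """
    keys = list(axes)
    jobs = [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]

    def excluded(job):
        return any(all(job.get(k) == v for k, v in rule.items()) for rule in exclude)

    return [job for job in jobs if not excluded(job)]


# Hypothetical two-axis matrix: 4 solvers x 2 variants = 8 jobs.
axes = {
    "solver": ["MUMPS", "SuperLU", "STRUMPACK", "Default"],
    "variant": ["default", "+openmp"],
}
# Disable only the STRUMPACK run of the +openmp variant.
jobs = expand_matrix(axes, exclude=[{"solver": "STRUMPACK", "variant": "+openmp"}])
print(len(jobs))  # 7
```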

hughcars · 2026-01-07T17:14:33Z

palace/CMakeLists.txt

+# Find Umpire and Camp (needed when CUDA/HIP is enabled)
+if(PALACE_WITH_CUDA OR PALACE_WITH_HIP)
+  find_package(umpire REQUIRED CONFIG)
+  find_package(camp REQUIRED CONFIG)


What does camp do? I've not heard of it before.

To be honest, I don't quite know. It is used by umpire: https://camp.readthedocs.io/en/latest/sphinx/user_guide/using_camp.html#camp-used-in-umpire

hughcars · 2026-01-07T17:17:19Z

spack_repo/local/packages/palace/package.py

            cuda_variant = f"+cuda cuda_arch={arch}"
+
+            # TODO: Remove me after blt > 0.7.1 is released
+            depends_on(f"blt@develop")


Can this be modeled as a conflict for 0.7.1 and below? I.e. I think that will always pull develop if the conflict covers the most recent version, and then this will not require updating again (I think we're doing this for libceed for instance, depending on a version that doesn't exist yet).

Yes, we can just say depends_on("[email protected]:").

hughcars · 2026-01-07T17:19:39Z

spack_repo/local/packages/palace/package.py

        depends_on("libxsmm+debug", when="build_type=Debug")
        depends_on("libceed+libxsmm", when="@0.14:")
        # NOTE: libxsmm builds on MacOS have linker issues
        # https://github.com/libxsmm/libxsmm/issues/883


Once the 0.15.0 spack recipe PR is in, I think you should open another one with these changes in addition. This seems like a really good upgrade of the general capability and robustness of the spack release.

We change other stuff in #550, but yes, I agree we should update the spack recipe.

hughcars · 2026-01-07T17:24:09Z

docs/src/developer/testing.md

+#### Command Line Arguments

-  - `PALACE_TEST`: Path to *Palace* executable and optional arguments (default: "`palace`")
-  - `NUM_PROC_TEST`: Number of MPI processes (default: number of physical cores)
-  - `OMP_NUM_THREADS`: Number of OpenMP threads (default: 1)
-  - `TEST_CASES`: Space-separated list of test cases to run (default: all examples)
+The test runner supports command line arguments for configuration. Each argument can also be set via environment variables as fallbacks.
+
+**Key Options:**
+
+  - `--palace-test`: Path to *Palace* executable and optional arguments (default: "`palace`")
+  - `--num-proc-test`: Number of MPI processes (default: number of physical cores)
+  - `--test-cases`: Space-separated list of test cases to run (default: all examples)
+
+Run `julia --project runtests.jl --help` to see all available options with descriptions and defaults.


A much nicer approach

Sbozzolo added the build, ci, GPU, and spack labels Dec 5, 2025

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from f5c241d to c48c530 December 5, 2025 21:58

Sbozzolo commented Dec 5, 2025

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from c48c530 to 6c7fe66 December 5, 2025 22:01

Sbozzolo commented Dec 5, 2025

Sbozzolo requested a review from hughcars December 8, 2025 22:32

hughcars reviewed Dec 9, 2025

.github/workflows/spack.yml (outdated)

.github/workflows/spack.yml (outdated)

spack_repo/patches/pr2580.patch (outdated)

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from 6c7fe66 to f724e1b December 9, 2025 19:42

hughcars reviewed Dec 9, 2025

.github/actions/setup-palace-ci/action.yml (outdated)

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from f724e1b to 656cce1 December 9, 2025 21:48

Sbozzolo changed the base branch from main to gbozzola/hypre3 December 29, 2025 16:16

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from a0ff316 to 9330759 December 29, 2025 16:23

Sbozzolo added 4 commits December 29, 2025 08:43

Remove externals from spack

7c52875

At the end, we pay the cost of compiling them only once (with the buildcache)

Always install gcc and g++

c213024

Update testing infrastructure

65dcf6f

Sbozzolo force-pushed the gbozzola/even_larger_runners branch from 9330759 to 65dcf6f December 29, 2025 16:44

Sbozzolo added 2 commits December 29, 2025 10:23

Add comments

a550486

Optmize run-regression tests

8b3b6ed

Sbozzolo requested a review from hughcars December 29, 2025 20:09

Sbozzolo mentioned this pull request Dec 30, 2025

Compile MFEM externally #550

Open

Sbozzolo requested a review from cameronrutherford December 30, 2025 23:12

hughcars reviewed Jan 7, 2026

View reviewed changes

Sbozzolo mentioned this pull request Jan 8, 2026

Remove workaround for intel oneAPI in palace-ci #592

Open

Make matrix more explicit

4acbee9

Sbozzolo added 2 commits January 8, 2026 12:38

Change dependency on blt

924748c

Removed merged spack-packages patch

292255c
