Skip to content

Improve hardware feature detection in CMake files and do further refactoring & enhancement #824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

mhucka
Copy link
Collaborator

@mhucka mhucka commented Jun 29, 2025

Changes:

  • Previously, the CMake configuration was such that it would always build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether the current system supported those options. The changes here use CMake features to detect whether the architectural features are in fact available on the host computer, and only attempt to build the appropriate modules. The code for testing this is in a new CheckCPU.cmake file.

  • Previously, each of 8 or so pybind_interface/ subdirectories had CMakeLists.txt files that contained the same text for setting certain compilation flags. This PR removes the duplication in favor of putting the settings into the top-level CMakeLists.txt file.

  • On MacOS, this adds more include and link directories (specifically those commonly used with Homebrew) for finding the LLVM and the OMP include files and libraries.

  • Finally, this PR includes a bit of refactoring:

    • The introduction of a new directory, dev_tools/cmake/, for placing .cmake files included by other files. It moves pybind_interface/GetPybind11.cmake into this new location alongside the new CheckCPU.cmake file.
    • The reorganization of much of the top-level CMakeLists.txt file, to (hopefully) organize some of the code more logically.
    • The addition of more message() statements to help understand what's happening.

mhucka added 7 commits June 29, 2025 22:52
Previously, the CMake configuration was such that it would always
build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether
the current system supported those options. The changes here use CMake
features to detect whether the architectural features are in fact
available, and only attempt to build the appropriate modules.

In addition, there is a minor bit of refactoring to group some of the
code more logically, and to add some more informational message
printouts.
Each of 8 or so `pybind_interface/` subdirectories had
`CMakeLists.txt` files that contained the same text for setting
certain compilation flags. This commit removes the duplication in
favor of putting the settings into the top-level CMake file.
On Ubuntu, one sees warnings like this:

```
lto-wrapper: warning: using serial compilation of 13 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more
information
lto-wrapper: warning: using serial compilation of 15 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more
information
lto-wrapper: warning: using serial compilation of 16 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more
information
lto-wrapper: warning: using serial compilation of 16 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more
information
```

This seems to be the default behavior if the option `-flto` is not
given a value (c.f. https://stackoverflow.com/a/72222512/28972686).
Giving it a value of "auto" makes the warning go away and lets the
compilation toolchain decide how much parallism it can use.
@mhucka mhucka changed the title Improve hardware feature detection & conslidate duplicated settings in CMake files Improve hardware feature detection & consolidate duplicated settings in CMake files Jun 30, 2025
mhucka added 2 commits June 30, 2025 03:47
This can enable additional optimizations without requiring specific
avx/sse/etc. instructions.
@mhucka mhucka force-pushed the mh-consolidate-cmake-configs branch from 1b753ba to 3941a3f Compare June 30, 2025 03:51
@github-actions github-actions bot added the size: L 250< lines changed <1000 label Jul 1, 2025
The common wisdom is wrong, apparently: `-march=native` does not work
for AVX on MacOS.
@mhucka mhucka force-pushed the mh-consolidate-cmake-configs branch from 1756d01 to 20c9dd8 Compare July 1, 2025 03:37
@mhucka mhucka self-assigned this Jul 1, 2025
CMakeLists.txt Outdated
check_cxx_compiler_flag("/arch:AVX512" HAVE_AVX512)
else()
check_cxx_compiler_flag("-mavx2" HAVE_AVX2)
check_cxx_compiler_flag("-mavx512f" HAVE_AVX512)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a good way to detect whether the architectural features are in fact available on the host computer? check_cxx_compiler_flag checks whether the C++ compiler supports a given flag. It seems the g++ compiler supports the -mavx512f flag even if the host CPU does not support AVX512 instructions.

mhucka added 5 commits July 1, 2025 21:24
Getting correct info at run-time about SSE support on Windows has been
a bit difficult. To help get the best performance, if we know SSE is
supposed to be supported, we can try to use multiple known
architecture flags starting with those introduced most recently.
The previous code wrongly assumed that testing for compiler support
would imply the underlying hardware supported the features. This is
wrong (and rather obviously wrong -- I don't know what I was thinking
before).

The new code tries in various ways to get info about the architectural
features of the host's CPU.
@mhucka mhucka changed the title Improve hardware feature detection & consolidate duplicated settings in CMake files Improve hardware feature detection in CMake files & do refactoring Jul 6, 2025
@mhucka mhucka changed the title Improve hardware feature detection in CMake files & do refactoring Improve hardware feature detection in CMake files and do further refactoring & enhancement Jul 6, 2025
We have at least two custom CMake files now, and may have more in the
future. The files only need to be included from the top-level
`CMakeLists.txt` file; they don't actually need to be included from
the `CMakeLists.txt` files in the `pybind_interface/*` subdirectories
because those subdirectories are added to the top-level
`CMakeLists.txt` file. For better organization and maintaince, I think
it makes sense to move `GetPybind11.cmake` and other *.cmake files
into a dedicated subdirectory, and a location inside `dev_tools/`
seems to make the most sense.
mhucka added 12 commits July 7, 2025 03:46
This updates the CMake code and rewrites the logic in several ways:

* Test if pybind11 has already been found, and skip the work if so.

* Use `find_package(pybind11)` to look for the Pybind11 package first,
  with a hint about where it might be located in the Python
  installation directories.

* If the previous approach didn't find pybind11, only then try to get
  it from the GitHub repository.

* Add the Pybind11 header files directory to the Python include files
  directories list, so that the CMake configuration files for the qsim
  modules can find them without having to add them explicitly in the
  module's CMakeLists.txt files.
Depending on the host platform, this uses different methods to try to
determine whether the CPU supports the SIMD and vector instruction
sets used by qsim (different flavors of AVX and SSE).
This file turned out to do so little that it's not worth maintaining a
separate file for it.
Due to the fact that the top-level `CMakeLists.txt` file includes
these files using `add_directory`, it's not necessary for them to
do things like include `GetPybind.cmake` or call `project()`. Those
directives are already done by the top-level file. We can simplify
these sub-`CMakeLists.txt` files.
Due to the fact that the top-level `CMakeLists.txt` file includes
these files using `add_directory`, it's not necessary for them to do
things like include `GetPybind.cmake` or call `project()`. Those
directives are already done by the top-level file. We can simplify
these sub-`CMakeLists.txt` files.

In addition, in this file in particular, it's possible to pull out
`target_link_libraries()` as a common element because it's done for
all the cases tested.

Finally, instead of including the `GetCUDAARCHS.cmake` I had created
before, let's just include the 3 lines directly in here. It makes it
more obvious what's happening and reduces maintenance burden.
Due to the fact that the top-level `CMakeLists.txt` file includes
these files using `add_directory`, it's not necessary for them to do
things like include `GetPybind.cmake` or call `project()`. Those
directives are already done by the top-level file. We can simplify
these sub-`CMakeLists.txt` files.

In addition, this commit rewrites the logic for determining which flag
to pass to the compiler on Windows. The Windows case is complicated
due to changes in the flags in MSVC over the years. The new logic
tries to be robust in the face of different possibilities.
Due to the fact that the top-level `CMakeLists.txt` file includes
these files using `add_directory`, it's not necessary for them to do
things like include `GetPybind.cmake` or call `project()`. Those
directives are already done by the top-level file. We can simplify
these sub-`CMakeLists.txt` files.

In addition, the logic for setting `CMAKE_CUDA_ARCHITECTURES` can be
made very simple since we don't do complicated things to try to figure
out which CUDA/GPU variants the host has. If the `CUDAARCHS`
environment variable is not set and `CMAKE_CUDA_ARCHITECTURES` is not
set either, we just default to using `native` and hope for the best.
This is another round of rewriting the top-level file.

* Moves the `check_cpu_supports()` macro to a separate file.

* Improves testing for CPU features for vectorizing.

* Uses more modern ways of doing things like looking for `hipcc`.

* Renames variables to make them a little bit more obvious.
It seems to figure out automatically how many threads to use for the builds.
@mhucka mhucka marked this pull request as draft July 7, 2025 17:37
mhucka added 4 commits July 7, 2025 22:44
This makes the CMake build files test for OpenMP and use it if it's
available, but not fail if it's not.
CMake has features to handle LTO for different compilers and
platforms. Better to use this than to add our own lto options.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: L 250< lines changed <1000
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants