-
Notifications
You must be signed in to change notification settings - Fork 181
Improve hardware feature detection in CMake files and do further refactoring & enhancement #824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Previously, the CMake configuration was such that it would always build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether the current system supported those options. The changes here use CMake features to detect whether the architectural features are in fact available, and only attempt to build the appropriate modules. In addition, there is a minor bit of refactoring to group some of the code more logically, and to add some more informational message printouts.
Each of 8 or so `pybind_interface/` subdirectories had `CMakeLists.txt` files that contained the same text for setting certain compilation flags. This commit removes the duplication in favor of putting the settings into the top-level CMake file.
On Ubuntu, one sees warnings like this: ``` lto-wrapper: warning: using serial compilation of 13 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 15 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 16 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information lto-wrapper: warning: using serial compilation of 16 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information ``` This seems to be the default behavior if the option `-flto` is not given a value (c.f. https://stackoverflow.com/a/72222512/28972686). Giving it a value of "auto" makes the warning go away and lets the compilation toolchain decide how much parallism it can use.
This can enable additional optimizations without requiring specific avx/sse/etc. instructions.
1b753ba
to
3941a3f
Compare
The common wisdom is wrong, apparently: `-march=native` does not work for AVX on MacOS.
1756d01
to
20c9dd8
Compare
CMakeLists.txt
Outdated
check_cxx_compiler_flag("/arch:AVX512" HAVE_AVX512) | ||
else() | ||
check_cxx_compiler_flag("-mavx2" HAVE_AVX2) | ||
check_cxx_compiler_flag("-mavx512f" HAVE_AVX512) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really a good way to detect whether the architectural features are in fact available on the host computer? check_cxx_compiler_flag
checks whether the C++ compiler supports a given flag. It seems the g++ compiler supports the -mavx512f
flag even if the host CPU does not support AVX512 instructions.
Getting correct info at run-time about SSE support on Windows has been a bit difficult. To help get the best performance, if we know SSE is supposed to be supported, we can try to use multiple known architecture flags starting with those introduced most recently.
The previous code wrongly assumed that testing for compiler support would imply the underlying hardware supported the features. This is wrong (and rather obviously wrong -- I don't know what I was thinking before). The new code tries in various ways to get info about the architectural features of the host's CPU.
We have at least two custom CMake files now, and may have more in the future. The files only need to be included from the top-level `CMakeLists.txt` file; they don't actually need to be included from the `CMakeLists.txt` files in the `pybind_interface/*` subdirectories because those subdirectories are added to the top-level `CMakeLists.txt` file. For better organization and maintaince, I think it makes sense to move `GetPybind11.cmake` and other *.cmake files into a dedicated subdirectory, and a location inside `dev_tools/` seems to make the most sense.
This updates the CMake code and rewrites the logic in several ways: * Test if pybind11 has already been found, and skip the work if so. * Use `find_package(pybind11)` to look for the Pybind11 package first, with a hint about where it might be located in the Python installation directories. * If the previous approach didn't find pybind11, only then try to get it from the GitHub repository. * Add the Pybind11 header files directory to the Python include files directories list, so that the CMake configuration files for the qsim modules can find them without having to add them explicitly in the module's CMakeLists.txt files.
Depending on the host platform, this uses different methods to try to determine whether the CPU supports the SIMD and vector instruction sets used by qsim (different flavors of AVX and SSE).
This file turned out to do so little that it's not worth maintaining a separate file for it.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, in this file in particular, it's possible to pull out `target_link_libraries()` as a common element because it's done for all the cases tested. Finally, instead of including the `GetCUDAARCHS.cmake` I had created before, let's just include the 3 lines directly in here. It makes it more obvious what's happening and reduces maintenance burden.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, this commit rewrites the logic for determining which flag to pass to the compiler on Windows. The Windows case is complicated due to changes in the flags in MSVC over the years. The new logic tries to be robust in the face of different possibilities.
Due to the fact that the top-level `CMakeLists.txt` file includes these files using `add_directory`, it's not necessary for them to do things like include `GetPybind.cmake` or call `project()`. Those directives are already done by the top-level file. We can simplify these sub-`CMakeLists.txt` files. In addition, the logic for setting `CMAKE_CUDA_ARCHITECTURES` can be made very simple since we don't do complicated things to try to figure out which CUDA/GPU variants the host has. If the `CUDAARCHS` environment variable is not set and `CMAKE_CUDA_ARCHITECTURES` is not set either, we just default to using `native` and hope for the best.
This is another round of rewriting the top-level file. * Moves the `check_cpu_supports()` macro to a separate file. * Improves testing for CPU features for vectorizing. * Uses more modern ways of doing things like looking for `hipcc`. * Renames variables to make them a little bit more obvious.
It seems to figure out automatically how many threads to use for the builds.
This makes the CMake build files test for OpenMP and use it if it's available, but not fail if it's not.
CMake has features to handle LTO for different compilers and platforms. Better to use this than to add our own lto options.
Changes:
Previously, the CMake configuration was such that it would always build the AVX2, AVX512 & SSE2 Pybind11 modules without testing whether the current system supported those options. The changes here use CMake features to detect whether the architectural features are in fact available on the host computer, and only attempt to build the appropriate modules. The code for testing this is in a new
CheckCPU.cmake
file.Previously, each of 8 or so
pybind_interface/
subdirectories hadCMakeLists.txt
files that contained the same text for setting certain compilation flags. This PR removes the duplication in favor of putting the settings into the top-levelCMakeLists.txt
file.On MacOS, this adds more include and link directories (specifically those commonly used with Homebrew) for finding the LLVM and the OMP include files and libraries.
Finally, this PR includes a bit of refactoring:
dev_tools/cmake/
, for placing.cmake
files included by other files. It movespybind_interface/GetPybind11.cmake
into this new location alongside the newCheckCPU.cmake
file.CMakeLists.txt
file, to (hopefully) organize some of the code more logically.message()
statements to help understand what's happening.