Windows build guide, optimization flags and verbose exposure script#146
Windows build guide, optimization flags and verbose exposure script#146Skyrion9 wants to merge 7 commits intoju1ce:masterfrom
Conversation
…n Windows systems with MSVC.
Terminal compile flags will still override them.
…issues and solutions.
|
Windows and Linux LTO is set here: |
| set(vcpkg_bootstrap_cmd "${VCPKG_ROOT}/bootstrap-vcpkg.sh") | ||
| endif() | ||
|
|
||
| if (NOT EXISTS "${vcpkg_bootstrap_cmd}") |
There was a problem hiding this comment.
This will only run if vcpkg was not previously installed, as the bootstrap script won't exist.
There was a problem hiding this comment.
Bootstrap script is no longer shipped with VS installations and requires running a terminal command to run in "classic" mode.
Without this check, the program compiles successfully.
|
|
||
| if (NOT EXISTS "${vcpkg_bootstrap_cmd}") | ||
| find_program(GIT_CMD git REQUIRED) | ||
| execute_process(COMMAND "${GIT_CMD}" clone --filter=tree:0 "https://github.com/microsoft/vcpkg.git" "${VCPKG_ROOT}") |
There was a problem hiding this comment.
I'm assuming the git error is due to --filter=tree:0? It requires a newish version of git.
There was a problem hiding this comment.
I've the latest 2.42 maintained version of git, updating it was the first thing I've tried as a fix. What version is necessary exactly?
| # MSVC flags for runtime performance over size, and suggest more aggressive inline functioning to compiler. /favor:AMD64 or INTEL64 may benefit. | ||
| # Eigen3 note: /arch:AVX causes crash with Refine tracker calibration, due to Eigen3. 16-byte memory alignment might fix, possibly worse performance than default SSE2. | ||
| # You can set /arch to AVX, AVX2 or AVX512 if you don't use calibration refinement. This should increase vectorized loop performance noticably. | ||
| set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /O2 /Ob3 /GA /GL /Qpar /Qpar-report:1 /Qvec-report:1" PARENT_SCOPE) |
There was a problem hiding this comment.
/O2 /Ob3 /GA are already set in this variable, /GL will be set later with the INTERPROCEDURAL_OPTIMIZATION property.
/Qpar is new and interesting, we might need to check msvc version to enable that?
There was a problem hiding this comment.
I believe the default is /Ob2 which is included with /O2. /Ob3 overrides this and suggests more aggressive inlining to the compiler.
I'm not sure if /GL was already enabled, as before it was set here the compiler didn't recommend enabling LTCG.
|
|
||
| if (MSVC) | ||
| set(comp_name "MSVC") | ||
| # Note that any flag specified in build command will overwrite these. Treat them as safe defaults. |
There was a problem hiding this comment.
All of these optimization options should be in their own function, and use target_compile_options() and target_link_options().
| # You can set /arch to AVX, AVX2 or AVX512 if you don't use calibration refinement. This should increase vectorized loop performance noticably. | ||
| set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /O2 /Ob3 /GA /GL /Qpar /Qpar-report:1 /Qvec-report:1" PARENT_SCOPE) | ||
| # Additional linker optimizations as compiler output recommends /LTCG with /GL | ||
| set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /OPT:REF /OPT:ICF=6 /OPT:LBR /LTCG" PARENT_SCOPE) |
There was a problem hiding this comment.
Again these should be removed, they are the defaults.
Except for /OPT:ICF=6 which defaults to 1, (this shouldn't be set for debug builds as it can negatively effect debugging), 6 seems high though? I wonder how it effects compile times.
There was a problem hiding this comment.
It didn't change the compile time much, and I doubt higher than 3 does anything but the time difference was minor so I went with 6.
As for LTCG, I'm not sure if it was enabled by default. After I set /GL, the compiler suggested enabling it and the suggestion went away after setting it here.
| elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang") | ||
| set(comp_name "CLANG") | ||
| # Clang compiler flags for release. Runtime performance over size. O3 is higher level, but placebo in most cases and increases code size unnecessarily. Should be profiled first. | ||
| # TODO : Investigate linker flags for CLang and GCC compilers, and test ftree-vectorize along with fvect-cost-model. flto is experimental but could help. |
There was a problem hiding this comment.
-O2 is also the default set in release builds, and lto is enabled.
-Wrong stash, oops
-Added project option to enable additional MSVC flags. Enabled by default, affects only release configuration.
|
I couldn't get this to compile with my usual workflow, and I don't have the time to go any deeper right now, so I'll leave this open for now. I'll link this pr, your repo and your building guide on the readme though. Thank you for the contribution and sorry I cannot merge stuff right now. |
I had many issues compiling for Windows. I used existing vcpkg from windows ENV to fix/work around the first error. But you need VS 2022 17.8 Preview 2 or later to avoid msys error when building master. I wrote a guide to compile on Windows.
Using non-destructive optimization flags with a bias towards runtime speed over size. In my limited profiling there's no performance degradation, and kernel spends slightly more CPU time (~1%) on apriltag functions within same time frame. Within margin of error, this means either the CPU is being utilized better, or less efficiently. Minor filesize and memory footprint increase due to more aggressive function inlining and double-alignment in memory among other things. Note that MSVC usually doesn't just apply the flags but checks if they would be beneficial first.
You can easily override them by using your own flags from terminal, they are safe fast defaults so to speak. There's more to do on the Linux side of things there but O2 is a good start. There could be potential gains from using (AVX/2/512), Maybe an AVX build could be released along with the normal SSE from here on out since VRChat requires AVX so most of the users already have it.
I've also updated the exposure script to be more verbose. It was annoying not knowing if it changed anything without checking my phone, The script tells you if IP is not found, and doesn't wait on Curl's long timeout.