fix: gate AVX/AVX-512/AMX on OS XSAVE support via XGETBV #1435
Open
Mattbusel wants to merge 1 commit into ggml-org:master from
CPUID feature bits report hardware capability, but the OS must also have enabled the relevant extended state in XCR0 before SIMD registers can be safely used. Without this check, a process running under a hypervisor or restricted OS that hasn't set XCR0[2:1], [7:5], or [18:17] will receive SIGILL (#UD) the moment it executes a VEX/EVEX instruction.

Add three helpers to `cpuid_x86`:

- `os_saves_ymm()`: checks OSXSAVE + XCR0[2:1] (required for AVX/AVX2/FMA/F16C/AVX-VNNI)
- `os_saves_zmm()`: chains `os_saves_ymm()` + XCR0[7:5] (all AVX-512 variants)
- `os_saves_amx()`: chains `os_saves_zmm()` + XCR0[18:17] (AMX-INT8/BF16/FP16)

`xgetbv()` is provided as a portable static method: `_xgetbv()` on MSVC, inline asm on GCC/Clang.

Resolves the FIXME comment in `ggml_backend_cpu_x86_score()`.
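For readers unfamiliar with the pattern, here is a minimal standalone sketch of a portable XCR0 read and the OSXSAVE precondition that has to guard it. It is not the code from this PR: `join_edx_eax()` and `os_enables_xcr0()` are hypothetical names, they are written as inline free functions rather than static members of `cpuid_x86`, and the sketch assumes MSVC's `__cpuid`/`_xgetbv` intrinsics and GCC/Clang's `<cpuid.h>` (`__get_cpuid`). On non-x86 targets it simply reports no OS-enabled state.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>      // __cpuid
#include <immintrin.h>   // _xgetbv
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>       // __get_cpuid
#endif

// XGETBV returns XCR[ECX] split across EDX (high half) and EAX (low half).
// Widen before shifting: shifting a 32-bit value by 32 is undefined behavior.
constexpr std::uint64_t join_edx_eax(std::uint32_t edx, std::uint32_t eax) {
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// Reads XCR[index] (0 = XCR0). Caller must have checked CPUID.1:ECX.OSXSAVE first,
// because XGETBV itself raises #UD when the OS has not enabled XSAVE.
inline std::uint64_t xgetbv(std::uint32_t index) {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    return _xgetbv(index);
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    std::uint32_t eax = 0, edx = 0;
    __asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
    return join_edx_eax(edx, eax);
#else
    (void)index;
    return 0;  // not x86: no OS-managed x86 SIMD state
#endif
}

// True only if OSXSAVE (CPUID.1:ECX bit 27) is set and XCR0 contains every bit of `mask`.
inline bool os_enables_xcr0(std::uint64_t mask) {
    std::uint32_t ecx = 0;
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 1);
    ecx = static_cast<std::uint32_t>(regs[2]);
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid(1, &a, &b, &c, &d)) {
        ecx = c;
    }
#endif
    if ((ecx & (1u << 27)) == 0) {
        return false;  // OSXSAVE clear: never execute XGETBV
    }
    return (xgetbv(0) & mask) == mask;
}
```

Under these assumptions, `os_enables_xcr0(0x6)` plays the role of the YMM check and `os_enables_xcr0(0xE6)` that of the ZMM check.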
Member
Hi @Mattbusel, thank you for the contribution. Would you mind opening the same PR in the llama.cpp repo so more people can take a look at this change? Thanks!
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Note that OSX also has an AVX-512-related bug. You may be interested in the related code for this in Highway: https://github.com/google/highway/blob/master/hwy/targets.cc#L382
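One way the macOS case might be folded into an AVX-512 gate is sketched below. It assumes, following the comment in the linked Highway code, that macOS enables AVX-512 register state lazily (on first use), so XCR0[7:5] can read as clear on a machine where AVX-512 works, and that the kernel exposes a `hw.optional.avx512f` sysctl key. `avx512_state_usable()`, `macos_kernel_reports_avx512f()` and `kIsMacOS` are hypothetical names, not part of this PR.

```cpp
#include <cstddef>

#if defined(__APPLE__)
#include <sys/sysctl.h>  // sysctlbyname
#endif

// Policy (assumption): XCR0[7:5] is authoritative everywhere except macOS, where
// AVX-512 state may be enabled lazily, so the kernel's own report is also accepted.
// The CPUID AVX-512 feature bits must still be checked separately.
constexpr bool avx512_state_usable(bool xcr0_zmm_bits_set, bool is_macos,
                                   bool kernel_reports_avx512f) {
    return xcr0_zmm_bits_set || (is_macos && kernel_reports_avx512f);
}

// Reads the "hw.optional.avx512f" sysctl on macOS; returns false on other systems
// or on error (for example, if the key does not exist on this machine).
inline bool macos_kernel_reports_avx512f() {
#if defined(__APPLE__)
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname("hw.optional.avx512f", &value, &size, nullptr, 0) != 0) {
        return false;
    }
    return value != 0;
#else
    return false;
#endif
}

// Compile-time platform flag so the policy is called the same way everywhere.
#if defined(__APPLE__)
constexpr bool kIsMacOS = true;
#else
constexpr bool kIsMacOS = false;
#endif
```

Keeping the decision in a pure function means the policy can be checked on any host; only the sysctl query is platform-specific.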
Problem
`ggml_backend_cpu_x86_score()` contains a long-standing `// FIXME: this does not check for OS support` comment. The function gates SIMD dispatch on CPUID feature flags only, but CPUID reports hardware capability, not whether the OS has enabled extended register state.

On systems where the OS has not set the relevant bits in XCR0 (e.g. certain hypervisors, containers with restricted XSAVE, or custom kernels), executing VEX-encoded (AVX/AVX2) or EVEX-encoded (AVX-512) instructions causes an Illegal Instruction fault (#UD / SIGILL) at runtime, even though CPUID says the CPU supports them.
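To make the distinction concrete, here is a small compile-time model of the two checks. It is a sketch, not code from the PR: the constant and function names are hypothetical, while the bit positions (AVX at CPUID.1:ECX bit 28, OSXSAVE at bit 27, SSE/YMM state at XCR0[2:1]) follow Intel's documentation.

```cpp
#include <cstdint>

// CPUID.(EAX=1):ECX feature bits.
constexpr std::uint32_t kCpuid1EcxOsxsave = 1u << 27;  // OS set CR4.OSXSAVE; XGETBV is usable
constexpr std::uint32_t kCpuid1EcxAvx     = 1u << 28;  // processor implements AVX

// What a CPUID-only gate (the FIXME case) concludes: hardware capability only.
constexpr bool cpuid_says_avx(std::uint32_t leaf1_ecx) {
    return (leaf1_ecx & kCpuid1EcxAvx) != 0;
}

// What is actually safe to execute: hardware support, OSXSAVE, and XCR0[2:1] (SSE + YMM).
// `xcr0` is ignored when OSXSAVE is clear, because XGETBV could not have been executed.
constexpr bool avx_safe_to_execute(std::uint32_t leaf1_ecx, std::uint64_t xcr0) {
    return cpuid_says_avx(leaf1_ecx)
        && (leaf1_ecx & kCpuid1EcxOsxsave) != 0
        && (xcr0 & 0x6u) == 0x6u;
}
```

For a hypothetical guest whose CPUID advertises AVX and OSXSAVE but whose kernel enabled only x87/SSE state (XCR0 = 0x3), `cpuid_says_avx()` is true while `avx_safe_to_execute()` is false; that gap is the SIGILL case described above.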
This has been a known issue in the GGML ecosystem (llama.cpp also carries the same comment).
Fix
Add three helpers to `cpuid_x86` that read XCR0 via `XGETBV` after confirming `OSXSAVE` (ECX[27] of CPUID leaf 1):

| Helper | XCR0 bits | State components |
|---|---|---|
| `os_saves_ymm()` | [2:1] | SSE + YMM |
| `os_saves_zmm()` | [7:5] | opmask + ZMM_Hi256 + Hi16_ZMM |
| `os_saves_amx()` | [18:17] | XTILECFG + XTILEDATA |

`xgetbv()` is provided as a portable static method: the `_xgetbv()` intrinsic on MSVC and inline `xgetbv` asm on GCC/Clang, matching the pattern already used for `cpuid`/`cpuidex` in this struct.

Impact
Testing
Verified the patch compiles with both MSVC (`cl.exe /std:c++20`) and GCC (`g++ -std=c++20`). Functional correctness can be confirmed with `GGML_AVX2=1 make -C build && ./ggml-backend-cpu-x86-score` on a system with XCR0 inspection enabled.
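Independently of real hardware, the chaining logic can also be checked at compile time by feeding synthetic XCR0 values to pure predicates. The sketch below mirrors the `os_saves_ymm()` → `os_saves_zmm()` → `os_saves_amx()` chain described in this PR, but the mask constants and `saves_*()` functions are hypothetical names that take XCR0 as a parameter instead of reading it, and the XCR0 values are illustrative, not measurements.

```cpp
#include <cstdint>

// XCR0 state-component masks (bit layout per Intel's XSAVE documentation).
constexpr std::uint64_t kXcr0SseYmm  = 0x6;      // XCR0[2:1]:   SSE (XMM) + AVX (upper YMM halves)
constexpr std::uint64_t kXcr0Avx512  = 0xE0;     // XCR0[7:5]:   opmask + ZMM_Hi256 + Hi16_ZMM
constexpr std::uint64_t kXcr0AmxTile = 0x60000;  // XCR0[18:17]: XTILECFG + XTILEDATA

// Require every bit in `mask` (the conservative form of the check).
constexpr bool xcr0_has_all(std::uint64_t xcr0, std::uint64_t mask) {
    return (xcr0 & mask) == mask;
}

// Each level requires the previous one, like the PR's helper chain.
constexpr bool saves_ymm(std::uint64_t xcr0) {
    return xcr0_has_all(xcr0, kXcr0SseYmm);
}
constexpr bool saves_zmm(std::uint64_t xcr0) {
    return saves_ymm(xcr0) && xcr0_has_all(xcr0, kXcr0Avx512);
}
constexpr bool saves_amx(std::uint64_t xcr0) {
    return saves_zmm(xcr0) && xcr0_has_all(xcr0, kXcr0AmxTile);
}

// Synthetic XCR0 values checked at compile time.
static_assert(saves_ymm(0x7) && !saves_zmm(0x7), "x87+SSE+AVX only: no AVX-512");
static_assert(saves_amx(0x600E7), "all groups enabled");
static_assert(!saves_amx(0x60007), "tile state alone is not enough in this chain");
```

One consequence of chaining worth noting: with this structure, an OS that enabled tile state without AVX-512 state would be reported as not supporting AMX.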