Skip to content

fix: gate AVX/AVX-512/AMX on OS XSAVE support via XGETBV#1435

Open
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Mattbusel:fix/xgetbv-os-support
Open

fix: gate AVX/AVX-512/AMX on OS XSAVE support via XGETBV#1435
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Mattbusel:fix/xgetbv-os-support

Conversation

@Mattbusel
Copy link
Copy Markdown

Problem

ggml_backend_cpu_x86_score() contains a long-standing // FIXME: this does not check for OS support comment. The function gates SIMD dispatch on CPUID feature flags only — but CPUID reports hardware capability, not whether the OS has enabled extended register state.

On systems where the OS has not set the relevant bits in XCR0 (e.g. certain hypervisors, containers with restricted XSAVE, or custom kernels), executing VEX-encoded (AVX/AVX2) or EVEX-encoded (AVX-512) instructions causes an Illegal Instruction fault (#UD / SIGILL) at runtime even though CPUID says the CPU supports them.

This has been a known issue in the GGML ecosystem (llama.cpp also carries the same comment).

Fix

Add three helpers to cpuid_x86 that read XCR0 via XGETBV after confirming OSXSAVE (ECX[27] of CPUID leaf 1):

Helper XCR0 bits checked Gates
os_saves_ymm() [2:1] — SSE + YMM AVX, AVX2, FMA, F16C, AVX-VNNI
os_saves_zmm() [7:5] — opmask + ZMM_Hi256 + Hi16_ZMM All AVX-512 variants
os_saves_amx() [18:17] — XTILECFG + XTILEDATA AMX-INT8

xgetbv() is provided as a portable static method: _xgetbv() intrinsic on MSVC, inline xgetbv asm on GCC/Clang — matching the same pattern already used for cpuid/cpuidex in this struct.

Impact

  • No behavioural change on correctly configured systems (the OS bits are set when the CPU supports the feature).
  • Prevents SIGILL on restricted environments (container runtimes, some virtualisation platforms) that do not set XCR0 to match CPUID.
  • Removes the FIXME comment that has been sitting here since the file was first written.

Testing

Verified the patch compiles with both MSVC (cl.exe /std:c++20) and GCC (g++ -std=c++20). Functional correctness can be confirmed with GGML_AVX2=1 make -C build && ./ggml-backend-cpu-x86-score on a system with XCR0 inspection enabled.

CPUID feature bits report hardware capability, but the OS must also
have enabled the relevant extended state in XCR0 before SIMD registers
can be safely used.  Without this check a process running under a
hypervisor or restricted OS that hasn't set XCR0[2:1]/[7:5]/[18:17]
will receive SIGILL (#UD) the moment it executes a VEX/EVEX instruction.

Add three helpers to cpuid_x86:
  - os_saves_ymm() — checks OSXSAVE + XCR0[2:1] (required for AVX/AVX2/FMA/F16C/AVX-VNNI)
  - os_saves_zmm() — chains os_saves_ymm() + XCR0[7:5] (all AVX-512 variants)
  - os_saves_amx() — chains os_saves_zmm() + XCR0[18:17] (AMX-INT8/BF16/FP16)

xgetbv() is provided as a portable static: _xgetbv() on MSVC,
inline asm on GCC/Clang.

Resolves the FIXME comment in ggml_backend_cpu_x86_score().
@ggerganov
Copy link
Copy Markdown
Member

Hi @Mattbusel - thank you for the contribution.

Would you mind opening the same PR in the llama.cpp repo so more people could take a look at this change. Thanks!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jan-wassenberg
Copy link
Copy Markdown

Note that OSX also has an AVX-512-related bug. You may be interested in seeing the related code for this in Highway: https://github.com/google/highway/blob/master/hwy/targets.cc#L382

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants