Describe the bug
When running nvidia_gpu_exporter inside a VM with a vGPU (L40-16Q), the kernel journal gets repeated warnings like:
NVRM: serverControl_ValidateCookie: Unsupported ROUTE_TO_PHYSICAL control 0x20801347 was called on vGPU guest
The warnings stop when the exporter container is stopped, so I suspected that the exporter's nvidia-smi queries were triggering an unsupported physical-GPU control path from inside the vGPU guest. I traced this to the exporter's default (AUTO) nvidia-smi --query-gpu=... field list. The warning is triggered specifically by querying the remapped_rows.histogram.* fields inside a vGPU guest:
remapped_rows.histogram.max
remapped_rows.histogram.high
remapped_rows.histogram.partial
remapped_rows.histogram.low
remapped_rows.histogram.none
The exporter still functions, but it causes kernel log noise as long as the exporter is scraping.
To Reproduce
Steps to reproduce the behavior:
- Use a VM with a vGPU attached (L40-16Q in my case).
- Install NVIDIA drivers (580.126.09) in the guest.
- Run nvidia_gpu_exporter via Docker Compose.
- Observe kernel logs with
journalctl -b -k -f
- After the exporter starts scraping, this warning appears and is repeated every second:
NVRM: serverControl_ValidateCookie: Unsupported ROUTE_TO_PHYSICAL control 0x20801347 was called on vGPU guest
- Stop the exporter container.
- The warnings stop.
Minimal reproducer example:
nvidia-smi --query-gpu=remapped_rows.histogram.max --format=csv,noheader,nounits
Expected behavior
I expected the exporter to avoid querying fields that trigger unsupported physical-GPU control paths in a vGPU guest.
At minimum, it would be helpful if:
- the exporter filtered these fields out automatically in vGPU guests, or
- the exporter had the ability to exclude fields manually via --query-field-names-exclude or something similar, or
- the README/documentation mentioned that remapped_rows.histogram.* is unsafe in vGPU guests and should be excluded via --query-field-names.
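As a rough illustration of the filtering requested above, here is a minimal shell sketch that strips the remapped_rows.histogram.* entries from a comma-separated field list. The helper name exclude_histogram_fields is hypothetical, and it assumes --query-field-names accepts a comma-separated list of nvidia-smi field names; it is not how the exporter works today.

```shell
# Hypothetical helper (not part of the exporter): print the comma-separated
# field list given as $1 with every remapped_rows.histogram.* entry removed.
exclude_histogram_fields() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | grep -v '^remapped_rows\.histogram\.' \
    | paste -sd, -
}

# Example (assumption: the result is passed to the exporter's
# --query-field-names flag as a comma-separated list):
#   exclude_histogram_fields "name,remapped_rows.pending,remapped_rows.histogram.max"
```

The same prefix match could be applied inside the exporter when it builds the AUTO field list, either unconditionally in vGPU guests or behind an opt-in exclude flag.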
Console output
Add the error logs and/or the output to help us diagnose the problem.
Model and Version
- GPU Model [e.g. GeForce RTX 2080 Super]: L40-16Q
- App version and architecture [e.g. v0.1.0 - linux_x86_64]: utkuozdemir/nvidia_gpu_exporter:1.4.1
- Installation method [e.g. homebrew, binary download]: docker compose
- Operating System [e.g. Ubuntu Desktop 20.04, Windows 10]: Ubuntu 24.04.4
- Nvidia GPU driver version [e.g. Linux driver nvidia-driver-440, Windows Game Ready Driver 466.63]: 580.126.09
Additional context
Testing each field on its own, all of these reproduce the warning:
remapped_rows.histogram.max
remapped_rows.histogram.high
remapped_rows.histogram.partial
remapped_rows.histogram.low
remapped_rows.histogram.none
and these do not:
remapped_rows.correctable
remapped_rows.uncorrectable
remapped_rows.pending
remapped_rows.failure
So it looks like the histogram breakdown queries are hitting a physical-device-only path that is not supported in a vGPU guest.
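The per-field check behind this list can be approximated with the sketch below. The function names count_route_warnings and probe_field are made up for illustration. It assumes journalctl can read the kernel journal (usually root or a suitable group) and that the warning reaches the journal within a second of the query. Because --since has one-second granularity, run it with the exporter stopped so that unrelated warnings are not counted.

```shell
# Count lines on stdin that contain the NVRM vGPU warning; prints 0 if none.
count_route_warnings() {
  grep -c 'Unsupported ROUTE_TO_PHYSICAL control' || true
}

# Query a single nvidia-smi field, then print the field name followed by the
# number of matching kernel warnings logged since just before the query.
probe_field() {
  field=$1
  since=$(date '+%Y-%m-%d %H:%M:%S')
  nvidia-smi --query-gpu="$field" --format=csv,noheader,nounits >/dev/null 2>&1
  sleep 1  # give the kernel message time to reach the journal
  hits=$(journalctl -k --since "$since" --no-pager 2>/dev/null | count_route_warnings)
  printf '%s %s\n' "$field" "$hits"
}

# Example:
#   for f in remapped_rows.histogram.max remapped_rows.pending; do
#     probe_field "$f"
#   done
```

On the setup described in this issue, a non-zero count would be expected for the histogram fields and zero for the other remapped_rows fields.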