@AssimilatedCoder

Added support for NVIDIA GB (Grace Blackwell) GPUs with unified memory architecture, using DCGM. Tested on an NVIDIA DGX Spark (GB10).

ogcadbane and others added 10 commits December 1, 2025 16:37
- Introduce a new DCGM-backed NVIDIA GPU collector on Linux that populates
  the existing Gpu::gpu_info structures using dcgmUpdateAllFields and
  dcgmGetLatestValuesForFields.
- Prefer DCGM over NVML when built with -DBTOP_DCGM=ON and libdcgm is
  available, while keeping NVML as a transparent fallback on systems
  without DCGM.
- Track a unified nvidia_device_count so AMD (ROCm SMI) and Intel GPU
  backends stack correctly after whichever NVIDIA backend is active.
- Expose a new CMake option BTOP_DCGM and link libdcgm when enabled,
  keeping GPU runtime behaviour controlled via existing shown_gpus and
  show_gpu_info config options.
- Update README GPU compatibility and CMake documentation to describe
  the DCGM backend, including usage on DGX Spark / data center GB-series
  systems and how to enable it.
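
The backend preference and device stacking described in the bullets above can be sketched roughly as follows. This is a minimal illustration only: `Backend`, `pick_nvidia_backend`, and `stacked_index` are hypothetical names invented for this sketch, not btop's actual code, and no real DCGM or NVML calls are made. Only the idea of a unified `nvidia_device_count` comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a GPU backend; not a btop type.
struct Backend {
    std::string name;
    bool available;       // assumed: library was found and initialised
    std::size_t devices;  // number of devices the backend reported
};

// Prefer DCGM, fall back to NVML, otherwise report no NVIDIA backend.
inline const Backend* pick_nvidia_backend(const Backend& dcgm, const Backend& nvml) {
    if (dcgm.available) return &dcgm;
    if (nvml.available) return &nvml;
    return nullptr;
}

// One device count, regardless of which NVIDIA backend ended up active.
inline std::size_t nvidia_device_count(const Backend* active) {
    return active ? active->devices : 0;
}

// Global slot of a non-NVIDIA (e.g. AMD or Intel) device: it is placed
// directly after all NVIDIA devices.
inline std::size_t stacked_index(std::size_t nvidia_count, std::size_t local,
                                 std::size_t vendor_devices) {
    assert(local < vendor_devices);
    return nvidia_count + local;
}
```

With a single count shared by both NVIDIA backends, the other vendors' collectors would not need to know whether DCGM or NVML supplied the NVIDIA devices.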

Tests:
- Built with -DBTOP_GPU=ON -DBTOP_DCGM=ON on Linux and verified that
  btop runs with DCGM present (DGX-style node) and falls back to NVML
  or no NVIDIA GPUs when DCGM is unavailable.
- GPU name retrieval (lines 1290-1305): Added dcgmGetDeviceAttributes() to get proper GPU names, with the same cleanup logic as NVML.
- supported_functions initialization (lines 1482-1496): During collect<1>, supported_functions is now set based on which fields returned valid data, with unsupported features (pwr_state, pcie_txrx, encoder/decoder) explicitly set to false.
- Empty deque fallback (lines 1499-1509): All deques are now guaranteed to hold at least one value (0) to prevent .back() crashes.
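
As a rough illustration of the last two points, the sketch below sets support flags from whichever fields returned data and then pads any empty history with a single 0 so that `.back()` is always valid. All type and member names here (`SupportedFunctions`, `GpuSample`, `GpuHistory`, `first_collect`) are simplified assumptions made for this example and do not mirror btop's real structures.

```cpp
#include <cassert>
#include <deque>
#include <initializer_list>
#include <optional>

// Hypothetical, simplified flags; the real structure has more members.
struct SupportedFunctions {
    bool gpu_utilization = false;
    bool temperature = false;
    bool pwr_state = false;
    bool pcie_txrx = false;
};

// One reading per field; nullopt means the backend returned no valid data.
struct GpuSample {
    std::optional<long long> gpu_utilization;
    std::optional<long long> temperature;
};

struct GpuHistory {
    std::deque<long long> gpu_percent;
    std::deque<long long> temp;
    SupportedFunctions supported;
};

// First collection pass: derive support flags, store values, pad empties.
inline void first_collect(GpuHistory& gpu, const GpuSample& sample) {
    gpu.supported.gpu_utilization = sample.gpu_utilization.has_value();
    gpu.supported.temperature = sample.temperature.has_value();
    // Features this backend does not provide are switched off explicitly.
    gpu.supported.pwr_state = false;
    gpu.supported.pcie_txrx = false;

    if (sample.gpu_utilization) gpu.gpu_percent.push_back(*sample.gpu_utilization);
    if (sample.temperature) gpu.temp.push_back(*sample.temperature);

    // Calling back() on an empty deque is undefined behaviour, so every
    // history gets at least one placeholder value.
    for (auto* history : {&gpu.gpu_percent, &gpu.temp}) {
        if (history->empty()) history->push_back(0);
        assert(!history->empty());
    }
}
```

The padding trades a possibly misleading 0 reading for crash safety, which is why pairing it with the support flags matters: a consumer can check the flag before trusting the value.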
Collaborator

@deckstose left a comment

This lacks Makefile support.

In my opinion the changes can be split out of btop_collect into their own file.

What's up with the changes to the utility functions? Please undo them.

@aristocratos
Owner

@deckstose
Likely AI coded.

Thinking about adding a rule that PRs that are obviously vibe coded should be dismissed unless the author has some proof that they actually understand the code in the PR (for example, other C++ repositories of theirs that aren't vibe coded).

@deckstose
Collaborator

@aristocratos

I totally agree

@aristocratos
Owner

@deckstose
Have updated CONTRIBUTING.md:

  • Submissions where the majority of the code is AI generated must be marked with [AI generated].

  • "Vibe coded" PR's where it seems like the author doesn't understand the generated code will be dismissed.

@aristocratos added the "ai generated" label (Majority of included code is AI generated) on Dec 4, 2025