
Conversation

@slaren (Member) commented Sep 4, 2025

ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id

@github-actions bot added labels Nvidia GPU, Vulkan, examples, ggml — Sep 4, 2025
@Peter0x44 (Contributor) commented Sep 8, 2025

I checked, and unfortunately my Intel Arc integrated GPU does not support VK_EXT_pci_bus_info. Perhaps this still helps, but some other options may need to be considered.

There might not be a PCI device ID to blacklist.

@slaren (Member, Author) commented Sep 8, 2025

The purpose of the PCI device id is to prevent the same device from being used with multiple backends. This mainly affects NVIDIA and AMD GPUs when using both the CUDA/HIP and Vulkan backends at the same time.

@0cc4m (Collaborator) commented Sep 8, 2025

It could also affect Intel GPUs with SYCL and Vulkan, but we'll figure it out one step at a time. I'm sorry that I haven't gotten to review this yet, I've had a pretty busy weekend.

@0cc4m (Collaborator) commented Sep 8, 2025

If you have to format the PCI ID anyway, is there any reason not to just use an int[3] and compare that way instead of a string comparison? Vulkan also provides it as three numbers.

@slaren (Member, Author) commented Sep 8, 2025

A string is more flexible, and in the future we can extend this to support non-PCI devices. I suspect this will be necessary to deal with integrated GPUs that are also supported by ROCm, or CUDA in the future.

@NeoZhangJianyu (Collaborator) commented

I think it's OK to detect and save the PCI IDs of iGPUs and dGPUs to prevent the same GPU from being used by different backends.

But it's not a good idea to handle dGPUs and iGPUs through different hardcoded code paths (GGML_BACKEND_DEVICE_TYPE_IGPU).

iGPU performance has increased considerably recently. The newest Intel iGPUs surpass older NV/AMD dGPUs in performance and memory (shared with the host). To end users, iGPU does not always mean low performance.

The OS (Windows/Linux) can detect the memory assigned to an iGPU through the same API as for a dGPU.

The SYCL backend supports iGPU + dGPU by default now. Though it's not recommended, it can be an option for users to load a big model.

I suggest handling iGPUs and dGPUs with the same method; they are different types of GPU only in performance and memory. The application can get the GPU list and decide which GPUs to use based on capability (compute unit count, memory, even model).

AMD/Intel/Nvidia provide environment variables to filter GPUs in multi-GPU and iGPU + dGPU cases.

@0cc4m (Collaborator) commented Sep 9, 2025

> iGPU performance has increased considerably recently. The newest Intel iGPUs surpass older NV/AMD dGPUs in performance and memory (shared with the host). To end users, iGPU does not always mean low performance.

No, but it does most of the time. Intel especially has flooded the market with low-end iGPUs that aren't really useful for anything beyond showing a desktop. Only the latest generations have increased both performance and size enough to make them useful.

> I suggest handling iGPUs and dGPUs with the same method; they are different types of GPU only in performance and memory. The application can get the GPU list and decide which GPUs to use based on capability (compute unit count, memory, even model).

Compute unit count isn't provided to Vulkan by Intel, while extensions exist for AMD and Nvidia. Model architecture isn't provided by any of the vendors. We don't have a good way to decide whether an iGPU is worth using.

> AMD/Intel/Nvidia provide environment variables to filter GPUs in multi-GPU and iGPU + dGPU cases.

Yes, and users would prefer to be able to do this from the application side, for example with the device selection flags in llama-server. Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available; that's what we're trying to improve.

If you have a better idea of how to handle this, I'm open to hearing it. What do you think about marking specific devices as available but not recommended? Or marking specific iGPUs as fast, so they can be enabled by default?

@NeoZhangJianyu (Collaborator) commented

> No, but it does most of the time. Intel especially has flooded the market with low-end iGPUs that aren't really useful for anything beyond showing a desktop. Only the latest generations have increased both performance and size enough to make them useful.
>
> Compute unit count isn't provided to Vulkan by Intel, while extensions exist for AMD and Nvidia. Model architecture isn't provided by any of the vendors. We don't have a good way to decide whether an iGPU is worth using.
>
> Yes, and users would prefer to be able to do this from the application side, for example with the device selection flags in llama-server. Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available; that's what we're trying to improve.
>
> If you have a better idea of how to handle this, I'm open to hearing it. What do you think about marking specific devices as available but not recommended? Or marking specific iGPUs as fast, so they can be enabled by default?

There are two types of Intel iGPU. One is the weak iGPU that has been in Core CPUs for a very long time. The other is the built-in GPU that uses the same hardware as the dGPUs (Arc series), present since Meteor Lake (Core Ultra). Those are powerful and also use shared host memory. By powerful iGPU I mean the built-in GPU since Meteor Lake.

I'm not familiar with the Vulkan API and programming, so I can't propose how Vulkan should handle the iGPU/dGPU case.

But I hope the solution for Vulkan won't become a common solution that impacts other backends. At least the SYCL backend already supports mixing iGPU and dGPU.

> Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available

I guess Vulkan is in charge of rendering and screen output. On a PC with default BIOS settings, if the dGPU is working, the iGPU will be disabled from outputting to the screen. My guess is that in this case Vulkan won't allow running on the iGPU; if the iGPU is forcibly enabled in the BIOS, maybe Vulkan can detect it.

I hope this is a clue to resolve your issue on Vulkan.

Thank you!

@0cc4m (Collaborator) commented Sep 9, 2025

> There are two types of Intel iGPU. One is the weak iGPU that has been in Core CPUs for a very long time. The other is the built-in GPU that uses the same hardware as the dGPUs (Arc series), present since Meteor Lake (Core Ultra).
>
> I'm not familiar with the Vulkan API and programming, so I can't propose how Vulkan should handle the iGPU/dGPU case.
>
> But I hope the solution for Vulkan won't become a common solution that impacts other backends. At least the SYCL backend already supports mixing iGPU and dGPU.

You misunderstand me: Vulkan also supports this case, but Vulkan has been compatible with all kinds of devices since around 2011, and most of the iGPUs that includes are not worth using.

> I guess Vulkan is in charge of rendering and screen output. On a PC with default BIOS settings, if the dGPU is working, the iGPU will be disabled from outputting to the screen. My guess is that in this case Vulkan won't allow running on the iGPU; if the iGPU is forcibly enabled in the BIOS, maybe Vulkan can detect it.

No, this is device selection logic within the Vulkan backend, not any kind of API or BIOS restriction.

The Vulkan API distinguishes between dedicated and integrated GPUs and this is important for memory allocations: https://registry.khronos.org/vulkan/specs/latest/man/html/VkPhysicalDeviceType.html

I don't see a big issue with moving this selection logic to the GGML API. The goal is to allow applications to make an informed decision about which devices to use, and to provide a good default that works well for llama.cpp. This default could be: use all dedicated GPUs; if no dedicated GPUs are available, use the iGPU; otherwise, use the CPU. It should still be overridable by the application to enable more complex combinations, like heavy calculations on the dedicated GPU and the rest on the iGPU, or a 3-way split between dGPU, iGPU and CPU, but such combinations are very finicky to manage across all the possible device setups.

Do you have a better proposal for how to handle this across all backends?

@slaren (Member, Author) commented Sep 9, 2025

As it is, this is likely to provide the best defaults that we can offer at the moment. In a system with a discrete GPU, using the iGPU is very likely to be suboptimal, if only due to the memory bandwidth constraints of most consumer CPUs. In the future, we may be able to offload preferentially to the discrete GPUs until they run out of memory, and then offload the rest to the integrated GPU, but that's not possible at the moment.

@NeoZhangJianyu (Collaborator) commented

Yes, I agree with this idea.

But I think there is no need to distinguish dGPUs and iGPUs in code. Both should be treated as unified GPU devices that differ only in capability. The device selection order would then be: GPU1 (high performance), GPU2 (low performance), GPU3 (iGPU, lower performance), CPU. This also covers mixed-dGPU cases, like 4090 + 3060 + 1060.

If the mixed-dGPU case is supported now, treating the iGPU as a lower-end dGPU lets the same solution be reused for the dGPU + iGPU case.

That way the code stays simple and the logic is clear.

@slaren (Member, Author) commented Sep 10, 2025

Ok, thank you. I still prefer this solution because it is less ambiguous: I can define clearly what an iGPU is, but objectively defining what a "high performance" or "low performance" GPU is can be difficult. This design solves the current problem as well as it can at the moment, and in the future we can consider adding some kind of ranking system that would allow more fine-grained prioritization of the devices to use.

@0cc4m (Collaborator) left a review comment

Looks good to me, I can either add the Vulkan code for iGPU detection and PCIe ID here or in a follow-up PR, whatever you prefer @slaren

@slaren (Member, Author) commented Sep 11, 2025

In a follow-up PR is fine.

@slaren slaren merged commit 360d653 into master Sep 11, 2025
1 check passed
@slaren slaren deleted the sl/ggml-backend-dev-ids-ext branch September 11, 2025 20:47

4 participants