
Conversation

@slaren (Member) commented Sep 4, 2025

ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id

@github-actions bot added labels Nvidia GPU, Vulkan, examples, ggml — Sep 4, 2025
@Peter0x44 (Contributor) commented Sep 8, 2025

I checked, and unfortunately my Intel Arc integrated GPU does not support VK_EXT_pci_bus_info. Perhaps this still helps, but some other options may need to be considered.

There might not be a PCI device ID to blacklist.

@slaren (Member, Author) commented Sep 8, 2025

The purpose of the PCI device id is to prevent the same device from being used with multiple backends. This mainly affects NVIDIA and AMD GPUs when using both the CUDA/HIP and Vulkan backends at the same time.

@0cc4m (Collaborator) commented Sep 8, 2025

It could also affect Intel GPUs with SYCL and Vulkan, but we'll figure it out one step at a time. I'm sorry that I haven't gotten to review this yet, I've had a pretty busy weekend.

@0cc4m (Collaborator) commented Sep 8, 2025

If you have to format the PCI ID anyway, is there any reason not to just use an int[3] and compare that way instead of a string comparison? Vulkan also provides it as three numbers.

@slaren (Member, Author) commented Sep 8, 2025

A string is more flexible, and in the future we can extend this to support non-PCI devices. I suspect this will be necessary to deal with integrated GPUs that are also supported by ROCm, or CUDA in the future.

@NeoZhangJianyu (Collaborator) commented

I think it's OK to detect and save the PCI IDs of iGPUs and dGPUs to prevent the same GPU from being used by different backends.

But it's not a good idea to handle dGPUs and iGPUs through different hardcoded code paths (GGML_BACKEND_DEVICE_TYPE_IGPU).

iGPU performance has increased considerably recently. The newest Intel iGPUs surpass older NV/AMD dGPUs in performance and memory (shared with the host). To end users, iGPU does not always mean low performance.

The OS (Windows/Linux) can detect the memory assigned to an iGPU through the same API as for a dGPU.

The SYCL backend supports iGPU + dGPU by default now. Though it's not recommended, it can be an option for users to load a big model.

I suggest handling iGPUs and dGPUs with the same method; they are different types of GPU only in performance and memory. The application can get the GPU list and decide which GPUs to use based on capability (compute unit count, memory, even model).

AMD/Intel/Nvidia provide environment variables to filter GPUs in multi-GPU and iGPU + dGPU cases.

@0cc4m (Collaborator) commented Sep 9, 2025

> iGPU performance has increased considerably recently. The newest Intel iGPUs surpass older NV/AMD dGPUs in performance and memory (shared with the host). To end users, iGPU does not always mean low performance.

No, but it does most of the time. Intel especially has flooded the market with low-end iGPUs that aren't really useful for anything beyond showing a desktop. Only the latest generations have increased both performance and size enough to make them useful.

> I suggest handling iGPUs and dGPUs with the same method; they are different types of GPU only in performance and memory. The application can get the GPU list and decide which GPUs to use based on capability (compute unit count, memory, even model).

Compute unit count isn't provided to Vulkan by Intel, while extensions exist for AMD and Nvidia. Model architecture isn't provided by any of the vendors. We don't have a good way to decide whether an iGPU is worth using.

> AMD/Intel/Nvidia provide environment variables to filter GPUs in multi-GPU and iGPU + dGPU cases.

Yes, and users would prefer to be able to do this from the application side, for example with the device selection flags in llama-server. Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available; that's what we're trying to improve.

If you have a better idea of how to handle this, I'm open to hearing it. What do you think about marking specific devices as available but not recommended? Or marking specific iGPUs as fast, so they can be enabled by default?

@NeoZhangJianyu (Collaborator) commented

> No, but it does most of the time. Intel especially has flooded the market with low-end iGPUs that aren't really useful for anything beyond showing a desktop. Only the latest generations have increased both performance and size enough to make them useful.
>
> Compute unit count isn't provided to Vulkan by Intel, while extensions exist for AMD and Nvidia. Model architecture isn't provided by any of the vendors. We don't have a good way to decide whether an iGPU is worth using.
>
> Yes, and users would prefer to be able to do this from the application side, for example with the device selection flags in llama-server. Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available; that's what we're trying to improve.
>
> If you have a better idea of how to handle this, I'm open to hearing it. What do you think about marking specific devices as available but not recommended? Or marking specific iGPUs as fast, so they can be enabled by default?

There are two types of Intel iGPU. One is the weak iGPU that has been in Core CPUs for a very long time. The other is the built-in GPU that uses the same hardware as the dGPUs (Arc series), present since Meteor Lake (Core Ultra). Those are powerful and also use shared host memory. By powerful iGPU I mean the built-in GPU since Meteor Lake.

I'm not familiar with the Vulkan API and programming, so I can't propose how Vulkan should handle the iGPU/dGPU case.

But I hope the solution for Vulkan won't become a common solution that impacts other backends. At least the SYCL backend already supports mixing iGPU and dGPU.

> Currently the Vulkan backend doesn't show iGPUs by default if a dGPU is available

I guess Vulkan is in charge of rendering and screen output. On a PC with default BIOS settings, if the dGPU is working, the iGPU will be disabled from outputting to the screen. My guess is that in this case Vulkan won't allow running on the iGPU; if the iGPU is forcibly enabled in the BIOS, maybe Vulkan can detect it.

I hope this is a clue to resolve your issue on Vulkan.

Thank you!

@0cc4m (Collaborator) commented Sep 9, 2025

> There are two types of Intel iGPU. One is the weak iGPU that has been in Core CPUs for a very long time. The other is the built-in GPU that uses the same hardware as the dGPUs (Arc series), present since Meteor Lake (Core Ultra).
>
> I'm not familiar with the Vulkan API and programming, so I can't propose how Vulkan should handle the iGPU/dGPU case.
>
> But I hope the solution for Vulkan won't become a common solution that impacts other backends. At least the SYCL backend already supports mixing iGPU and dGPU.

You misunderstand me: Vulkan also supports this case, but Vulkan has been compatible with all kinds of devices since around 2011, and most of the iGPUs that includes are not worth using.

> I guess Vulkan is in charge of rendering and screen output. On a PC with default BIOS settings, if the dGPU is working, the iGPU will be disabled from outputting to the screen. My guess is that in this case Vulkan won't allow running on the iGPU; if the iGPU is forcibly enabled in the BIOS, maybe Vulkan can detect it.

No, this is device selection logic within the Vulkan backend, not any kind of API or BIOS restriction.

The Vulkan API distinguishes between dedicated and integrated GPUs and this is important for memory allocations: https://registry.khronos.org/vulkan/specs/latest/man/html/VkPhysicalDeviceType.html

I don't see a big issue with moving this selection logic to the GGML API. The goal is to allow applications to make an informed decision about which devices to use, and to provide a good default that works well for llama.cpp. This default could be: use all dedicated GPUs; if no dedicated GPUs are available, use the iGPU; otherwise, use the CPU. It should still be overridable by the application to enable more complex combinations, like heavy calculations on the dedicated GPU and the rest on the iGPU, or a 3-way split between dGPU, iGPU and CPU, but such combinations are very finicky to manage across all the possible device setups.

Do you have a better proposal for how to handle this across all backends?

@slaren (Member, Author) commented Sep 9, 2025

As it is, this is likely to provide the best defaults that we can offer at the moment. In a system with a discrete GPU, using the iGPU is very likely to be suboptimal, if only due to the memory bandwidth constraints of most consumer CPUs. In the future, we may be able to offload preferentially to the discrete GPUs until they run out of memory, and then offload the rest to the integrated GPU, but that's not possible at the moment.

@NeoZhangJianyu (Collaborator) commented

Yes, I agree with this idea.

But I think there is no need to distinguish dGPUs and iGPUs in code. Both should be treated as unified GPU devices that differ only in capability. The device selection order would then be: GPU1 (high performance), GPU2 (low performance), GPU3 (iGPU, lower performance), CPU. This also covers mixed-dGPU cases, like 4090 + 3060 + 1060.

If the mixed-dGPU case is supported now, treating the iGPU as a lower-end dGPU lets the same solution be reused for the dGPU + iGPU case.

That way the code stays simple and the logic is clear.

@slaren (Member, Author) commented Sep 10, 2025

Ok, thank you. I still prefer this solution because it is less ambiguous: I can define clearly what an iGPU is, but objectively defining what a "high performance" or "low performance" GPU is can be difficult. This design solves the current problem as well as it can at the moment, and in the future we can consider adding some kind of ranking system that would allow more fine-grained prioritization of the devices to use.

@0cc4m (Collaborator) left a review comment

Looks good to me, I can either add the Vulkan code for iGPU detection and PCIe ID here or in a follow-up PR, whatever you prefer @slaren

@slaren (Member, Author) commented Sep 11, 2025

In a follow-up PR is fine.

@slaren slaren merged commit 360d653 into master Sep 11, 2025
1 check passed
@slaren slaren deleted the sl/ggml-backend-dev-ids-ext branch September 11, 2025 20:47

4 participants