ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type #15797
Conversation
This PR makes the following changes:
- ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
- ggml-backend : add device id to device props
- llama : only use iGPU devices if there are no GPU devices
- llama : do not use multiple devices from different backends with the same device id
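For context, this is roughly what the API change looks like. The existing enum values come from ggml-backend.h; the placement of the new value and the exact name and position of the `device_id` field are assumptions based on the discussion below:

```cpp
// ggml-backend.h (sketch) -- the existing device types plus the new iGPU type.
enum ggml_backend_dev_type {
    // CPU device using system memory
    GGML_BACKEND_DEVICE_TYPE_CPU,
    // GPU device using dedicated memory
    GGML_BACKEND_DEVICE_TYPE_GPU,
    // integrated GPU device using host memory (added by this PR)
    GGML_BACKEND_DEVICE_TYPE_IGPU,
    // accelerator devices intended to be used together with the CPU backend
    GGML_BACKEND_DEVICE_TYPE_ACCEL,
};

struct ggml_backend_dev_props {
    const char * name;
    const char * description;
    // hypothetical field: backend-agnostic device id, e.g. a PCI bus id
    // string such as "0000:03:00.0"; NULL/empty if unknown
    const char * device_id;
    size_t memory_free;
    size_t memory_total;
    enum ggml_backend_dev_type type;
};
```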
I checked, and unfortunately my Intel Arc integrated GPU does not support VK_EXT_pci_bus_info. Perhaps this still helps, but some other options may need to be considered: there might not be a PCI device id to blacklist.
The purpose of the PCI device id is to prevent the same device from being used with multiple backends. This mainly affects NVIDIA and AMD GPUs when using both the CUDA/HIP and Vulkan backends at the same time.
It could also affect Intel GPUs with SYCL and Vulkan, but we'll figure it out one step at a time. I'm sorry that I haven't gotten to review this yet, I've had a pretty busy weekend.
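To make the intent concrete, here is a minimal sketch of that deduplication, assuming the hypothetical `device_id` field from the sketch above; `ggml_backend_dev_count`, `ggml_backend_dev_get` and `ggml_backend_dev_get_props` are the existing ggml device enumeration API:

```cpp
#include <set>
#include <string>
#include <vector>
#include "ggml-backend.h"

// Keep at most one device per device id. Backends are assumed to be
// registered in order of preference (e.g. CUDA/HIP before Vulkan), so the
// first device seen for a given id wins.
static std::vector<ggml_backend_dev_t> dedup_devices() {
    std::set<std::string> seen_ids;
    std::vector<ggml_backend_dev_t> result;
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        ggml_backend_dev_props props;
        ggml_backend_dev_get_props(dev, &props);
        // devices without an id (e.g. no VK_EXT_pci_bus_info) are always kept
        if (props.device_id && !seen_ids.insert(props.device_id).second) {
            continue; // same physical device already added via another backend
        }
        result.push_back(dev);
    }
    return result;
}
```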
If you have to format the PCI ID anyway, any reason not to just use
A string is more flexible, and in the future we can extend this to support non-PCI devices. I suspect this will be necessary to deal with integrated GPUs that are also supported by ROCm, or CUDA, in the future.
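As an illustration of how a Vulkan device could produce such a string, here is a hedged sketch using VK_EXT_pci_bus_info; the "domain:bus:device.function" format mirrors Linux sysfs PCI addresses and is an assumption, not something fixed by this PR. The caller is assumed to have verified that the extension is supported, since (as noted above) some iGPUs do not expose it:

```cpp
#include <cstdio>
#include <string>
#include <vulkan/vulkan.h>

// Query VK_EXT_pci_bus_info and format the PCI address in the conventional
// "domain:bus:device.function" form, e.g. "0000:03:00.0".
// Precondition: the device supports VK_EXT_pci_bus_info.
static std::string get_pci_device_id(VkPhysicalDevice dev) {
    VkPhysicalDevicePCIBusInfoPropertiesEXT pci = {};
    pci.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PCI_BUS_INFO_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &pci;
    vkGetPhysicalDeviceProperties2(dev, &props2);

    char buf[32];
    snprintf(buf, sizeof(buf), "%04x:%02x:%02x.%x",
             pci.pciDomain, pci.pciBus, pci.pciDevice, pci.pciFunction);
    return buf;
}
```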
I think it's OK to detect and save the PCI ID of iGPUs and dGPUs to prevent the same GPU from being used by different backends. But I don't think it's a good idea to hardcode different code paths for dGPUs and iGPUs (GGML_BACKEND_DEVICE_TYPE_IGPU). iGPU performance has increased considerably in recent years, operating systems (Windows/Linux) can report the memory assigned to an iGPU through the same API as for a dGPU, and the SYCL backend supports iGPU + dGPU by default now. I suggest handling iGPUs and dGPUs with the same method: they are different types of GPU only in performance and memory. AMD/Intel/Nvidia all provide environment variables to filter GPUs in multi-GPU and iGPU + dGPU setups.
No, but it does most of the time. Intel especially has flooded the market with low-end iGPUs that aren't really useful for anything beyond showing a desktop. Only the latest generations have grown enough in both performance and size to make them useful.
The compute unit count isn't provided to Vulkan by Intel, while extensions exist for AMD and Nvidia (see the sketch below). The model architecture isn't provided by any of the vendors. We don't have a good way to decide whether an iGPU is worth using.
Yes, and users would prefer to be able to do this from the application side, for example with the device selection flags in llama-server. Currently Vulkan doesn't show iGPUs by default if a dGPU is available; that's what we're trying to improve. If you have a better idea on how to handle this, I'm open to hearing it. What do you think about marking specific devices as available, but not recommended for use? Or marking specific iGPUs as fast, so they can be enabled by default?
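For reference, the vendor-specific queries mentioned above (VK_AMD_shader_core_properties and VK_NV_shader_sm_builtins) look roughly like this sketch; in real code, each struct should only be chained after confirming the corresponding extension is supported, and there is no Intel equivalent:

```cpp
#include <vulkan/vulkan.h>

// Sketch: read the compute unit / SM count where the vendor exposes it.
// Assumes the caller has already checked which extension is supported and
// passes the matching flag, so only a valid struct is chained.
static uint32_t get_compute_unit_count(VkPhysicalDevice dev, bool is_amd, bool is_nvidia) {
    VkPhysicalDeviceShaderCorePropertiesAMD amd = {};
    amd.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_CORE_PROPERTIES_AMD;

    VkPhysicalDeviceShaderSMBuiltinsPropertiesNV nv = {};
    nv.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_SM_BUILTINS_PROPERTIES_NV;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = is_amd ? (void *)&amd : is_nvidia ? (void *)&nv : nullptr;
    vkGetPhysicalDeviceProperties2(dev, &props2);

    if (is_nvidia) {
        return nv.shaderSMCount;
    }
    if (is_amd) {
        // total CUs = engines * arrays per engine * CUs per array
        return amd.shaderEngineCount * amd.shaderArraysPerEngineCount *
               amd.computeUnitsPerShaderArray;
    }
    return 0; // Intel: no Vulkan extension exposes a compute unit count
}
```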
There are two types of Intel iGPU. I'm not familiar with the Vulkan API and programming, but I hope the solution for Vulkan won't become a common solution that impacts other backends; at the least, the SYCL backend already supports mixing iGPUs and dGPUs. As for "Currently Vulkan doesn't show iGPUs by default if a dGPU is available": I guess that in this case Vulkan won't allow running with the iGPU. I hope that's a clue to resolving your issue on Vulkan. Thank you!
You misunderstand me: Vulkan also supports this case, but Vulkan is compatible with all kinds of devices going back to around 2011, and most of the iGPUs that includes are not worth using.
No, this is device selection logic within the Vulkan backend, not any kind of API or BIOS restriction. The Vulkan API distinguishes between dedicated and integrated GPUs, and this is important for memory allocations: https://registry.khronos.org/vulkan/specs/latest/man/html/VkPhysicalDeviceType.html

I don't see a big issue in moving this selection logic to the GGML API. The goal is to allow applications to make an informed decision about which devices to use, and to provide a good default that works well for llama.cpp. This default could be: use all dedicated GPUs; if no dedicated GPUs are available, use the iGPU; otherwise use the CPU.

It should still be overridable by the application to enable more complex combinations, like heavy calculations on the dedicated GPU and the rest on the iGPU, or a 3-way split between dGPU, iGPU and CPU, but such combinations are very finicky to manage across all possible device configurations. Do you have a better proposal for how to handle this across all backends?
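In ggml terms, that default could look like this sketch, using the existing device enumeration API and the new device type (the helper name is made up for illustration):

```cpp
#include <vector>
#include "ggml-backend.h"

// Default device selection: all discrete GPUs; if there are none, the
// integrated GPUs; the CPU backend serves as the fallback either way.
static std::vector<ggml_backend_dev_t> default_devices() {
    std::vector<ggml_backend_dev_t> gpus, igpus;
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        switch (ggml_backend_dev_type(dev)) {
            case GGML_BACKEND_DEVICE_TYPE_GPU:  gpus.push_back(dev);  break;
            case GGML_BACKEND_DEVICE_TYPE_IGPU: igpus.push_back(dev); break;
            default: break; // CPU/ACCEL devices are handled separately
        }
    }
    return gpus.empty() ? igpus : gpus;
}
```

An application wanting one of the more complex combinations would skip this default and build its own device list instead.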
As it is, this is likely to provide the best defaults that we can offer at the moment. In a system with a discrete GPU, using the iGPU is very likely to be suboptimal, if only due to the memory bandwidth constraints of most consumer CPUs. In the future, we may be able to offload preferentially to the discrete GPUs until they run out of memory, and then offload the rest to the integrated GPU, but that's not possible at the moment.
Yes, I agree with this idea, but I think there is no need to distinguish dGPUs and iGPUs in code. If the mixed-dGPU case is supported now, treating the iGPU as a slower dGPU would let that solution be reused to support the dGPU + iGPU case, keeping the code simple and the logic clear.
OK, thank you. I still prefer this solution because it is less ambiguous: I can define clearly what an iGPU is, but defining objectively what a "high performance" or "low performance" GPU is can be difficult. This design solves the current problem as well as it can be solved at the moment, and in the future we can consider adding some kind of ranking system that would allow more fine-grained prioritization of the devices to use.
Looks good to me. I can add the Vulkan code for iGPU detection and PCIe ID either here or in a follow-up PR, whichever you prefer @slaren.
In a follow-up PR is fine.