@christiangnrd
Member

1024 is the hard maximum number of threads per threadgroup, but the actual maximum depends on the maxTotalThreadsPerThreadgroup property of the kernel's MTLComputePipelineState. This means that always attempting to create a block/threadgroup with 1024 threads will cause errors in some situations.

This PR sets block_size to 256, the same safe value used by all the other algorithms.

As an example, the last failure in JuliaGPU/Metal.jl#590 is because of this.

@anicusan
Member

Oof, it really can't be simple, can it... Thanks for this. It means we'll need to do some smarter querying later on
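
For illustration only, here is a minimal sketch of what that kind of querying could reduce to once the per-kernel limit is known. It is written in Python as plain arithmetic and is not code from this PR or from Metal.jl. The function name `pick_block_size` and its parameters are hypothetical: `pipeline_max` stands in for the value of maxTotalThreadsPerThreadgroup reported for the compiled kernel, and `simd_width` stands in for the pipeline's threadExecutionWidth (the default of 32 is an assumption).

```python
def pick_block_size(requested: int, pipeline_max: int, simd_width: int = 32) -> int:
    """Clamp a requested threadgroup size to what a compiled kernel allows.

    Hypothetical helper: pipeline_max represents the kernel's
    maxTotalThreadsPerThreadgroup, simd_width its threadExecutionWidth
    (32 is an assumed default, not a queried value).
    """
    if requested < 1 or pipeline_max < 1 or simd_width < 1:
        raise ValueError("sizes must be positive")

    # Never exceed the per-kernel limit, even if the hardware maximum is 1024.
    size = min(requested, pipeline_max)

    # Round down to a whole number of SIMD groups when there is room for one.
    if size >= simd_width:
        size -= size % simd_width
    return size


if __name__ == "__main__":
    # A kernel whose pipeline allows fewer than 1024 threads per threadgroup.
    print(pick_block_size(1024, 896))
    # The fixed value chosen in this PR fits under either limit.
    print(pick_block_size(256, 896))
```

With a clamp like this, a request for 1024 threads would fall back to the kernel's own limit instead of failing at launch, while the fixed 256 used here passes through unchanged.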

@anicusan anicusan merged commit 111c89b into JuliaGPU:main May 20, 2025
37 of 38 checks passed
@christiangnrd christiangnrd deleted the metal branch May 20, 2025 18:51