Futhark's performance for block-wide scans could be improved in certain cases if there were a sequentialization factor, so that a single thread works on a given number of elements. Use cases that come to mind are blocked radix sort and blocked partition. I do not believe it should be the user's responsibility to write code that gives block-level scans a sequentialization factor; the compiler should provide it, just as the device-wide scan already has a sequentialization factor. This problem may also apply to other block-level kernels.
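To make the request concrete, here is a minimal sketch of what I mean by a sequentialization factor, written as a sequential Python model rather than as Futhark or generated kernel code. The function name `chunked_block_scan` and the parameter `seq_factor` are hypothetical names chosen for illustration; nothing here reflects how the Futhark compiler actually implements scans.

```python
from itertools import accumulate
from operator import add


def chunked_block_scan(xs, op=add, seq_factor=4):
    """Inclusive scan of xs, modelling a block in which each thread
    owns seq_factor consecutive elements (hypothetical parameter)."""
    # Phase 1: each "thread" scans its own chunk sequentially,
    # with no communication between threads.
    chunks = [xs[i:i + seq_factor] for i in range(0, len(xs), seq_factor)]
    local = [list(accumulate(chunk, op)) for chunk in chunks]

    # Phase 2: scan the per-thread totals. In a real kernel this is the
    # only step that needs cross-thread communication, and it operates
    # on len(xs) / seq_factor values instead of len(xs).
    totals = [scanned[-1] for scanned in local]
    scanned_totals = list(accumulate(totals, op))

    # Phase 3: each thread combines the total of all earlier threads
    # into its local results. The prefix goes on the left so that
    # non-commutative (but associative) operators still work.
    out = list(local[0]) if local else []
    for t in range(1, len(local)):
        prefix = scanned_totals[t - 1]
        out.extend(op(prefix, v) for v in local[t])
    return out
```

With `seq_factor=1` this degenerates to one element per thread, which is the case I would like to avoid having to work around by hand.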