fix: add clear_gpu_thread_locals #3397

Open: tmontaigu wants to merge 1 commit into main from tm/fix-gpu-exit

@tmontaigu (Contributor)

This function is used to clear GPU thread locals.
This is mainly useful to work around the 'bug' where a rayon pool does not wait for its threads to exit, which creates synchronization problems between the CUDA driver and the thread_local state of the CPU threads.
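The description does not show the function body. The following is a minimal std-only sketch of the general pattern, assuming the GPU state lives in a `thread_local!` slot. `GpuResource`, `GPU_LOCAL`, `use_gpu_local`, and this body of `clear_gpu_thread_locals` are all hypothetical, and the `Drop` impl only bumps a counter where real code would release CUDA resources. Thread-local destructors run only when the owning thread exits, so if the pool is torn down without waiting for its workers, that cleanup happens at an unpredictable time; taking the value out of the slot releases it at a known point instead.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many hypothetical GPU resources have been released.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a per-thread GPU resource (e.g. a stream pool).
struct GpuResource;

impl Drop for GpuResource {
    fn drop(&mut self) {
        // A real resource would hand its driver-side state back here.
        RELEASED.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Per-thread cache; `None` means nothing has been allocated on this thread.
    static GPU_LOCAL: RefCell<Option<GpuResource>> = RefCell::new(None);
}

// Lazily allocates this thread's resource on first use and reuses it afterwards.
fn use_gpu_local() {
    GPU_LOCAL.with(|slot| {
        let mut slot = slot.borrow_mut();
        if slot.is_none() {
            *slot = Some(GpuResource);
        }
    });
}

// Hypothetical body for `clear_gpu_thread_locals`: release the current
// thread's resource now, instead of leaving it to the thread-local
// destructor, which only runs when the thread actually exits.
fn clear_gpu_thread_locals() {
    // `try_with` returns an error instead of panicking if the slot is
    // already being destroyed.
    let _ = GPU_LOCAL.try_with(|slot| {
        // Move the value out of the slot and drop it immediately.
        drop(slot.borrow_mut().take());
    });
}
```

Clearing is idempotent in this sketch: a second call finds `None` and does nothing, so it is safe to call from an exit hook even if the thread never touched the GPU.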
@cla-bot added the cla-signed label on Mar 16, 2026
@tmontaigu marked this pull request as ready for review on March 16, 2026, 14:31
@tmontaigu requested a review from nsarlin-zama as a code owner on March 16, 2026, 14:31
```diff
 let pool = ThreadPoolBuilder::new()
     .num_threads(8 * num_gpus)
-    .exit_handler(|_| unset_server_key())
+    .exit_handler(|_| clear_gpu_thread_locals())
```
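The handler passed to rayon's `ThreadPoolBuilder::exit_handler` is invoked on each worker thread as it exits, which is why it can reach that worker's thread-locals; a cleanup call made from the thread that owns the pool would only touch its own thread-locals. The std-only sketch below illustrates that shape with a hypothetical `spawn_workers` helper (not part of rayon or tfhe-rs): every worker runs its job and then calls `on_exit(index)` on itself before returning. Unlike a real pool, this sketch does not run the hook if the job panics.

```rust
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawns `num_threads` workers and runs `on_exit(index)`
/// on each worker thread after its job finishes, mirroring the role of
/// rayon's `exit_handler`.
fn spawn_workers<F, H>(num_threads: usize, job: F, on_exit: H) -> Vec<JoinHandle<()>>
where
    F: Fn(usize) + Send + Sync + 'static,
    H: Fn(usize) + Send + Sync + 'static,
{
    // Share both closures across all workers.
    let job = Arc::new(job);
    let on_exit = Arc::new(on_exit);
    (0..num_threads)
        .map(|index| {
            let job = Arc::clone(&job);
            let on_exit = Arc::clone(&on_exit);
            thread::spawn(move || {
                (*job)(index);
                // Runs on the worker itself, so it can clear this thread's thread-locals.
                (*on_exit)(index);
            })
        })
        .collect()
}
```

Hiding a builder like this behind a helper that always installs the cleanup hook is one way to make it the default, rather than something every call site has to remember.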
Member

Is there a way to make this the default, or will we need to remember to add it every time we use it?

Contributor, Author

If we want to keep only unset_server_key in the exit handler, we should remove the extra CudaStreamPool used by decrypt, which I think is OK to remove.

Otherwise, we should keep clear_gpu_thread_locals; it is only needed when using a non-local rayon pool.

3 participants