Torus *tmp_zero_lwes;
Torus *tmp_ksed_zero_lwes;
Torus *tmp_expanded_zero_lwes;
Please initialize to nullptr.
Torus *tmp_zero_lwes;
Torus *tmp_ksed_zero_lwes;
Torus *tmp_expanded_zero_lwes;
Torus *tmp_ksed_expanded_zero_lwes;
Please initialize to nullptr (this one is only conditionally set in the constructor).
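To illustrate the request, here is a minimal sketch of default member initializers on a buffer struct. Only the four pointer names come from the diff. The struct name `rerand_buffer_sketch`, the `needs_ksed_expanded` flag, and the use of host `new[]`/`delete[]` in place of the real GPU allocation calls are assumptions made for illustration, not the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the struct under review. Host memory is used
// here purely so the sketch is self-contained; the real code allocates on GPU.
template <typename Torus> struct rerand_buffer_sketch {
  // Default member initializers: every pointer has a defined value even on
  // constructor paths that never assign it.
  Torus *tmp_zero_lwes = nullptr;
  Torus *tmp_ksed_zero_lwes = nullptr;
  Torus *tmp_expanded_zero_lwes = nullptr;
  Torus *tmp_ksed_expanded_zero_lwes = nullptr;

  // `needs_ksed_expanded` is an assumed flag modelling the conditional path
  // mentioned in the review comment.
  rerand_buffer_sketch(std::size_t num_elems, bool needs_ksed_expanded) {
    assert(num_elems > 0);
    tmp_zero_lwes = new Torus[num_elems]();
    tmp_ksed_zero_lwes = new Torus[num_elems]();
    tmp_expanded_zero_lwes = new Torus[num_elems]();
    if (needs_ksed_expanded) {
      tmp_ksed_expanded_zero_lwes = new Torus[num_elems]();
    }
  }

  // Non-copyable, so the owning pointers are never freed twice.
  rerand_buffer_sketch(const rerand_buffer_sketch &) = delete;
  rerand_buffer_sketch &operator=(const rerand_buffer_sketch &) = delete;

  ~rerand_buffer_sketch() {
    // delete[] on nullptr is a no-op, so the conditionally set member is
    // safe to release without checking which constructor path ran.
    delete[] tmp_zero_lwes;
    delete[] tmp_ksed_zero_lwes;
    delete[] tmp_expanded_zero_lwes;
    delete[] tmp_ksed_expanded_zero_lwes;
  }
};
```

Without the initializers, a buffer built with `needs_ksed_expanded == false` would leave `tmp_ksed_expanded_zero_lwes` indeterminate, and any later null check or release on it would be undefined behavior.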
lwe_trivial_indexes, ksk, input_dimension, output_dimension, ks_base_log,
ks_level, num_lwes, true, mem_ptr->ks_tmp_buf_vec);
// Keyswitch
execute_keyswitch_async<Torus>(streams.get_ith(0), lwes_to_be_added,
Only on GPU 0? Is there a multi-GPU approach to rerand?
No, we never executed it multi-gpu. Honestly I don't remember why.
I think what we concluded was that it's not worth it. The bottleneck was the encryption of zeroes, not the KS. The KS is quite fast on a single GPU. Multi-GPU would probably only slow it down.
@guillermo-oyarzun do you think we could benefit from multi-gpuing this?
The KS alone wouldn't benefit much, because the overhead of using many GPUs would be almost equal to the benefit of the parallelization. This is always 64-bit, right?
Yes, I don't recall seeing any instantiation as 32 bits.
closes: https://github.com/zama-ai/tfhe-rs-internal/issues/1310
PR content/description
Check-list: