Description
`@threads :greedy` is implemented by creating an unbuffered channel, with one producer task putting values into the channel and `threadpoolsize()` workers taking from it.
The original PR had the channel be buffered, but this was changed in code review.
I don't really understand the reasoning for making it unbuffered.
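For readers unfamiliar with the pattern, the producer/worker structure described above can be sketched using public APIs only. This is not the code in Base: `greedy_foreach` and its `bufsize` keyword are hypothetical names chosen for illustration, and `bufsize = 0` corresponds to the unbuffered channel.

```julia
# Hypothetical helper, not part of Base: run `f` on every item using one
# producer task and `Threads.threadpoolsize()` worker tasks sharing a channel.
function greedy_foreach(f, items; bufsize::Int = 0)
    # The producer task puts every item into the channel; the channel is
    # closed automatically once the producer finishes.
    ch = Channel{eltype(items)}(bufsize; spawn = true) do c
        for item in items
            put!(c, item)
        end
    end
    # Each worker takes items until the channel is closed and drained.
    workers = map(1:Threads.threadpoolsize()) do _
        Threads.@spawn for item in ch
            f(item)
        end
    end
    foreach(wait, workers)
    return nothing
end

# Usage: sum 1:1000 across the workers with an atomic accumulator.
total = Threads.Atomic{Int}(0)
greedy_foreach(i -> Threads.atomic_add!(total, i), 1:1000)
```

With `bufsize = 0`, every `put!` has to wait for a matching `take!`, so the producer and a worker must rendezvous once per element.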
The lack of buffering adds more than 5 microseconds per element, which I think is quite a lot. I believe this is because taking from an unbuffered channel is slower than taking from a buffered one, as the former involves more inter-task communication. A small benchmark illustrates the difference. Below, I've manually expanded `@threads :greedy`, here with a buffered channel.
```julia
using Base.Threads

function crt(x, y, z)
    M = BigInt(2)^3000 + 3
    local fun
    # Producer: a spawned task fills a buffered channel with the indices.
    let c = Channel{Int}(Threads.threadpoolsize(); spawn=true) do ch
            for i in 1:length(x)
                put!(ch, i)
            end
        end
        # Worker: each thread takes indices until the channel is drained.
        function fun(tid)
            for it in c
                local i = it
                # In-place BigInt arithmetic: x[i] = ((y[i] * z[i]) ÷ z[i]) mod M
                Base.GMP.MPZ.mul!(x[i], y[i], z[i])
                Base.GMP.MPZ.fdiv_q!(x[i], z[i])
                Base.GMP.MPZ.fdiv_r!(x[i], M)
            end
        end
    end
    Base.Threads.threading_run(fun, false)
end

M = 256 * 256
x = [BigInt() for _ in 1:M]
y = [rand(1:BigInt(2)^3000) for _ in 1:M]
z = [rand(1:BigInt(2)^3000) for _ in 1:M]
@time crt(x, y, z)
```
Setting the channel's buffer size to 0 instead gives the unbuffered variant. I get the following mean timings and allocations:
- Buffered: 125.079 ms, allocs estimate: 127
- Unbuffered: 496.851 ms, allocs estimate: 65161
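To separate the channel overhead from the `BigInt` work, a smaller benchmark along the following lines may help. It is only a sketch: `channel_overhead` is a hypothetical helper, the workers do nothing but count the elements they receive, and the timings will depend on the machine and the number of threads.

```julia
# Hypothetical helper: push `n` integers through a channel with the given
# buffer size and let the workers only count what they receive.
function channel_overhead(n::Int, bufsize::Int)
    seen = Threads.Atomic{Int}(0)
    ch = Channel{Int}(bufsize; spawn = true) do c
        for i in 1:n
            put!(c, i)
        end
    end
    workers = map(1:Threads.threadpoolsize()) do _
        Threads.@spawn for _ in ch
            Threads.atomic_add!(seen, 1)
        end
    end
    foreach(wait, workers)
    return seen[]
end

n = 256 * 256
channel_overhead(n, 0)  # warm-up run so compilation is not timed
@time channel_overhead(n, 0)                         # unbuffered
@time channel_overhead(n, Threads.threadpoolsize())  # buffered
```

Since the per-element work here is negligible, any difference between the two timings should come mostly from the channel itself.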
cc. @Seelengrab who originally implemented the scheduler.