Releases · JuliaGPU/KernelAbstractions.jl · GitHub

17 Mar 19:05

v0.1.5

KernelAbstractions v0.1.5

Diff since v0.1.4

Merged pull requests:

Run Event(f) on main thread (#71) (@vchuravy)

13 Mar 22:05

v0.1.4

KernelAbstractions v0.1.4

Diff since v0.1.3

Merged pull requests:

Add CUDA rewrites for sincos(x) and exp(y) for complex y (#67) (@oschub)
Add Event(f, args) to integrate code using @async better (#68) (@vchuravy)
async_copy! fixes (#69) (@lcw)

12 Mar 21:05

v0.1.3

KernelAbstractions v0.1.3

Diff since v0.1.2

Merged pull requests:

Allow MultiEvent to be created from an Event (#65) (@lcw)

11 Mar 20:05

v0.1.2

KernelAbstractions v0.1.2

Diff since v0.1.1

Merged pull requests:

Unified printing (#61) (@leios)
add multievents (#62) (@vchuravy)
don't recurse into functions like Base.sin (#63) (@vchuravy)
make @print work outside KA (#64) (@vchuravy)

10 Mar 00:09

v0.1.1

KernelAbstractions v0.1.1

Diff since v0.1.0

Merged pull requests:

Fix CUDA waiting on CUDA events (#59) (@lcw)

09 Mar 19:05

v0.1.0

KernelAbstractions v0.1.0

Closed issues:

Variable lifetime is counterintuitive on the CPU (#13)
Using Val as kernel argument triggers an assertion (#21)
Performance of naive transpose (#22)
Initialization error (#23)
unroll not defined inside a kernel (#24)
Document that private memory works differently than scratch in GPUifyLoops (#31)
How best to sync with the default stream in the CUDA backend? (#46)

Merged pull requests:

Bring up GPU functionality fully (#1) (@vchuravy)
Cleanup docs and remove ScalarCPU (#2) (@vchuravy)
CompatHelper: add new compat entry for "CUDAdrv" at version "5.1" (#4) (@github-actions[bot])
CompatHelper: add new compat entry for "Requires" at version "1.0" (#5) (@github-actions[bot])
add stream GC and wait with progress function (#10) (@vchuravy)
Adding a few more examples (#12) (@leios)
Fix and test local memory (#14) (@vchuravy)
implement Const memory for GPU and CPU (#16) (@vchuravy)
Handle type parameters in kernel functions (#25) (@vchuravy)
be less judicious with escape (#26) (@vchuravy)
don't use nested inits (#27) (@vchuravy)
Blocked iteration (#28) (@vchuravy)
cleanup examples (#29) (@vchuravy)
add group index (#32) (@vchuravy)
Make kernels dispatchable (#33) (@mwarusz)
Use macrotools (#34) (@vchuravy)
add a block syntax for uniform (#35) (@vchuravy)
Fix private memory on the CPU (#36) (@mwarusz)
handle @synchronize in blocks (#37) (@vchuravy)
add ntuple index type (#38) (@vchuravy)
fix nested unroll macros (#39) (@vchuravy)
Allow CPU and CUDA kernels to wait on each other (#41) (@lcw)
Fix tuple destructuring and bors+travis (#42) (@vchuravy)
Wait for GPU events using synchronize (#45) (@mwarusz)
[WIP] Infrastructure to sync CuDefaultStream() (#47) (@vchuravy)
Allow CPU kernels to depend on default events (#51) (@lcw)
Implement async_copy! (#53) (@vchuravy)
CompatHelper: bump compat for "CUDAapi" to "4.0" (#56) (@github-actions[bot])
only create as many tasks as threads and more inference barriers (#57) (@vchuravy)
Ensure that constify doesn't cause arguments to be captured (#58) (@vchuravy)