Releases: JuliaGPU/KernelAbstractions.jl
v0.1.5
KernelAbstractions v0.1.5
Merged pull requests:
v0.1.4
v0.1.3
KernelAbstractions v0.1.3
Merged pull requests:
v0.1.2
v0.1.1
KernelAbstractions v0.1.1
Merged pull requests:
v0.1.0
KernelAbstractions v0.1.0
Closed issues:
- Variable lifetime counterintuitive on the CPU (#13)
- Using Val as kernel argument triggers an assertion (#21)
- Performance of naive transpose (#22)
- Initialization error (#23)
- unroll not defined inside a kernel (#24)
- Document that private memory works differently than scratch in GPUifyLoops (#31)
- How best to sync with the default stream in the CUDA backend? (#46)
Merged pull requests:
- Bring up GPU functionality fully (#1) (@vchuravy)
- Cleanup docs and remove ScalarCPU (#2) (@vchuravy)
- CompatHelper: add new compat entry for "CUDAdrv" at version "5.1" (#4) (@github-actions[bot])
- CompatHelper: add new compat entry for "Requires" at version "1.0" (#5) (@github-actions[bot])
- add stream GC and wait with progress function (#10) (@vchuravy)
- Adding a few more examples (#12) (@leios)
- Fix and test local memory (#14) (@vchuravy)
- implement Const memory for GPU and CPU (#16) (@vchuravy)
- Handle type parameters in kernel functions (#25) (@vchuravy)
- be less judicious with escape (#26) (@vchuravy)
- don't use nested inits (#27) (@vchuravy)
- Blocked iteration (#28) (@vchuravy)
- cleanup examples (#29) (@vchuravy)
- add group index (#32) (@vchuravy)
- Make kernels dispatchable (#33) (@mwarusz)
- Use macrotools (#34) (@vchuravy)
- add a block syntax for uniform (#35) (@vchuravy)
- Fix private memory on the CPU (#36) (@mwarusz)
- handle at_synchronize in blocks (#37) (@vchuravy)
- add ntuple index type (#38) (@vchuravy)
- fix nested unroll macros (#39) (@vchuravy)
- Allow CPU and CUDA kernels to wait on each other (#41) (@lcw)
- Fix tuple destructuring and bors+travis (#42) (@vchuravy)
- Wait for GPU events using synchronize (#45) (@mwarusz)
- [WIP] Infrastructure to sync CuDefaultStream() (#47) (@vchuravy)
- Allow CPU kernels to depend on default events (#51) (@lcw)
- Implement async_copy! (#53) (@vchuravy)
- CompatHelper: bump compat for "CUDAapi" to "4.0" (#56) (@github-actions[bot])
- only create as many tasks as threads and more inference barriers (#57) (@vchuravy)
- Ensure that constify doesn't cause arguments to be captured (#58) (@vchuravy)