Releases: JuliaGPU/KernelAbstractions.jl

v0.1.5

17 Mar 19:05
74bf47e

KernelAbstractions v0.1.5

Diff since v0.1.4

Merged pull requests:

v0.1.4

13 Mar 22:05
8a95e72

KernelAbstractions v0.1.4

Diff since v0.1.3

Merged pull requests:

  • Add CUDA rewrites for sincos(x) and exp(y) for complex y (#67) (@oschub)
  • Add Event(f, args) to integrate code using @async better (#68) (@vchuravy)
  • async_copy! fixes (#69) (@lcw)

v0.1.3

12 Mar 21:05
11f3970

KernelAbstractions v0.1.3

Diff since v0.1.2

Merged pull requests:

  • Allow MultiEvent to be created from an Event (#65) (@lcw)

v0.1.2

11 Mar 20:05
8cbc86e

KernelAbstractions v0.1.2

Diff since v0.1.1

Merged pull requests:

v0.1.1

10 Mar 00:09
c3b0428

KernelAbstractions v0.1.1

Diff since v0.1.0

Merged pull requests:

  • Fix CUDA waiting on CUDA events (#59) (@lcw)

v0.1.0

09 Mar 19:05
91103f1

KernelAbstractions v0.1.0

Closed issues:

  • Variable lifetime counterintuitive on the CPU (#13)
  • Using Val as kernel argument triggers an assertion (#21)
  • Performance of naive transpose (#22)
  • Initialization error (#23)
  • unroll not defined inside a kernel (#24)
  • Document that private memory works differently than scratch in GPUifyLoops (#31)
  • How best to sync with the default stream in the CUDA backend? (#46)

Merged pull requests:

  • Bring up GPU functionality fully (#1) (@vchuravy)
  • Cleanup docs and remove ScalarCPU (#2) (@vchuravy)
  • CompatHelper: add new compat entry for "CUDAdrv" at version "5.1" (#4) (@github-actions[bot])
  • CompatHelper: add new compat entry for "Requires" at version "1.0" (#5) (@github-actions[bot])
  • add stream GC and wait with progress function (#10) (@vchuravy)
  • Adding a few more examples (#12) (@leios)
  • Fix and test local memory (#14) (@vchuravy)
  • implement Const memory for GPU and CPU (#16) (@vchuravy)
  • Handle type parameters in kernel functions (#25) (@vchuravy)
  • be less judicious with escape (#26) (@vchuravy)
  • don't use nested inits (#27) (@vchuravy)
  • Blocked iteration (#28) (@vchuravy)
  • cleanup examples (#29) (@vchuravy)
  • add group index (#32) (@vchuravy)
  • Make kernels dispatchable (#33) (@mwarusz)
  • Use macrotools (#34) (@vchuravy)
  • add a block syntax for uniform (#35) (@vchuravy)
  • Fix private memory on the CPU (#36) (@mwarusz)
  • handle @synchronize in blocks (#37) (@vchuravy)
  • add ntuple index type (#38) (@vchuravy)
  • fix nested unroll macros (#39) (@vchuravy)
  • Allow CPU and CUDA kernels to wait on each other (#41) (@lcw)
  • Fix tuple destructuring and bors+travis (#42) (@vchuravy)
  • Wait for GPU events using synchronize (#45) (@mwarusz)
  • [WIP] Infrastructure to sync CuDefaultStream() (#47) (@vchuravy)
  • Allow CPU kernels to depend on default events (#51) (@lcw)
  • Implement async_copy! (#53) (@vchuravy)
  • CompatHelper: bump compat for "CUDAapi" to "4.0" (#56) (@github-actions[bot])
  • only create as many tasks as threads and more inference barriers (#57) (@vchuravy)
  • Ensure that constify doesn't cause arguments to be captured (#58) (@vchuravy)