
Conversation

@christiangnrd
Member

Opened to run benchmarks.

Todo:

  • Add a compat bound once the new GPUArrays version is released (see the sketch below)
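
A minimal sketch of how that compat bound could be recorded once the release is tagged, using Pkg's compat API (the version number is a placeholder, not the actual GPUArrays release):

using Pkg
# Replace the temporary Pkg.add(url = ..., rev = ...) workaround with the
# released version, then record a compat bound in Project.toml.
# "11" below is a placeholder for whatever version ships the accumulate changes.
Pkg.compat("GPUArrays", "11")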

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

Benchmark suite Current: 84f519a Previous: de3fd23 Ratio
latency/precompile 25018084416 ns 25055876416 ns 1.00
latency/ttfp 2129990500 ns 2125052000 ns 1.00
latency/import 1225508166 ns 1219352833 ns 1.01
integration/metaldevrt 956625 ns 968354.5 ns 0.99
integration/byval/slices=1 1644625 ns 1660375 ns 0.99
integration/byval/slices=3 10295687.5 ns 8945875 ns 1.15
integration/byval/reference 1633875 ns 1638208 ns 1.00
integration/byval/slices=2 2747625 ns 2721062.5 ns 1.01
kernel/indexing 692437.5 ns 703875 ns 0.98
kernel/indexing_checked 681375 ns 694208 ns 0.98
kernel/launch 13020.5 ns 12875 ns 1.01
array/construct 6292 ns 6083 ns 1.03
array/broadcast 660542 ns 670666.5 ns 0.98
array/random/randn/Float32 849938 ns 879916 ns 0.97
array/random/randn!/Float32 621917 ns 639812.5 ns 0.97
array/random/rand!/Int64 554792 ns 567000 ns 0.98
array/random/rand!/Float32 589083 ns 602916 ns 0.98
array/random/rand/Int64 752104.5 ns 754292 ns 1.00
array/random/rand/Float32 545291 ns 574541 ns 0.95
array/accumulate/Int64/1d 2378188 ns 1336875 ns 1.78
array/accumulate/Int64/dims=1 2295312.5 ns 1912291.5 ns 1.20
array/accumulate/Int64/dims=2 2555417 ns 2256916.5 ns 1.13
array/accumulate/Int64/dims=1L 6595145.5 ns 11644666.5 ns 0.57
array/accumulate/Int64/dims=2L 18580062.5 ns 9900979.5 ns 1.88
array/accumulate/Float32/1d 1685084 ns 1245625 ns 1.35
array/accumulate/Float32/dims=1 2124459 ns 1630541.5 ns 1.30
array/accumulate/Float32/dims=2 2386125 ns 1968750 ns 1.21
array/accumulate/Float32/dims=1L 5082146 ns 9898709 ns 0.51
array/accumulate/Float32/dims=2L 14983750 ns 7337354 ns 2.04
array/reductions/reduce/Int64/1d 1349687.5 ns 1381500.5 ns 0.98
array/reductions/reduce/Int64/dims=1 1177333 ns 1154562.5 ns 1.02
array/reductions/reduce/Int64/dims=2 1291041 ns 1287541 ns 1.00
array/reductions/reduce/Int64/dims=1L 2127500 ns 2078000 ns 1.02
array/reductions/reduce/Int64/dims=2L 3575749.5 ns 3569083 ns 1.00
array/reductions/reduce/Float32/1d 1015125 ns 1047333.5 ns 0.97
array/reductions/reduce/Float32/dims=1 885375 ns 899875 ns 0.98
array/reductions/reduce/Float32/dims=2 800416 ns 801708.5 ns 1.00
array/reductions/reduce/Float32/dims=1L 1386084 ns 1393042 ns 1.00
array/reductions/reduce/Float32/dims=2L 1909291 ns 1903875 ns 1.00
array/reductions/mapreduce/Int64/1d 1338875 ns 1353375 ns 0.99
array/reductions/mapreduce/Int64/dims=1 1142416 ns 1160042 ns 0.98
array/reductions/mapreduce/Int64/dims=2 1287270.5 ns 1282979 ns 1.00
array/reductions/mapreduce/Int64/dims=1L 2103667 ns 2111146 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 3442541.5 ns 3466062 ns 0.99
array/reductions/mapreduce/Float32/1d 975541 ns 1083604 ns 0.90
array/reductions/mapreduce/Float32/dims=1 890208.5 ns 902542 ns 0.99
array/reductions/mapreduce/Float32/dims=2 788042 ns 819041.5 ns 0.96
array/reductions/mapreduce/Float32/dims=1L 1385458.5 ns 1404791.5 ns 0.99
array/reductions/mapreduce/Float32/dims=2L 1918042 ns 1904375 ns 1.01
array/private/copyto!/gpu_to_gpu 642271 ns 661417 ns 0.97
array/private/copyto!/cpu_to_gpu 819417 ns 827708 ns 0.99
array/private/copyto!/gpu_to_cpu 818854.5 ns 823833 ns 0.99
array/private/iteration/findall/int 1746687.5 ns 1654645.5 ns 1.06
array/private/iteration/findall/bool 1575458 ns 1502750 ns 1.05
array/private/iteration/findfirst/int 1933875 ns 2023208 ns 0.96
array/private/iteration/findfirst/bool 1745458 ns 1852750 ns 0.94
array/private/iteration/scalar 4033375 ns 5040709 ns 0.80
array/private/iteration/logical 2604125 ns 2707041 ns 0.96
array/private/iteration/findmin/1d 1990291 ns 2059979 ns 0.97
array/private/iteration/findmin/2d 1640084 ns 1638750 ns 1.00
array/private/copy 580229.5 ns 566958.5 ns 1.02
array/shared/copyto!/gpu_to_gpu 80708 ns 79375 ns 1.02
array/shared/copyto!/cpu_to_gpu 79708 ns 81333 ns 0.98
array/shared/copyto!/gpu_to_cpu 80000 ns 78750 ns 1.02
array/shared/iteration/findall/int 1761916.5 ns 1657354 ns 1.06
array/shared/iteration/findall/bool 1683500 ns 1507000 ns 1.12
array/shared/iteration/findfirst/int 1539875 ns 1648125 ns 0.93
array/shared/iteration/findfirst/bool 1427729.5 ns 1429542 ns 1.00
array/shared/iteration/scalar 161459 ns 159083 ns 1.01
array/shared/iteration/logical 2442375 ns 2359208 ns 1.04
array/shared/iteration/findmin/1d 1511833 ns 1598729.5 ns 0.95
array/shared/iteration/findmin/2d 1630792 ns 1642520.5 ns 0.99
array/shared/copy 250604 ns 253958 ns 0.99
array/permutedims/4d 2465500 ns 2460792 ns 1.00
array/permutedims/2d 1249208.5 ns 1249583.5 ns 1.00
array/permutedims/3d 1743167 ns 1743375 ns 1.00
metal/synchronization/stream 14875 ns 14875 ns 1
metal/synchronization/context 15708 ns 15500 ns 1.01

This comment was automatically generated by a workflow using github-action-benchmark.

@github-actions
Contributor

github-actions bot commented Jul 20, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/test/runtests.jl b/test/runtests.jl
index 9b6b0c3d..6d16c110 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -11,7 +11,7 @@ if parse(Bool, get(ENV, "BUILDKITE", "false"))
 end
 
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="accumulatetests")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "accumulatetests")
 
 # Quit without erroring if Metal loaded without issues on unsupported platforms
 if !Sys.isapple()

@christiangnrd
Member Author

christiangnrd commented Jul 20, 2025

As expected, some small regressions for most accumulate benchmarks, with a massive regression when accumulating along rows of a 3x1000000 matrix.

The performance improvement for column-wise accumulation on 3x1000000 matrices comes from Metal missing an easy optimization (see #626). Edit: I was confused; that optimization is only present for reductions.
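
For context, the case under discussion can be reproduced with something along these lines (a sketch, not the benchmark suite's exact harness; the dims = 2 call on a 3x1000000 MtlArray presumably corresponds to the dims=2L rows in the table above):

using Metal

A = MtlArray(rand(Float32, 3, 1_000_000))  # wide matrix: 3 rows, one million columns

# Row-wise accumulation (dims = 2): the case showing the ~2x regression
Metal.@sync accumulate(+, A; dims = 2)

# Column-wise accumulation (dims = 1): the case that got faster
Metal.@sync accumulate(+, A; dims = 1)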

@christiangnrd changed the title to "Switch to GPUArrays.jl accumulate implementation" on Jul 20, 2025
@codecov

codecov bot commented Jul 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.35%. Comparing base (1942968) to head (b296d15).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #625      +/-   ##
==========================================
- Coverage   80.63%   80.35%   -0.29%     
==========================================
  Files          61       60       -1     
  Lines        2722     2678      -44     
==========================================
- Hits         2195     2152      -43     
+ Misses        527      526       -1     


@christiangnrd changed the title from "Switch to GPUArrays.jl accumulate implementation" to "[Do not merge] Switch to GPUArrays.jl accumulate implementation" on Jul 23, 2025
@maleadt
Member

maleadt commented Jul 29, 2025

As expected, some small regressions for most accumulate benchmarks, with a massive regression when accumulating along rows of a 3x1000000 matrix.

I don't see a massive slowdown?

@christiangnrd
Member Author

@maleadt The accumulate dims=2L benchmarks show a 2x slowdown. Did I get my rows/columns mixed up in my comment?

@maleadt
Member

maleadt commented Jul 30, 2025

Oh OK, I didn't consider 2x a "massive slowdown" :-) Still something to look at, of course, but much less dramatic than the 7x regressions we saw against CUDA.jl's reduction, for example.
