perf: add initial cache-blocked dit fft impl #48

Merged
Shnatsel merged 1 commit into main from feature/cache-blocked-dit-fft on Nov 23, 2025

@smu160 (Member) commented Nov 16, 2025

No description provided.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.29%. Comparing base (2e67b5c) to head (790861d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   99.16%   99.29%   +0.12%     
==========================================
  Files          12       12              
  Lines        2167     2257      +90     
==========================================
+ Hits         2149     2241      +92     
+ Misses         18       16       -2     

@Shnatsel (Collaborator)

Nice, I'm seeing up to 50%-ish improvements for larger sizes!
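
A note for reading the numbers that follow: Criterion reports both a `time` change and a `thrpt` change for each size, and the two are not symmetric, so a 40% drop in time shows up as a roughly 67% gain in throughput. The sketch below shows the arithmetic. The helper names `throughput_change` and `throughput_melem_per_s` are hypothetical, made up for this illustration; they are not part of PhastFT or Criterion.

```rust
/// Convert a relative change in run time into the matching relative change
/// in throughput. Both values are fractions, so -0.40 means "40% less time".
/// Throughput is inversely proportional to time, hence the reciprocal.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Throughput in millions of elements per second for `elements` items
/// processed in `time_ns` nanoseconds.
/// elements / ns is Gelem/s, so multiply by 1e3 to get Melem/s.
fn throughput_melem_per_s(elements: u64, time_ns: f64) -> f64 {
    elements as f64 / time_ns * 1e3
}
```

For example, `throughput_change(-0.40207)` comes out at about 0.672, which lines up with the DIF/16777216 entry (time −40.207%, thrpt +67.243%), and `throughput_melem_per_s(64, 339.55)` gives about 188.5, in line with the DIF/64 entry.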

Timings on my Zen 4 CPU compared to main
 ~/C/PhastFT (feature/cache-blocked-dit-fft)> RUSTFLAGS='-C target-cpu=native' cargo bench --bench=bench PhastFT -- --baseline=main-native
   Compiling phastft v0.3.0 (/home/shnatsel/Code/PhastFT)
    Finished `bench` profile [optimized] target(s) in 21.79s
     Running benches/bench.rs (target/release/deps/bench-441a0f9fdf4ea83f)
Forward f32/PhastFT DIF/64
                        time:   [339.42 ns 339.55 ns 339.68 ns]
                        thrpt:  [188.41 Melem/s 188.48 Melem/s 188.56 Melem/s]
                 change:
                        time:   [+0.6950% +0.8077% +0.9117%] (p = 0.00 < 0.05)
                        thrpt:  [−0.9035% −0.8013% −0.6902%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
Forward f32/PhastFT DIT/64
                        time:   [144.77 ns 144.95 ns 145.15 ns]
                        thrpt:  [440.92 Melem/s 441.54 Melem/s 442.08 Melem/s]
                 change:
                        time:   [−2.7247% −2.4981% −2.2775%] (p = 0.00 < 0.05)
                        thrpt:  [+2.3306% +2.5621% +2.8010%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Forward f32/PhastFT DIF/128
                        time:   [562.97 ns 563.41 ns 564.09 ns]
                        thrpt:  [226.91 Melem/s 227.19 Melem/s 227.36 Melem/s]
                 change:
                        time:   [+0.4576% +0.5771% +0.6922%] (p = 0.00 < 0.05)
                        thrpt:  [−0.6874% −0.5738% −0.4555%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe
Forward f32/PhastFT DIT/128
                        time:   [203.33 ns 203.45 ns 203.58 ns]
                        thrpt:  [628.75 Melem/s 629.13 Melem/s 629.50 Melem/s]
                 change:
                        time:   [−1.0446% −0.9030% −0.7709%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7769% +0.9112% +1.0556%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
Forward f32/PhastFT DIF/256
                        time:   [1.0058 µs 1.0063 µs 1.0067 µs]
                        thrpt:  [254.29 Melem/s 254.41 Melem/s 254.52 Melem/s]
                 change:
                        time:   [+0.8871% +0.9852% +1.0883%] (p = 0.00 < 0.05)
                        thrpt:  [−1.0766% −0.9756% −0.8793%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
Forward f32/PhastFT DIT/256
                        time:   [352.96 ns 353.27 ns 353.69 ns]
                        thrpt:  [723.81 Melem/s 724.65 Melem/s 725.30 Melem/s]
                 change:
                        time:   [−1.3792% −1.2210% −1.0586%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0700% +1.2361% +1.3985%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high severe
Forward f32/PhastFT DIF/512
                        time:   [2.4369 µs 2.4377 µs 2.4386 µs]
                        thrpt:  [209.95 Melem/s 210.03 Melem/s 210.11 Melem/s]
                 change:
                        time:   [+2.2557% +2.3866% +2.4883%] (p = 0.00 < 0.05)
                        thrpt:  [−2.4279% −2.3309% −2.2059%]
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) low severe
  7 (7.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
Forward f32/PhastFT DIT/512
                        time:   [1.2311 µs 1.2319 µs 1.2327 µs]
                        thrpt:  [415.36 Melem/s 415.61 Melem/s 415.90 Melem/s]
                 change:
                        time:   [+3.4294% +3.5799% +3.7491%] (p = 0.00 < 0.05)
                        thrpt:  [−3.6136% −3.4562% −3.3157%]
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
Forward f32/PhastFT DIF/1024
                        time:   [5.6058 µs 5.6081 µs 5.6102 µs]
                        thrpt:  [182.52 Melem/s 182.59 Melem/s 182.67 Melem/s]
                 change:
                        time:   [−0.7163% −0.5525% −0.4069%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4086% +0.5556% +0.7214%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
Forward f32/PhastFT DIT/1024
                        time:   [3.3746 µs 3.3774 µs 3.3799 µs]
                        thrpt:  [302.97 Melem/s 303.19 Melem/s 303.44 Melem/s]
                 change:
                        time:   [−0.4231% −0.2472% −0.0439%] (p = 0.01 < 0.05)
                        thrpt:  [+0.0439% +0.2478% +0.4249%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
Forward f32/PhastFT DIF/2048
                        time:   [7.4943 µs 7.4982 µs 7.5019 µs]
                        thrpt:  [273.00 Melem/s 273.13 Melem/s 273.27 Melem/s]
                 change:
                        time:   [+0.0561% +0.1807% +0.3186%] (p = 0.01 < 0.05)
                        thrpt:  [−0.3176% −0.1803% −0.0561%]
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe
Forward f32/PhastFT DIT/2048
                        time:   [3.0368 µs 3.0388 µs 3.0408 µs]
                        thrpt:  [673.51 Melem/s 673.94 Melem/s 674.39 Melem/s]
                 change:
                        time:   [+0.5840% +0.7395% +0.8884%] (p = 0.00 < 0.05)
                        thrpt:  [−0.8805% −0.7341% −0.5807%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
Forward f32/PhastFT DIF/4096
                        time:   [15.293 µs 15.296 µs 15.299 µs]
                        thrpt:  [267.73 Melem/s 267.78 Melem/s 267.83 Melem/s]
                 change:
                        time:   [−0.1420% −0.0447% +0.0613%] (p = 0.42 > 0.05)
                        thrpt:  [−0.0612% +0.0447% +0.1422%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
Forward f32/PhastFT DIT/4096
                        time:   [6.4379 µs 6.4424 µs 6.4472 µs]
                        thrpt:  [635.31 Melem/s 635.78 Melem/s 636.23 Melem/s]
                 change:
                        time:   [−2.5933% −2.4211% −2.2554%] (p = 0.00 < 0.05)
                        thrpt:  [+2.3074% +2.4812% +2.6623%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Forward f32/PhastFT DIF/8192
                        time:   [31.799 µs 31.832 µs 31.874 µs]
                        thrpt:  [257.01 Melem/s 257.35 Melem/s 257.61 Melem/s]
                 change:
                        time:   [+2.7312% +3.2295% +3.7907%] (p = 0.00 < 0.05)
                        thrpt:  [−3.6523% −3.1284% −2.6586%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
Forward f32/PhastFT DIT/8192
                        time:   [14.130 µs 14.165 µs 14.211 µs]
                        thrpt:  [576.45 Melem/s 578.32 Melem/s 579.77 Melem/s]
                 change:
                        time:   [+3.7590% +5.3473% +6.9516%] (p = 0.00 < 0.05)
                        thrpt:  [−6.4997% −5.0758% −3.6228%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
Forward f32/PhastFT DIF/16384
                        time:   [81.974 µs 81.999 µs 82.024 µs]
                        thrpt:  [199.75 Melem/s 199.81 Melem/s 199.87 Melem/s]
                 change:
                        time:   [+0.1658% +0.2961% +0.4320%] (p = 0.00 < 0.05)
                        thrpt:  [−0.4301% −0.2952% −0.1655%]
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  7 (7.00%) high severe
Forward f32/PhastFT DIT/16384
                        time:   [47.126 µs 47.163 µs 47.193 µs]
                        thrpt:  [347.17 Melem/s 347.39 Melem/s 347.66 Melem/s]
                 change:
                        time:   [−0.2829% −0.0058% +0.2918%] (p = 0.97 > 0.05)
                        thrpt:  [−0.2910% +0.0058% +0.2837%]
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
Forward f32/PhastFT DIF/32768
                        time:   [196.72 µs 196.77 µs 196.83 µs]
                        thrpt:  [166.48 Melem/s 166.53 Melem/s 166.57 Melem/s]
                 change:
                        time:   [−0.2592% −0.1354% −0.0227%] (p = 0.02 < 0.05)
                        thrpt:  [+0.0227% +0.1356% +0.2598%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low severe
  2 (2.00%) high mild
  2 (2.00%) high severe
Forward f32/PhastFT DIT/32768
                        time:   [127.16 µs 127.22 µs 127.29 µs]
                        thrpt:  [257.43 Melem/s 257.57 Melem/s 257.69 Melem/s]
                 change:
                        time:   [+0.0022% +0.0950% +0.1956%] (p = 0.06 > 0.05)
                        thrpt:  [−0.1952% −0.0949% −0.0022%]
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
Forward f32/PhastFT DIF/65536
                        time:   [391.92 µs 392.00 µs 392.10 µs]
                        thrpt:  [167.14 Melem/s 167.18 Melem/s 167.22 Melem/s]
                 change:
                        time:   [+0.4627% +0.5909% +0.7463%] (p = 0.00 < 0.05)
                        thrpt:  [−0.7407% −0.5875% −0.4606%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
Forward f32/PhastFT DIT/65536
                        time:   [254.72 µs 254.79 µs 254.87 µs]
                        thrpt:  [257.13 Melem/s 257.21 Melem/s 257.29 Melem/s]
                 change:
                        time:   [+0.1791% +0.2679% +0.3588%] (p = 0.00 < 0.05)
                        thrpt:  [−0.3576% −0.2672% −0.1787%]
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild
  1 (1.00%) high severe
Benchmarking Forward f32/PhastFT DIF/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
Forward f32/PhastFT DIF/131072
                        time:   [807.41 µs 807.72 µs 808.07 µs]
                        thrpt:  [162.20 Melem/s 162.27 Melem/s 162.34 Melem/s]
                 change:
                        time:   [−1.5557% −1.4330% −1.3095%] (p = 0.00 < 0.05)
                        thrpt:  [+1.3269% +1.4538% +1.5803%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking Forward f32/PhastFT DIT/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
Forward f32/PhastFT DIT/131072
                        time:   [518.19 µs 518.56 µs 518.97 µs]
                        thrpt:  [252.56 Melem/s 252.76 Melem/s 252.94 Melem/s]
                 change:
                        time:   [−5.0240% −4.8492% −4.6837%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9139% +5.0964% +5.2898%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Forward f32/PhastFT DIF/262144
                        time:   [1.6862 ms 1.6867 ms 1.6873 ms]
                        thrpt:  [155.37 Melem/s 155.42 Melem/s 155.46 Melem/s]
                 change:
                        time:   [−7.6713% −7.6351% −7.5954%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2197% +8.2662% +8.3086%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
Forward f32/PhastFT DIT/262144
                        time:   [1.1354 ms 1.1357 ms 1.1360 ms]
                        thrpt:  [230.75 Melem/s 230.82 Melem/s 230.88 Melem/s]
                 change:
                        time:   [−10.081% −10.050% −10.017%] (p = 0.00 < 0.05)
                        thrpt:  [+11.133% +11.173% +11.212%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Forward f32/PhastFT DIF/524288
                        time:   [3.4092 ms 3.4100 ms 3.4109 ms]
                        thrpt:  [153.71 Melem/s 153.75 Melem/s 153.79 Melem/s]
                 change:
                        time:   [−4.5582% −4.5196% −4.4790%] (p = 0.00 < 0.05)
                        thrpt:  [+4.6890% +4.7335% +4.7759%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Forward f32/PhastFT DIT/524288
                        time:   [2.2863 ms 2.2872 ms 2.2882 ms]
                        thrpt:  [229.13 Melem/s 229.23 Melem/s 229.32 Melem/s]
                 change:
                        time:   [−8.2517% −8.2025% −8.1531%] (p = 0.00 < 0.05)
                        thrpt:  [+8.8768% +8.9354% +8.9939%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Forward f32/PhastFT DIF/1048576
                        time:   [6.7822 ms 6.7857 ms 6.7895 ms]
                        thrpt:  [154.44 Melem/s 154.53 Melem/s 154.61 Melem/s]
                 change:
                        time:   [−1.7416% −1.6847% −1.6161%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6426% +1.7136% +1.7724%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
Forward f32/PhastFT DIT/1048576
                        time:   [4.9613 ms 4.9656 ms 4.9702 ms]
                        thrpt:  [210.97 Melem/s 211.17 Melem/s 211.35 Melem/s]
                 change:
                        time:   [−2.7503% −2.6339% −2.5081%] (p = 0.00 < 0.05)
                        thrpt:  [+2.5727% +2.7051% +2.8281%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Forward f32/PhastFT DIF/2097152
                        time:   [15.318 ms 15.347 ms 15.376 ms]
                        thrpt:  [136.39 Melem/s 136.65 Melem/s 136.91 Melem/s]
                 change:
                        time:   [−5.6395% −5.4311% −5.2316%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5204% +5.7430% +5.9766%]
                        Performance has improved.
Forward f32/PhastFT DIT/2097152
                        time:   [10.017 ms 10.027 ms 10.038 ms]
                        thrpt:  [208.91 Melem/s 209.16 Melem/s 209.35 Melem/s]
                 change:
                        time:   [−22.855% −22.669% −22.478%] (p = 0.00 < 0.05)
                        thrpt:  [+28.996% +29.315% +29.627%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking Forward f32/PhastFT DIF/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.0s, or reduce sample count to 70.
Forward f32/PhastFT DIF/4194304
                        time:   [42.744 ms 43.108 ms 43.463 ms]
                        thrpt:  [96.503 Melem/s 97.297 Melem/s 98.126 Melem/s]
                 change:
                        time:   [+2.1913% +3.3255% +4.4422%] (p = 0.00 < 0.05)
                        thrpt:  [−4.2533% −3.2184% −2.1443%]
                        Performance has regressed.
Forward f32/PhastFT DIT/4194304
                        time:   [22.172 ms 22.298 ms 22.425 ms]
                        thrpt:  [187.04 Melem/s 188.11 Melem/s 189.17 Melem/s]
                 change:
                        time:   [−15.948% −14.892% −13.944%] (p = 0.00 < 0.05)
                        thrpt:  [+16.203% +17.498% +18.974%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
Benchmarking Forward f32/PhastFT DIF/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.7s, or reduce sample count to 20.
Forward f32/PhastFT DIF/8388608
                        time:   [88.698 ms 89.075 ms 89.480 ms]
                        thrpt:  [93.749 Melem/s 94.175 Melem/s 94.575 Melem/s]
                 change:
                        time:   [+1.1979% +1.8959% +2.6525%] (p = 0.00 < 0.05)
                        thrpt:  [−2.5840% −1.8606% −1.1837%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
Benchmarking Forward f32/PhastFT DIT/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.5s, or reduce sample count to 40.
Forward f32/PhastFT DIT/8388608
                        time:   [46.965 ms 47.302 ms 47.670 ms]
                        thrpt:  [175.97 Melem/s 177.34 Melem/s 178.62 Melem/s]
                 change:
                        time:   [−31.683% −31.104% −30.480%] (p = 0.00 < 0.05)
                        thrpt:  [+43.843% +45.146% +46.377%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) high mild
  9 (9.00%) high severe
Benchmarking Forward f32/PhastFT DIF/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 33.1s, or reduce sample count to 10.
Forward f32/PhastFT DIF/16777216
                        time:   [211.95 ms 212.44 ms 212.98 ms]
                        thrpt:  [78.773 Melem/s 78.975 Melem/s 79.155 Melem/s]
                 change:
                        time:   [−40.386% −40.207% −40.007%] (p = 0.00 < 0.05)
                        thrpt:  [+66.687% +67.243% +67.745%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
Benchmarking Forward f32/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 22.6s, or reduce sample count to 20.
Forward f32/PhastFT DIT/16777216
                        time:   [106.91 ms 107.47 ms 108.07 ms]
                        thrpt:  [155.25 Melem/s 156.11 Melem/s 156.92 Melem/s]
                 change:
                        time:   [−36.347% −35.909% −35.484%] (p = 0.00 < 0.05)
                        thrpt:  [+54.999% +56.029% +57.103%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

Inverse f32/PhastFT DIF/64
                        time:   [351.88 ns 352.64 ns 353.41 ns]
                        thrpt:  [181.09 Melem/s 181.49 Melem/s 181.88 Melem/s]
                 change:
                        time:   [+0.4868% +0.7314% +0.9592%] (p = 0.00 < 0.05)
                        thrpt:  [−0.9501% −0.7261% −0.4845%]
                        Change within noise threshold.
Inverse f32/PhastFT DIT/64
                        time:   [158.70 ns 159.13 ns 159.48 ns]
                        thrpt:  [401.31 Melem/s 402.19 Melem/s 403.28 Melem/s]
                 change:
                        time:   [−3.1658% −2.8727% −2.6007%] (p = 0.00 < 0.05)
                        thrpt:  [+2.6702% +2.9577% +3.2692%]
                        Performance has improved.
Inverse f32/PhastFT DIF/128
                        time:   [580.18 ns 581.04 ns 582.00 ns]
                        thrpt:  [219.93 Melem/s 220.29 Melem/s 220.62 Melem/s]
                 change:
                        time:   [+1.0582% +1.2276% +1.3977%] (p = 0.00 < 0.05)
                        thrpt:  [−1.3784% −1.2127% −1.0471%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
Inverse f32/PhastFT DIT/128
                        time:   [217.72 ns 217.98 ns 218.23 ns]
                        thrpt:  [586.54 Melem/s 587.22 Melem/s 587.90 Melem/s]
                 change:
                        time:   [−1.9278% −1.7934% −1.6562%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6841% +1.8261% +1.9657%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Inverse f32/PhastFT DIF/256
                        time:   [1.0268 µs 1.0276 µs 1.0283 µs]
                        thrpt:  [248.95 Melem/s 249.13 Melem/s 249.31 Melem/s]
                 change:
                        time:   [+0.7134% +0.8500% +0.9855%] (p = 0.00 < 0.05)
                        thrpt:  [−0.9759% −0.8428% −0.7084%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
Inverse f32/PhastFT DIT/256
                        time:   [372.10 ns 372.33 ns 372.55 ns]
                        thrpt:  [687.15 Melem/s 687.57 Melem/s 688.00 Melem/s]
                 change:
                        time:   [−2.2687% −2.0901% −1.9016%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9385% +2.1347% +2.3213%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
Inverse f32/PhastFT DIF/512
                        time:   [2.4736 µs 2.4743 µs 2.4751 µs]
                        thrpt:  [206.86 Melem/s 206.93 Melem/s 206.99 Melem/s]
                 change:
                        time:   [+1.7778% +1.9295% +2.0696%] (p = 0.00 < 0.05)
                        thrpt:  [−2.0277% −1.8930% −1.7468%]
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
Inverse f32/PhastFT DIT/512
                        time:   [1.2661 µs 1.2671 µs 1.2681 µs]
                        thrpt:  [403.76 Melem/s 404.06 Melem/s 404.39 Melem/s]
                 change:
                        time:   [+2.6088% +2.8298% +3.0546%] (p = 0.00 < 0.05)
                        thrpt:  [−2.9640% −2.7519% −2.5424%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high severe
Inverse f32/PhastFT DIF/1024
                        time:   [5.7089 µs 5.7115 µs 5.7141 µs]
                        thrpt:  [179.21 Melem/s 179.29 Melem/s 179.37 Melem/s]
                 change:
                        time:   [−0.1948% −0.0526% +0.0927%] (p = 0.49 > 0.05)
                        thrpt:  [−0.0926% +0.0526% +0.1951%]
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Inverse f32/PhastFT DIT/1024
                        time:   [3.4640 µs 3.4671 µs 3.4698 µs]
                        thrpt:  [295.12 Melem/s 295.35 Melem/s 295.61 Melem/s]
                 change:
                        time:   [−0.5249% −0.3311% −0.1204%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1205% +0.3322% +0.5277%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low severe
Inverse f32/PhastFT DIF/2048
                        time:   [7.6210 µs 7.6253 µs 7.6300 µs]
                        thrpt:  [268.41 Melem/s 268.58 Melem/s 268.73 Melem/s]
                 change:
                        time:   [−0.5306% −0.4029% −0.2759%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2767% +0.4046% +0.5334%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high severe
Inverse f32/PhastFT DIT/2048
                        time:   [3.2335 µs 3.2361 µs 3.2389 µs]
                        thrpt:  [632.32 Melem/s 632.86 Melem/s 633.37 Melem/s]
                 change:
                        time:   [+0.1676% +0.3779% +0.5730%] (p = 0.00 < 0.05)
                        thrpt:  [−0.5698% −0.3765% −0.1673%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild
Inverse f32/PhastFT DIF/4096
                        time:   [15.796 µs 15.802 µs 15.810 µs]
                        thrpt:  [259.07 Melem/s 259.20 Melem/s 259.31 Melem/s]
                 change:
                        time:   [−0.2260% −0.0730% +0.0660%] (p = 0.34 > 0.05)
                        thrpt:  [−0.0660% +0.0731% +0.2265%]
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high severe
Inverse f32/PhastFT DIT/4096
                        time:   [6.8315 µs 6.8354 µs 6.8387 µs]
                        thrpt:  [598.95 Melem/s 599.23 Melem/s 599.57 Melem/s]
                 change:
                        time:   [−1.8897% −1.6179% −1.3245%] (p = 0.00 < 0.05)
                        thrpt:  [+1.3423% +1.6445% +1.9261%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
Inverse f32/PhastFT DIF/8192
                        time:   [33.748 µs 33.848 µs 33.978 µs]
                        thrpt:  [241.09 Melem/s 242.02 Melem/s 242.74 Melem/s]
                 change:
                        time:   [+6.4124% +7.8286% +9.2127%] (p = 0.00 < 0.05)
                        thrpt:  [−8.4355% −7.2603% −6.0260%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
Inverse f32/PhastFT DIT/8192
                        time:   [14.577 µs 14.589 µs 14.604 µs]
                        thrpt:  [560.93 Melem/s 561.51 Melem/s 561.98 Melem/s]
                 change:
                        time:   [−0.7712% −0.3298% +0.1769%] (p = 0.17 > 0.05)
                        thrpt:  [−0.1766% +0.3308% +0.7772%]
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe
Inverse f32/PhastFT DIF/16384
                        time:   [83.906 µs 83.931 µs 83.960 µs]
                        thrpt:  [195.14 Melem/s 195.21 Melem/s 195.27 Melem/s]
                 change:
                        time:   [+0.0743% +0.1957% +0.3367%] (p = 0.00 < 0.05)
                        thrpt:  [−0.3356% −0.1953% −0.0742%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
Inverse f32/PhastFT DIT/16384
                        time:   [49.206 µs 49.234 µs 49.265 µs]
                        thrpt:  [332.57 Melem/s 332.78 Melem/s 332.97 Melem/s]
                 change:
                        time:   [−0.4674% −0.2220% +0.0336%] (p = 0.08 > 0.05)
                        thrpt:  [−0.0336% +0.2225% +0.4696%]
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe
Inverse f32/PhastFT DIF/32768
                        time:   [200.35 µs 200.39 µs 200.44 µs]
                        thrpt:  [163.48 Melem/s 163.52 Melem/s 163.56 Melem/s]
                 change:
                        time:   [+0.7339% +0.8569% +1.0002%] (p = 0.00 < 0.05)
                        thrpt:  [−0.9903% −0.8496% −0.7286%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  3 (3.00%) high severe
Inverse f32/PhastFT DIT/32768
                        time:   [131.19 µs 131.23 µs 131.26 µs]
                        thrpt:  [249.64 Melem/s 249.70 Melem/s 249.78 Melem/s]
                 change:
                        time:   [−1.0630% −0.7513% −0.4821%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4845% +0.7570% +1.0744%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
Inverse f32/PhastFT DIF/65536
                        time:   [399.11 µs 399.20 µs 399.28 µs]
                        thrpt:  [164.14 Melem/s 164.17 Melem/s 164.21 Melem/s]
                 change:
                        time:   [−0.1134% −0.0391% +0.0303%] (p = 0.29 > 0.05)
                        thrpt:  [−0.0303% +0.0391% +0.1135%]
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild
Inverse f32/PhastFT DIT/65536
                        time:   [263.04 µs 263.16 µs 263.26 µs]
                        thrpt:  [248.94 Melem/s 249.04 Melem/s 249.15 Melem/s]
                 change:
                        time:   [+0.9002% +0.9913% +1.0850%] (p = 0.00 < 0.05)
                        thrpt:  [−1.0734% −0.9816% −0.8922%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
Benchmarking Inverse f32/PhastFT DIF/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
Inverse f32/PhastFT DIF/131072
                        time:   [822.68 µs 822.92 µs 823.18 µs]
                        thrpt:  [159.23 Melem/s 159.28 Melem/s 159.32 Melem/s]
                 change:
                        time:   [−0.8796% −0.7909% −0.7013%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7063% +0.7972% +0.8874%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking Inverse f32/PhastFT DIT/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
Inverse f32/PhastFT DIT/131072
                        time:   [537.85 µs 538.17 µs 538.46 µs]
                        thrpt:  [243.42 Melem/s 243.55 Melem/s 243.70 Melem/s]
                 change:
                        time:   [−3.6779% −3.5409% −3.4175%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5385% +3.6709% +3.8183%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild
Inverse f32/PhastFT DIF/262144
                        time:   [1.7227 ms 1.7232 ms 1.7237 ms]
                        thrpt:  [152.08 Melem/s 152.13 Melem/s 152.17 Melem/s]
                 change:
                        time:   [−3.9105% −3.8683% −3.8233%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9753% +4.0239% +4.0696%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
Inverse f32/PhastFT DIT/262144
                        time:   [1.1715 ms 1.1719 ms 1.1723 ms]
                        thrpt:  [223.62 Melem/s 223.70 Melem/s 223.76 Melem/s]
                 change:
                        time:   [−9.5079% −9.4678% −9.4279%] (p = 0.00 < 0.05)
                        thrpt:  [+10.409% +10.458% +10.507%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
Inverse f32/PhastFT DIF/524288
                        time:   [3.4742 ms 3.4752 ms 3.4763 ms]
                        thrpt:  [150.82 Melem/s 150.86 Melem/s 150.91 Melem/s]
                 change:
                        time:   [−4.3495% −4.3123% −4.2755%] (p = 0.00 < 0.05)
                        thrpt:  [+4.4664% +4.5067% +4.5473%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe
Inverse f32/PhastFT DIT/524288
                        time:   [2.3658 ms 2.3671 ms 2.3685 ms]
                        thrpt:  [221.36 Melem/s 221.49 Melem/s 221.61 Melem/s]
                 change:
                        time:   [−7.1002% −7.0433% −6.9806%] (p = 0.00 < 0.05)
                        thrpt:  [+7.5045% +7.5770% +7.6428%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Inverse f32/PhastFT DIF/1048576
                        time:   [6.9308 ms 6.9349 ms 6.9396 ms]
                        thrpt:  [151.10 Melem/s 151.20 Melem/s 151.29 Melem/s]
                 change:
                        time:   [−1.6681% −1.6019% −1.5264%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5500% +1.6280% +1.6964%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
Inverse f32/PhastFT DIT/1048576
                        time:   [5.1240 ms 5.1276 ms 5.1316 ms]
                        thrpt:  [204.34 Melem/s 204.50 Melem/s 204.64 Melem/s]
                 change:
                        time:   [−1.0442% −0.9154% −0.8044%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8109% +0.9238% +1.0553%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
Inverse f32/PhastFT DIF/2097152
                        time:   [15.441 ms 15.475 ms 15.509 ms]
                        thrpt:  [135.22 Melem/s 135.52 Melem/s 135.82 Melem/s]
                 change:
                        time:   [+4.2875% +4.4960% +4.7488%] (p = 0.00 < 0.05)
                        thrpt:  [−4.5335% −4.3026% −4.1112%]
                        Performance has regressed.
Inverse f32/PhastFT DIT/2097152
                        time:   [10.218 ms 10.226 ms 10.235 ms]
                        thrpt:  [204.90 Melem/s 205.08 Melem/s 205.25 Melem/s]
                 change:
                        time:   [−8.1122% −7.9586% −7.8073%] (p = 0.00 < 0.05)
                        thrpt:  [+8.4685% +8.6468% +8.8284%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking Inverse f32/PhastFT DIF/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, or reduce sample count to 70.
Inverse f32/PhastFT DIF/4194304
                        time:   [41.400 ms 41.877 ms 42.359 ms]
                        thrpt:  [99.019 Melem/s 100.16 Melem/s 101.31 Melem/s]
                 change:
                        time:   [−0.6047% +1.0345% +2.6284%] (p = 0.21 > 0.05)
                        thrpt:  [−2.5611% −1.0239% +0.6084%]
                        No change in performance detected.
Benchmarking Inverse f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.0s, or reduce sample count to 90.
Inverse f32/PhastFT DIT/4194304
                        time:   [22.875 ms 23.326 ms 23.771 ms]
                        thrpt:  [176.45 Melem/s 179.82 Melem/s 183.36 Melem/s]
                 change:
                        time:   [−17.237% −15.161% −12.928%] (p = 0.00 < 0.05)
                        thrpt:  [+14.848% +17.870% +20.827%]
                        Performance has improved.
Benchmarking Inverse f32/PhastFT DIF/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.0s, or reduce sample count to 20.
Inverse f32/PhastFT DIF/8388608
                        time:   [94.415 ms 94.973 ms 95.519 ms]
                        thrpt:  [87.821 Melem/s 88.326 Melem/s 88.848 Melem/s]
                 change:
                        time:   [+2.7216% +3.5137% +4.3018%] (p = 0.00 < 0.05)
                        thrpt:  [−4.1244% −3.3944% −2.6495%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking Inverse f32/PhastFT DIT/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.8s, or reduce sample count to 40.
Inverse f32/PhastFT DIT/8388608
                        time:   [49.794 ms 50.186 ms 50.607 ms]
                        thrpt:  [165.76 Melem/s 167.15 Melem/s 168.47 Melem/s]
                 change:
                        time:   [−29.428% −28.828% −28.101%] (p = 0.00 < 0.05)
                        thrpt:  [+39.085% +40.505% +41.699%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
Benchmarking Inverse f32/PhastFT DIF/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 33.9s, or reduce sample count to 10.
Inverse f32/PhastFT DIF/16777216
                        time:   [222.34 ms 223.03 ms 223.72 ms]
                        thrpt:  [74.993 Melem/s 75.225 Melem/s 75.458 Melem/s]
                 change:
                        time:   [−39.209% −38.998% −38.767%] (p = 0.00 < 0.05)
                        thrpt:  [+63.310% +63.929% +64.497%]
                        Performance has improved.
Benchmarking Inverse f32/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 23.0s, or reduce sample count to 20.
Inverse f32/PhastFT DIT/16777216
                        time:   [112.82 ms 113.28 ms 113.81 ms]
                        thrpt:  [147.41 Melem/s 148.10 Melem/s 148.71 Melem/s]
                 change:
                        time:   [−35.556% −35.215% −34.810%] (p = 0.00 < 0.05)
                        thrpt:  [+53.399% +54.356% +55.173%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

Forward f64/PhastFT DIF/64
                        time:   [351.97 ns 352.27 ns 352.50 ns]
                        thrpt:  [181.56 Melem/s 181.68 Melem/s 181.83 Melem/s]
                 change:
                        time:   [+0.5832% +0.6850% +0.7897%] (p = 0.00 < 0.05)
                        thrpt:  [−0.7835% −0.6803% −0.5799%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Forward f64/PhastFT DIT/64
                        time:   [224.48 ns 224.60 ns 224.71 ns]
                        thrpt:  [284.81 Melem/s 284.95 Melem/s 285.10 Melem/s]
                 change:
                        time:   [+2.6148% +2.7498% +2.8920%] (p = 0.00 < 0.05)
                        thrpt:  [−2.8108% −2.6762% −2.5481%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Forward f64/PhastFT DIF/128
                        time:   [548.89 ns 549.26 ns 549.64 ns]
                        thrpt:  [232.88 Melem/s 233.04 Melem/s 233.20 Melem/s]
                 change:
                        time:   [−4.0394% −3.9427% −3.8451%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9989% +4.1045% +4.2095%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
Forward f64/PhastFT DIT/128
                        time:   [325.63 ns 325.84 ns 326.04 ns]
                        thrpt:  [392.58 Melem/s 392.83 Melem/s 393.08 Melem/s]
                 change:
                        time:   [+1.6320% +1.7780% +1.9161%] (p = 0.00 < 0.05)
                        thrpt:  [−1.8801% −1.7469% −1.6058%]
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
Forward f64/PhastFT DIF/256
                        time:   [1.0207 µs 1.0213 µs 1.0220 µs]
                        thrpt:  [250.50 Melem/s 250.65 Melem/s 250.81 Melem/s]
                 change:
                        time:   [−7.9378% −7.8305% −7.7297%] (p = 0.00 < 0.05)
                        thrpt:  [+8.3772% +8.4957% +8.6223%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
Forward f64/PhastFT DIT/256
                        time:   [683.80 ns 684.13 ns 684.51 ns]
                        thrpt:  [373.99 Melem/s 374.20 Melem/s 374.38 Melem/s]
                 change:
                        time:   [+0.0133% +0.2344% +0.4299%] (p = 0.03 < 0.05)
                        thrpt:  [−0.4280% −0.2339% −0.0133%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
Forward f64/PhastFT DIF/512
                        time:   [2.5687 µs 2.5703 µs 2.5718 µs]
                        thrpt:  [199.08 Melem/s 199.20 Melem/s 199.32 Melem/s]
                 change:
                        time:   [−3.8500% −3.7190% −3.5978%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7320% +3.8627% +4.0041%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
Forward f64/PhastFT DIT/512
                        time:   [1.9968 µs 1.9986 µs 2.0002 µs]
                        thrpt:  [255.98 Melem/s 256.18 Melem/s 256.41 Melem/s]
                 change:
                        time:   [−1.7362% −1.5760% −1.4133%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4336% +1.6012% +1.7669%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low severe
Forward f64/PhastFT DIF/1024
                        time:   [5.5071 µs 5.5105 µs 5.5140 µs]
                        thrpt:  [185.71 Melem/s 185.83 Melem/s 185.94 Melem/s]
                 change:
                        time:   [−0.5133% −0.3822% −0.2413%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2419% +0.3837% +0.5159%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
Forward f64/PhastFT DIT/1024
                        time:   [4.4760 µs 4.4802 µs 4.4840 µs]
                        thrpt:  [228.37 Melem/s 228.56 Melem/s 228.78 Melem/s]
                 change:
                        time:   [−0.8997% −0.7149% −0.5461%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5491% +0.7200% +0.9079%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
Forward f64/PhastFT DIF/2048
                        time:   [7.0814 µs 7.0842 µs 7.0874 µs]
                        thrpt:  [288.96 Melem/s 289.09 Melem/s 289.21 Melem/s]
                 change:
                        time:   [−0.8718% −0.6245% −0.3743%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3757% +0.6285% +0.8795%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
Forward f64/PhastFT DIT/2048
                        time:   [4.7889 µs 4.7925 µs 4.7960 µs]
                        thrpt:  [427.02 Melem/s 427.33 Melem/s 427.65 Melem/s]
                 change:
                        time:   [−5.2561% −4.9725% −4.7172%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9508% +5.2326% +5.5477%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
Forward f64/PhastFT DIF/4096
                        time:   [15.492 µs 15.543 µs 15.608 µs]
                        thrpt:  [262.44 Melem/s 263.53 Melem/s 264.39 Melem/s]
                 change:
                        time:   [+7.3995% +9.2304% +10.942%] (p = 0.00 < 0.05)
                        thrpt:  [−9.8630% −8.4504% −6.8897%]
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
Forward f64/PhastFT DIT/4096
                        time:   [10.814 µs 10.839 µs 10.872 µs]
                        thrpt:  [376.73 Melem/s 377.88 Melem/s 378.78 Melem/s]
                 change:
                        time:   [+2.6930% +3.8904% +5.2058%] (p = 0.00 < 0.05)
                        thrpt:  [−4.9482% −3.7447% −2.6224%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
Forward f64/PhastFT DIF/8192
                        time:   [38.983 µs 39.000 µs 39.021 µs]
                        thrpt:  [209.94 Melem/s 210.05 Melem/s 210.14 Melem/s]
                 change:
                        time:   [−1.2276% −1.0557% −0.8839%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8918% +1.0669% +1.2429%]
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Forward f64/PhastFT DIT/8192
                        time:   [30.198 µs 30.217 µs 30.241 µs]
                        thrpt:  [270.89 Melem/s 271.10 Melem/s 271.28 Melem/s]
                 change:
                        time:   [−3.0249% −2.6665% −2.2623%] (p = 0.00 < 0.05)
                        thrpt:  [+2.3146% +2.7395% +3.1192%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
Forward f64/PhastFT DIF/16384
                        time:   [99.881 µs 99.917 µs 99.949 µs]
                        thrpt:  [163.92 Melem/s 163.98 Melem/s 164.04 Melem/s]
                 change:
                        time:   [−0.8065% −0.5979% −0.3551%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3563% +0.6015% +0.8131%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  11 (11.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
Forward f64/PhastFT DIT/16384
                        time:   [83.966 µs 84.076 µs 84.169 µs]
                        thrpt:  [194.66 Melem/s 194.87 Melem/s 195.13 Melem/s]
                 change:
                        time:   [−2.3473% −2.0532% −1.7496%] (p = 0.00 < 0.05)
                        thrpt:  [+1.7808% +2.0963% +2.4037%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
Forward f64/PhastFT DIF/32768
                        time:   [228.24 µs 228.36 µs 228.49 µs]
                        thrpt:  [143.41 Melem/s 143.50 Melem/s 143.57 Melem/s]
                 change:
                        time:   [−2.2832% −2.1319% −1.9914%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0318% +2.1783% +2.3365%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
Forward f64/PhastFT DIT/32768
                        time:   [197.87 µs 197.95 µs 198.03 µs]
                        thrpt:  [165.47 Melem/s 165.54 Melem/s 165.60 Melem/s]
                 change:
                        time:   [−3.1444% −3.0174% −2.8962%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9826% +3.1113% +3.2465%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
Forward f64/PhastFT DIF/65536
                        time:   [488.46 µs 488.64 µs 488.84 µs]
                        thrpt:  [134.07 Melem/s 134.12 Melem/s 134.17 Melem/s]
                 change:
                        time:   [−1.4689% −1.3202% −1.1791%] (p = 0.00 < 0.05)
                        thrpt:  [+1.1932% +1.3379% +1.4908%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  8 (8.00%) low mild
  1 (1.00%) high mild
Forward f64/PhastFT DIT/65536
                        time:   [416.95 µs 417.14 µs 417.34 µs]
                        thrpt:  [157.03 Melem/s 157.11 Melem/s 157.18 Melem/s]
                 change:
                        time:   [−5.5571% −5.3823% −5.2227%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5105% +5.6885% +5.8840%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
Benchmarking Forward f64/PhastFT DIF/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, enable flat sampling, or reduce sample count to 50.
Forward f64/PhastFT DIF/131072
                        time:   [1.0553 ms 1.0567 ms 1.0581 ms]
                        thrpt:  [123.88 Melem/s 124.03 Melem/s 124.20 Melem/s]
                 change:
                        time:   [−8.7043% −8.5273% −8.3838%] (p = 0.00 < 0.05)
                        thrpt:  [+9.1510% +9.3222% +9.5342%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  8 (8.00%) low mild
Benchmarking Forward f64/PhastFT DIT/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
Forward f64/PhastFT DIT/131072
                        time:   [919.41 µs 920.50 µs 921.54 µs]
                        thrpt:  [142.23 Melem/s 142.39 Melem/s 142.56 Melem/s]
                 change:
                        time:   [−16.421% −16.322% −16.230%] (p = 0.00 < 0.05)
                        thrpt:  [+19.375% +19.506% +19.647%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Forward f64/PhastFT DIF/262144
                        time:   [2.2023 ms 2.2031 ms 2.2041 ms]
                        thrpt:  [118.94 Melem/s 118.99 Melem/s 119.03 Melem/s]
                 change:
                        time:   [−8.9696% −8.9125% −8.8566%] (p = 0.00 < 0.05)
                        thrpt:  [+9.7172% +9.7846% +9.8534%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Forward f64/PhastFT DIT/262144
                        time:   [1.8719 ms 1.8724 ms 1.8729 ms]
                        thrpt:  [139.97 Melem/s 140.01 Melem/s 140.05 Melem/s]
                 change:
                        time:   [−12.265% −12.225% −12.188%] (p = 0.00 < 0.05)
                        thrpt:  [+13.879% +13.928% +13.979%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
Forward f64/PhastFT DIF/524288
                        time:   [4.3439 ms 4.3458 ms 4.3478 ms]
                        thrpt:  [120.59 Melem/s 120.64 Melem/s 120.70 Melem/s]
                 change:
                        time:   [−4.6281% −4.5485% −4.4693%] (p = 0.00 < 0.05)
                        thrpt:  [+4.6784% +4.7653% +4.8527%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
Forward f64/PhastFT DIT/524288
                        time:   [4.0808 ms 4.0826 ms 4.0848 ms]
                        thrpt:  [128.35 Melem/s 128.42 Melem/s 128.48 Melem/s]
                 change:
                        time:   [−5.2492% −5.1707% −5.0902%] (p = 0.00 < 0.05)
                        thrpt:  [+5.3632% +5.4526% +5.5400%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
Forward f64/PhastFT DIF/1048576
                        time:   [10.057 ms 10.067 ms 10.079 ms]
                        thrpt:  [104.04 Melem/s 104.16 Melem/s 104.27 Melem/s]
                 change:
                        time:   [−2.9829% −2.5768% −2.1650%] (p = 0.00 < 0.05)
                        thrpt:  [+2.2129% +2.6450% +3.0746%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Forward f64/PhastFT DIT/1048576
                        time:   [8.4189 ms 8.4252 ms 8.4324 ms]
                        thrpt:  [124.35 Melem/s 124.46 Melem/s 124.55 Melem/s]
                 change:
                        time:   [−9.3968% −9.2682% −9.1499%] (p = 0.00 < 0.05)
                        thrpt:  [+10.071% +10.215% +10.371%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Benchmarking Forward f64/PhastFT DIF/2097152: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, or reduce sample count to 90.
Forward f64/PhastFT DIF/2097152
                        time:   [32.915 ms 33.057 ms 33.202 ms]
                        thrpt:  [63.164 Melem/s 63.441 Melem/s 63.714 Melem/s]
                 change:
                        time:   [+0.0133% +0.6835% +1.3079%] (p = 0.04 < 0.05)
                        thrpt:  [−1.2910% −0.6789% −0.0133%]
                        Change within noise threshold.
Forward f64/PhastFT DIT/2097152
                        time:   [22.255 ms 22.275 ms 22.296 ms]
                        thrpt:  [94.062 Melem/s 94.147 Melem/s 94.231 Melem/s]
                 change:
                        time:   [−20.516% −19.579% −18.611%] (p = 0.00 < 0.05)
                        thrpt:  [+22.867% +24.345% +25.811%]
                        Performance has improved.
Benchmarking Forward f64/PhastFT DIF/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.5s, or reduce sample count to 30.
Forward f64/PhastFT DIF/4194304
                        time:   [69.802 ms 70.123 ms 70.471 ms]
                        thrpt:  [59.518 Melem/s 59.813 Melem/s 60.089 Melem/s]
                 change:
                        time:   [+1.1439% +1.8593% +2.5155%] (p = 0.00 < 0.05)
                        thrpt:  [−2.4538% −1.8254% −1.1310%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe
Benchmarking Forward f64/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.0s, or reduce sample count to 50.
Forward f64/PhastFT DIT/4194304
                        time:   [41.892 ms 42.238 ms 42.610 ms]
                        thrpt:  [98.434 Melem/s 99.301 Melem/s 100.12 Melem/s]
                 change:
                        time:   [−33.451% −32.821% −32.181%] (p = 0.00 < 0.05)
                        thrpt:  [+47.450% +48.857% +50.266%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) high mild
  9 (9.00%) high severe
Benchmarking Forward f64/PhastFT DIF/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 27.6s, or reduce sample count to 10.
Forward f64/PhastFT DIF/8388608
                        time:   [174.53 ms 175.04 ms 175.58 ms]
                        thrpt:  [47.777 Melem/s 47.924 Melem/s 48.063 Melem/s]
                 change:
                        time:   [−33.523% −33.290% −33.063%] (p = 0.00 < 0.05)
                        thrpt:  [+49.394% +49.903% +50.428%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking Forward f64/PhastFT DIT/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.5s, or reduce sample count to 20.
Forward f64/PhastFT DIT/8388608
                        time:   [96.441 ms 97.100 ms 97.765 ms]
                        thrpt:  [85.804 Melem/s 86.392 Melem/s 86.982 Melem/s]
                 change:
                        time:   [−37.517% −37.078% −36.584%] (p = 0.00 < 0.05)
                        thrpt:  [+57.690% +58.926% +60.045%]
                        Performance has improved.
Benchmarking Forward f64/PhastFT DIF/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 100.9s, or reduce sample count to 10.
Forward f64/PhastFT DIF/16777216
                        time:   [809.53 ms 810.21 ms 811.03 ms]
                        thrpt:  [20.686 Melem/s 20.707 Melem/s 20.725 Melem/s]
                 change:
                        time:   [+1.5983% +1.8346% +2.0706%] (p = 0.00 < 0.05)
                        thrpt:  [−2.0286% −1.8015% −1.5731%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
Benchmarking Forward f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 69.2s, or reduce sample count to 10.
Forward f64/PhastFT DIT/16777216
                        time:   [491.39 ms 492.08 ms 492.81 ms]
                        thrpt:  [34.044 Melem/s 34.095 Melem/s 34.142 Melem/s]
                 change:
                        time:   [−11.989% −11.825% −11.657%] (p = 0.00 < 0.05)
                        thrpt:  [+13.195% +13.411% +13.623%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Inverse f64/PhastFT DIF/64
                        time:   [369.68 ns 371.00 ns 372.32 ns]
                        thrpt:  [171.90 Melem/s 172.51 Melem/s 173.12 Melem/s]
                 change:
                        time:   [+1.1146% +1.4734% +1.8685%] (p = 0.00 < 0.05)
                        thrpt:  [−1.8342% −1.4521% −1.1023%]
                        Performance has regressed.
Inverse f64/PhastFT DIT/64
                        time:   [244.79 ns 245.06 ns 245.33 ns]
                        thrpt:  [260.87 Melem/s 261.16 Melem/s 261.45 Melem/s]
                 change:
                        time:   [+3.3424% +3.5165% +3.6936%] (p = 0.00 < 0.05)
                        thrpt:  [−3.5620% −3.3971% −3.2343%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Inverse f64/PhastFT DIF/128
                        time:   [545.69 ns 547.04 ns 548.36 ns]
                        thrpt:  [233.43 Melem/s 233.99 Melem/s 234.57 Melem/s]
                 change:
                        time:   [+0.3229% +0.5678% +0.8242%] (p = 0.00 < 0.05)
                        thrpt:  [−0.8175% −0.5646% −0.3219%]
                        Change within noise threshold.
Inverse f64/PhastFT DIT/128
                        time:   [347.74 ns 348.03 ns 348.33 ns]
                        thrpt:  [367.46 Melem/s 367.78 Melem/s 368.09 Melem/s]
                 change:
                        time:   [+0.9644% +1.1422% +1.3237%] (p = 0.00 < 0.05)
                        thrpt:  [−1.3064% −1.1293% −0.9551%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
Inverse f64/PhastFT DIF/256
                        time:   [1.0840 µs 1.0851 µs 1.0862 µs]
                        thrpt:  [235.69 Melem/s 235.93 Melem/s 236.17 Melem/s]
                 change:
                        time:   [−3.5022% −3.3757% −3.2584%] (p = 0.00 < 0.05)
                        thrpt:  [+3.3682% +3.4936% +3.6293%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
Inverse f64/PhastFT DIT/256
                        time:   [723.19 ns 724.10 ns 724.97 ns]
                        thrpt:  [353.12 Melem/s 353.54 Melem/s 353.99 Melem/s]
                 change:
                        time:   [−0.0431% +0.1648% +0.3654%] (p = 0.11 > 0.05)
                        thrpt:  [−0.3640% −0.1645% +0.0431%]
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
Inverse f64/PhastFT DIF/512
                        time:   [2.6903 µs 2.6922 µs 2.6941 µs]
                        thrpt:  [190.04 Melem/s 190.18 Melem/s 190.31 Melem/s]
                 change:
                        time:   [−0.9903% −0.8439% −0.7073%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7123% +0.8511% +1.0002%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high severe
Inverse f64/PhastFT DIT/512
                        time:   [2.0929 µs 2.0953 µs 2.0975 µs]
                        thrpt:  [244.10 Melem/s 244.36 Melem/s 244.63 Melem/s]
                 change:
                        time:   [−1.0498% −0.9010% −0.7591%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7649% +0.9092% +1.0609%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
Inverse f64/PhastFT DIF/1024
                        time:   [5.6957 µs 5.6995 µs 5.7036 µs]
                        thrpt:  [179.54 Melem/s 179.66 Melem/s 179.79 Melem/s]
                 change:
                        time:   [+0.8581% +1.0069% +1.1600%] (p = 0.00 < 0.05)
                        thrpt:  [−1.1467% −0.9969% −0.8508%]
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low severe
  5 (5.00%) high mild
  1 (1.00%) high severe
Inverse f64/PhastFT DIT/1024
                        time:   [4.6838 µs 4.6875 µs 4.6911 µs]
                        thrpt:  [218.29 Melem/s 218.45 Melem/s 218.63 Melem/s]
                 change:
                        time:   [−0.3025% −0.1201% +0.0610%] (p = 0.20 > 0.05)
                        thrpt:  [−0.0610% +0.1202% +0.3035%]
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
Inverse f64/PhastFT DIF/2048
                        time:   [7.6107 µs 7.6184 µs 7.6248 µs]
                        thrpt:  [268.60 Melem/s 268.82 Melem/s 269.10 Melem/s]
                 change:
                        time:   [+0.9046% +1.1451% +1.4078%] (p = 0.00 < 0.05)
                        thrpt:  [−1.3883% −1.1322% −0.8965%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low severe
Inverse f64/PhastFT DIT/2048
                        time:   [5.2829 µs 5.2934 µs 5.3043 µs]
                        thrpt:  [386.10 Melem/s 386.89 Melem/s 387.66 Melem/s]
                 change:
                        time:   [−4.2373% −3.9450% −3.6415%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7792% +4.1070% +4.4248%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low severe
Inverse f64/PhastFT DIF/4096
                        time:   [15.946 µs 15.960 µs 15.979 µs]
                        thrpt:  [256.34 Melem/s 256.64 Melem/s 256.87 Melem/s]
                 change:
                        time:   [+3.4076% +4.0030% +4.6164%] (p = 0.00 < 0.05)
                        thrpt:  [−4.4127% −3.8490% −3.2953%]
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe
Inverse f64/PhastFT DIT/4096
                        time:   [11.744 µs 11.769 µs 11.801 µs]
                        thrpt:  [347.10 Melem/s 348.04 Melem/s 348.77 Melem/s]
                 change:
                        time:   [+1.2058% +2.3495% +3.6112%] (p = 0.00 < 0.05)
                        thrpt:  [−3.4853% −2.2955% −1.1914%]
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe
Inverse f64/PhastFT DIF/8192
                        time:   [40.552 µs 40.574 µs 40.603 µs]
                        thrpt:  [201.76 Melem/s 201.90 Melem/s 202.01 Melem/s]
                 change:
                        time:   [+0.8990% +1.0963% +1.2834%] (p = 0.00 < 0.05)
                        thrpt:  [−1.2671% −1.0844% −0.8910%]
                        Change within noise threshold.
Inverse f64/PhastFT DIT/8192
                        time:   [32.366 µs 32.384 µs 32.406 µs]
                        thrpt:  [252.79 Melem/s 252.96 Melem/s 253.11 Melem/s]
                 change:
                        time:   [+0.4750% +0.7616% +1.0617%] (p = 0.00 < 0.05)
                        thrpt:  [−1.0505% −0.7558% −0.4727%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Inverse f64/PhastFT DIF/16384
                        time:   [104.41 µs 104.45 µs 104.49 µs]
                        thrpt:  [156.80 Melem/s 156.86 Melem/s 156.92 Melem/s]
                 change:
                        time:   [+0.7729% +0.9865% +1.1997%] (p = 0.00 < 0.05)
                        thrpt:  [−1.1854% −0.9768% −0.7670%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe
Inverse f64/PhastFT DIT/16384
                        time:   [87.410 µs 87.462 µs 87.506 µs]
                        thrpt:  [187.23 Melem/s 187.33 Melem/s 187.44 Melem/s]
                 change:
                        time:   [−2.4083% −2.1715% −1.9524%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9913% +2.2197% +2.4677%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  7 (7.00%) low mild
Inverse f64/PhastFT DIF/32768
                        time:   [237.35 µs 237.42 µs 237.50 µs]
                        thrpt:  [137.97 Melem/s 138.02 Melem/s 138.06 Melem/s]
                 change:
                        time:   [−1.1354% −1.0107% −0.8873%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8952% +1.0210% +1.1485%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
Inverse f64/PhastFT DIT/32768
                        time:   [206.90 µs 206.98 µs 207.08 µs]
                        thrpt:  [158.24 Melem/s 158.31 Melem/s 158.37 Melem/s]
                 change:
                        time:   [−3.5341% −3.4080% −3.2904%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4024% +3.5282% +3.6636%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
Inverse f64/PhastFT DIF/65536
                        time:   [505.44 µs 505.57 µs 505.69 µs]
                        thrpt:  [129.60 Melem/s 129.63 Melem/s 129.66 Melem/s]
                 change:
                        time:   [−1.5079% −1.3294% −1.1519%] (p = 0.00 < 0.05)
                        thrpt:  [+1.1653% +1.3474% +1.5310%]
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  15 (15.00%) low severe
  4 (4.00%) low mild
Inverse f64/PhastFT DIT/65536
                        time:   [432.76 µs 432.96 µs 433.20 µs]
                        thrpt:  [151.28 Melem/s 151.37 Melem/s 151.44 Melem/s]
                 change:
                        time:   [−4.9837% −4.7949% −4.6279%] (p = 0.00 < 0.05)
                        thrpt:  [+4.8525% +5.0364% +5.2451%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  7 (7.00%) low mild
Benchmarking Inverse f64/PhastFT DIF/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.7s, enable flat sampling, or reduce sample count to 50.
Inverse f64/PhastFT DIF/131072
                        time:   [1.0865 ms 1.0877 ms 1.0888 ms]
                        thrpt:  [120.38 Melem/s 120.51 Melem/s 120.64 Melem/s]
                 change:
                        time:   [−7.6894% −7.5610% −7.4559%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0566% +8.1795% +8.3299%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  6 (6.00%) low mild
Benchmarking Inverse f64/PhastFT DIT/131072: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
Inverse f64/PhastFT DIT/131072
                        time:   [947.09 µs 948.15 µs 949.09 µs]
                        thrpt:  [138.10 Melem/s 138.24 Melem/s 138.39 Melem/s]
                 change:
                        time:   [−17.242% −17.158% −17.075%] (p = 0.00 < 0.05)
                        thrpt:  [+20.592% +20.712% +20.835%]
                        Performance has improved.
Inverse f64/PhastFT DIF/262144
                        time:   [2.2581 ms 2.2587 ms 2.2593 ms]
                        thrpt:  [116.03 Melem/s 116.06 Melem/s 116.09 Melem/s]
                 change:
                        time:   [−8.9809% −8.9451% −8.9093%] (p = 0.00 < 0.05)
                        thrpt:  [+9.7806% +9.8239% +9.8671%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Inverse f64/PhastFT DIT/262144
                        time:   [1.9313 ms 1.9321 ms 1.9329 ms]
                        thrpt:  [135.62 Melem/s 135.68 Melem/s 135.74 Melem/s]
                 change:
                        time:   [−12.068% −12.025% −11.979%] (p = 0.00 < 0.05)
                        thrpt:  [+13.609% +13.669% +13.724%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Inverse f64/PhastFT DIF/524288
                        time:   [4.4505 ms 4.4519 ms 4.4534 ms]
                        thrpt:  [117.73 Melem/s 117.77 Melem/s 117.80 Melem/s]
                 change:
                        time:   [−3.8108% −3.7590% −3.7097%] (p = 0.00 < 0.05)
                        thrpt:  [+3.8526% +3.9058% +3.9618%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Inverse f64/PhastFT DIT/524288
                        time:   [4.2177 ms 4.2190 ms 4.2205 ms]
                        thrpt:  [124.22 Melem/s 124.27 Melem/s 124.31 Melem/s]
                 change:
                        time:   [−5.5378% −5.4188% −5.3228%] (p = 0.00 < 0.05)
                        thrpt:  [+5.6221% +5.7292% +5.8624%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Inverse f64/PhastFT DIF/1048576
                        time:   [10.341 ms 10.369 ms 10.399 ms]
                        thrpt:  [100.83 Melem/s 101.12 Melem/s 101.40 Melem/s]
                 change:
                        time:   [+1.5980% +1.9201% +2.2118%] (p = 0.00 < 0.05)
                        thrpt:  [−2.1639% −1.8839% −1.5729%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Inverse f64/PhastFT DIT/1048576
                        time:   [8.6373 ms 8.6423 ms 8.6478 ms]
                        thrpt:  [121.25 Melem/s 121.33 Melem/s 121.40 Melem/s]
                 change:
                        time:   [−9.4505% −9.3152% −9.1859%] (p = 0.00 < 0.05)
                        thrpt:  [+10.115% +10.272% +10.437%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking Inverse f64/PhastFT DIF/2097152: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, or reduce sample count to 90.
Inverse f64/PhastFT DIF/2097152
                        time:   [33.181 ms 33.256 ms 33.332 ms]
                        thrpt:  [62.917 Melem/s 63.061 Melem/s 63.203 Melem/s]
                 change:
                        time:   [−1.6936% −1.2008% −0.6959%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7008% +1.2154% +1.7228%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Inverse f64/PhastFT DIT/2097152
                        time:   [22.866 ms 22.883 ms 22.900 ms]
                        thrpt:  [91.577 Melem/s 91.648 Melem/s 91.716 Melem/s]
                 change:
                        time:   [−20.919% −19.944% −18.939%] (p = 0.00 < 0.05)
                        thrpt:  [+23.364% +24.912% +26.453%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
Benchmarking Inverse f64/PhastFT DIF/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.7s, or reduce sample count to 30.
Inverse f64/PhastFT DIF/4194304
                        time:   [74.026 ms 74.580 ms 75.140 ms]
                        thrpt:  [55.820 Melem/s 56.239 Melem/s 56.660 Melem/s]
                 change:
                        time:   [+1.6842% +2.5966% +3.5024%] (p = 0.00 < 0.05)
                        thrpt:  [−3.3839% −2.5309% −1.6563%]
                        Performance has regressed.
Benchmarking Inverse f64/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.1s, or reduce sample count to 40.
Inverse f64/PhastFT DIT/4194304
                        time:   [44.868 ms 45.215 ms 45.563 ms]
                        thrpt:  [92.055 Melem/s 92.764 Melem/s 93.482 Melem/s]
                 change:
                        time:   [−31.134% −30.463% −29.877%] (p = 0.00 < 0.05)
                        thrpt:  [+42.606% +43.809% +45.209%]
                        Performance has improved.
Benchmarking Inverse f64/PhastFT DIF/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 26.4s, or reduce sample count to 10.
Inverse f64/PhastFT DIF/8388608
                        time:   [167.05 ms 167.58 ms 168.10 ms]
                        thrpt:  [49.903 Melem/s 50.059 Melem/s 50.215 Melem/s]
                 change:
                        time:   [−39.284% −39.043% −38.790%] (p = 0.00 < 0.05)
                        thrpt:  [+63.373% +64.049% +64.702%]
                        Performance has improved.
Benchmarking Inverse f64/PhastFT DIT/8388608: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.0s, or reduce sample count to 20.
Inverse f64/PhastFT DIT/8388608
                        time:   [102.34 ms 102.96 ms 103.60 ms]
                        thrpt:  [80.969 Melem/s 81.476 Melem/s 81.970 Melem/s]
                 change:
                        time:   [−37.137% −36.699% −36.254%] (p = 0.00 < 0.05)
                        thrpt:  [+56.872% +57.974% +59.076%]
                        Performance has improved.
Benchmarking Inverse f64/PhastFT DIF/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 101.7s, or reduce sample count to 10.
Inverse f64/PhastFT DIF/16777216
                        time:   [824.18 ms 825.89 ms 827.64 ms]
                        thrpt:  [20.271 Melem/s 20.314 Melem/s 20.356 Melem/s]
                 change:
                        time:   [−0.1820% +0.1634% +0.4899%] (p = 0.34 > 0.05)
                        thrpt:  [−0.4875% −0.1632% +0.1823%]
                        No change in performance detected.
Benchmarking Inverse f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 71.7s, or reduce sample count to 10.
Inverse f64/PhastFT DIT/16777216
                        time:   [517.82 ms 519.23 ms 520.68 ms]
                        thrpt:  [32.222 Melem/s 32.312 Melem/s 32.400 Melem/s]
                 change:
                        time:   [−10.951% −10.669% −10.386%] (p = 0.00 < 0.05)
                        thrpt:  [+11.590% +11.943% +12.298%]
                        Performance has improved.

@smu160
Member Author

smu160 commented Nov 18, 2025

I saw it get up to ~19% better on an M2 MacBook Air. Those are great results on Zen 4!

@Shnatsel
Collaborator

Shnatsel commented Nov 18, 2025

An odd artifact of this is that DIF performance has also improved substantially for some sizes. For example, Inverse f64/PhastFT DIF/8388608 shows a roughly 39% reduction in time (about +64% throughput).

But this PR doesn't seem to have touched DIF at all. Is there some shared code between the two? If not, that result is quite confusing.
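
For readers comparing the `time` and `thrpt` change rows in the log above: the two are related by a reciprocal, which is why a roughly 39% drop in time appears as a roughly 64% gain in throughput. A minimal sketch of that arithmetic, assuming throughput is simply elements divided by time (the helper name `throughput_change` is hypothetical and not part of PhastFT or Criterion):

```rust
/// Convert a relative change in time (e.g. -0.39043 for -39.043%)
/// into the corresponding relative change in throughput,
/// assuming throughput = elements / time for a fixed element count.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Point estimates copied from the benchmark log above.
    let dif_8388608 = throughput_change(-0.39043);
    let dit_4194304 = throughput_change(-0.30463);

    println!("DIF/8388608 throughput change: {:+.2}%", dif_8388608 * 100.0);
    println!("DIT/4194304 throughput change: {:+.2}%", dit_4194304 * 100.0);
}
```

The results should land close to the `thrpt` point estimates reported in the log for those two sizes; small differences are expected because Criterion computes its estimates from the underlying samples rather than from the rounded percentages.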

@smu160
Member Author

smu160 commented Nov 19, 2025

That is odd. I do see some variation in DIF FFT, and even RustFFT, times and throughput when running Criterion benchmarks, and I have always attributed that to noise. I'm also on Apple silicon, so I'm not sure what to make of it.

@Shnatsel Shnatsel merged commit 7dc39b1 into main Nov 23, 2025
8 checks passed
@Shnatsel Shnatsel deleted the feature/cache-blocked-dit-fft branch November 23, 2025 11:55