perf: add initial cache-blocked dit fft impl #48
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   99.16%   99.29%   +0.12%
==========================================
  Files          12       12
  Lines        2167     2257      +90
==========================================
+ Hits         2149     2241      +92
+ Misses         18       16       -2
|
|
Nice, I'm seeing up to 50%-ish improvements for larger sizes! Timings on my Zen 4 CPU compared to main
|
I saw it get up to ~19% better on an M2 MacBook Air. Those are great results on Zen 4!
|
An odd artifact of this is that DiF performance has improved by a lot for some sizes too. For example, But this PR doesn't seem to have touched DiF at all. Is there some shared code between the two? If not, that result is quite confusing.
|
That is odd. I do see some variation in DIF FFT, and even RustFFT, times/throughput when running criterion benchmarks. I always attributed that to noise. I'm also using Apple silicon, so I'm not sure what to make of it.
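One rough way to tell whether the DIF differences are real or just noise is to compare the change in median time against the run-to-run spread on each branch. The sketch below uses only the standard library and is not how criterion computes its statistics. The names `median`, `relative_spread`, `exceeds_noise` and `time_runs`, the threshold factor `k`, and the stand-in workload are all hypothetical and would need to be replaced with the actual DIF FFT call built from `main` and from this branch.

```rust
use std::time::Instant;

/// Median of a non-empty sample set (sorts a copy, so the input is untouched).
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Median absolute deviation divided by the median:
/// a robust, unitless estimate of run-to-run spread.
fn relative_spread(samples: &[f64]) -> f64 {
    let med = median(samples);
    let deviations: Vec<f64> = samples.iter().map(|s| (s - med).abs()).collect();
    median(&deviations) / med
}

/// Hypothetical decision rule (an assumption, not a standard test):
/// treat a change as real only if the medians differ by more than
/// `k` times the larger of the two relative spreads.
fn exceeds_noise(baseline: &[f64], candidate: &[f64], k: f64) -> bool {
    let (mb, mc) = (median(baseline), median(candidate));
    let rel_change = (mc - mb).abs() / mb;
    let noise = relative_spread(baseline).max(relative_spread(candidate));
    rel_change > k * noise
}

/// Run `f` `runs` times and return the wall-clock seconds of each run.
fn time_runs<F: FnMut()>(mut f: F, runs: usize) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

fn main() {
    // Stand-in workload; swap in the DIF FFT under test.
    let mut data = vec![1.0f64; 1 << 16];
    let samples = time_runs(
        || {
            for x in data.iter_mut() {
                *x = (*x * 1.000001).sin();
            }
            // Keep the optimizer from discarding the work.
            std::hint::black_box(&data);
        },
        30,
    );
    println!(
        "median = {:.3e} s, relative spread = {:.2}%",
        median(&samples),
        100.0 * relative_spread(&samples)
    );
}
```

If the relative spread for a given size is already several percent on an unchanged branch, a DIF "improvement" of similar magnitude is probably noise. A difference that stays well above the spread across repeated runs would point to something else, such as code layout or shared helpers.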
No description provided.