Math: FFT: Optimize fft_execute_32() HiFi code version
This patch optimizes the cycle count of the radix-2 Cooley-Tukey
implementation with three changes:
- Dedicated depth-1 stage: all N/2 butterflies use a real twiddle
factor W^0 = 1+0j, so the complex multiply is replaced by plain
add or subtract.
- Skip multiply for j=0 in stages >= 2: The first butterfly in every
group also uses W^0, saving an additional ~N/2 complex multiplications
across all remaining stages.
- Pointer arithmetic: replace per-butterfly index arithmetic
(outx[k+j], outx[k+j+n], twiddle[i*j]) with auto-incrementing
pointers and strided twiddle access (tw_r += stride), eliminating
integer multiplies for address computation.
This change saves 11 MCPS (from 74 MCPS to 63 MCPS) in the STFT Process
module on the MTL platform with 1024/256 size/hop FFT processing. It
was tested with these scripts:
scripts/rebuild-testbench.sh -p mtl
scripts/sof-testbench-helper.sh -x -m stft_process_1024_256_ \
-p profile-stft_process.txt
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>