perf: apply multiple-accumulator SIMD optimization to fmean.c (#824) #829

Closed
SebKrantz wants to merge 1 commit into master from claude/issue-824-20260513-0346

Conversation

@SebKrantz
Member

Extends the loop unrolling optimization from PR #828 (fsum.c) to all ungrouped scalar paths in fmean.c, using FMEAN_N_ACC = 4 independent accumulators to break serial dependency chains and enable SIMD auto-vectorization.
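As an illustration of the idea described above, here is a minimal sketch of a four-accumulator mean for the na.rm=FALSE case. It is not the code from this PR: the function name `mean_multi_acc` and the macro `N_ACC` are hypothetical stand-ins (the PR names its constant FMEAN_N_ACC), and the tail handling is one of several reasonable choices.

```c
#include <assert.h>
#include <stddef.h>

#define N_ACC 4 /* illustrative; plays the role of FMEAN_N_ACC = 4 */

/* Hypothetical helper, not the actual fmean_double_impl.
 * Computes the mean of x[0..n-1] without any NA handling. */
double mean_multi_acc(const double *x, size_t n)
{
    assert(x != NULL && n > 0);

    /* Each accumulator carries its own dependency chain, so the additions
     * within one iteration do not have to wait on each other. */
    double acc[N_ACC] = {0.0};
    size_t n_main = n - n % N_ACC; /* largest multiple of N_ACC <= n */
    size_t i = 0;

    for (; i < n_main; i += N_ACC) {
        for (int k = 0; k < N_ACC; ++k)
            acc[k] += x[i + k];
    }

    /* Combine the partial sums, then add the leftover elements. */
    double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i)
        sum += x[i];

    return sum / (double)n;
}
```

Because floating-point addition is not associative, a sum split across several accumulators can differ in the last bits from a strictly sequential sum over the same data.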

Functions updated:

  • fmean_double_impl: na.rm=TRUE (dual acc/nacc arrays), na.rm=FALSE (acc array)
  • fmean_double_omp_impl: both paths with reduction(+:acc[:N],nacc[:N])
  • fmean_weights_impl: both paths with dual macc/wacc arrays
  • fmean_weights_omp_impl: both paths with array reduction
  • fmean_int_omp_impl: both paths with long long acc arrays
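The dual acc/nacc arrays and the OpenMP array reduction listed above might look roughly like the following sketch for the na.rm=TRUE case. Everything here is an assumption made for illustration: the name `mean_skip_nan`, the use of `isnan` as the missing-value test, and returning NaN when no valid values remain are not taken from the PR. Array-section reductions such as `acc[:N_ACC]` need OpenMP 4.5 or later; when compiled without OpenMP the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define N_ACC 4 /* illustrative; plays the role of FMEAN_N_ACC = 4 */

/* Hypothetical helper, not the actual fmean_double_omp_impl.
 * Mean of the non-NaN values in x[0..n-1]. */
double mean_skip_nan(const double *x, long n)
{
    assert(x != NULL && n >= 0);

    double acc[N_ACC]  = {0.0}; /* partial sums of valid values */
    long   nacc[N_ACC] = {0};   /* partial counts of valid values */
    long n_main = n - n % N_ACC;

    /* Each thread gets private copies of both arrays, which are summed
     * element-wise when the loop ends. */
    #pragma omp parallel for reduction(+:acc[:N_ACC], nacc[:N_ACC])
    for (long i = 0; i < n_main; i += N_ACC) {
        for (int k = 0; k < N_ACC; ++k) {
            double v = x[i + k];
            int ok = !isnan(v);
            acc[k]  += ok ? v : 0.0; /* select instead of branching */
            nacc[k] += ok;
        }
    }

    double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    long   cnt = (nacc[0] + nacc[1]) + (nacc[2] + nacc[3]);

    /* Leftover elements are handled serially. */
    for (long i = n_main; i < n; ++i) {
        if (!isnan(x[i])) {
            sum += x[i];
            ++cnt;
        }
    }

    /* Assumption: return NaN when every value is missing. */
    return cnt > 0 ? sum / (double)cnt : NAN;
}
```

Keeping a count per accumulator means the sum and the count update in the same lane, so the inner loop stays a fixed-width block with no shared counter.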

Grouped functions are unchanged — scatter pattern prevents vectorization.
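To show what the scatter pattern refers to, here is a hypothetical grouped mean; the function name `grouped_mean`, its signature, and the zero-based group ids are assumptions for illustration, not the grouped code in fmean.c. The write target `sum[g[i]]` depends on the data, so neighbouring iterations may update the same slot and cannot simply be packed into one SIMD lane group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grouped mean, not the actual fmean.c grouped path.
 * g[i] in [0, ng) is the group of x[i]; out[j] receives the mean of
 * group j, and cnt is caller-provided scratch space of length ng. */
void grouped_mean(const double *x, const int *g, size_t n,
                  double *out, size_t *cnt, size_t ng)
{
    assert(x != NULL && g != NULL && out != NULL && cnt != NULL);

    for (size_t j = 0; j < ng; ++j) {
        out[j] = 0.0;
        cnt[j] = 0;
    }

    /* Scatter: the destination index is read from g, so iterations i and
     * i + 1 may hit the same element of out. */
    for (size_t i = 0; i < n; ++i) {
        out[g[i]] += x[i];
        cnt[g[i]] += 1;
    }

    /* Groups with no members keep the value 0.0 in this sketch. */
    for (size_t j = 0; j < ng; ++j) {
        if (cnt[j] > 0)
            out[j] /= (double)cnt[j];
    }
}
```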

Description

Main Changes

Checklist

  • I have performed a self-review of my code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have updated the documentation where applicable.

Additional Context

Co-authored-by: Sebastian Krantz <SebKrantz@users.noreply.github.com>
@SebKrantz SebKrantz closed this May 13, 2026