@GrigoryEvko GrigoryEvko commented Nov 16, 2025

This PR adds a mul_sub() method to the SimdFloat trait. The operation computes (self * a) - b with a single rounding, using a single fused multiply-add instruction where available.

GrigoryEvko and others added 2 commits November 16, 2025 23:49
Implements fused multiply-add (FMA) operations for SimdFloat, the #1 most
critical missing feature in portable-simd based on analysis of PyTorch's
SIMD implementation.

Methods:
- mul_add(a, b) - computes (self * a) + b with single rounding
- mul_sub(a, b) - computes (self * a) - b with single rounding

Benefits:
- Improved accuracy: single rounding error vs two separate roundings
- Better performance: 2 operations in 1 instruction on modern CPUs
- Broad hardware support: FMA3 (x86), NEON vfma (ARM), RISC-V F extension

Implementation:
- Delegates to core::intrinsics::simd::simd_fma LLVM intrinsic
- Zero-cost abstraction with #[inline]
- mul_sub implemented as mul_add(a, -b)
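The `mul_add(a, -b)` delegation can be sketched in scalar form. This is only an illustration, not the PR's code: it uses the stable `f64::mul_add` from the standard library instead of the nightly-only SIMD types, and `mul_sub_lanes` is a hypothetical helper in which plain arrays stand in for `Simd` vectors. Negating `b` is exact in floating point, so the result is still rounded only once.

```rust
/// Scalar stand-in for one lane: computes (x * a) - b with a single rounding.
/// Negation is exact, so fusing (x * a) + (-b) loses nothing.
pub fn mul_sub(x: f64, a: f64, b: f64) -> f64 {
    x.mul_add(a, -b)
}

/// Hypothetical lane-wise version; fixed-size arrays stand in for SIMD vectors.
pub fn mul_sub_lanes<const N: usize>(x: [f64; N], a: [f64; N], b: [f64; N]) -> [f64; N] {
    let mut out = [0.0; N];
    for i in 0..N {
        // Each lane is an independent fused multiply-subtract.
        out[i] = x[i].mul_add(a[i], -b[i]);
    }
    out
}
```

For example, `mul_sub(2.0, 3.0, 1.0)` yields `5.0`.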

Testing (14 tests):
- 3 accuracy tests proving FMA superiority:
  * Catastrophic cancellation: (1+ε)(1-ε) - 1
  * Discriminant calculation: b² - 4ac (quadratic formula)
  * Polynomial evaluation with Horner's method
- Basic operations (f32x4, f64x4, mul_add, mul_sub)
- Special values (infinity, NaN, MAX, MIN, subnormals)
- Size variations (f32x2, f32x8)
- Negative values
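The catastrophic-cancellation case from the list above can be reproduced with scalars. The sketch below is not taken from the PR's test suite; the function names are hypothetical and it relies on the stable `f64::mul_add`. With ε = 2⁻²⁸, the exact product (1+ε)(1-ε) = 1 - ε² is closer to 1.0 than to any other `f64`, so the separately rounded product becomes exactly 1.0 and the subtraction returns 0. The fused form rounds only once and keeps -ε².

```rust
/// Two roundings: the product is rounded to f64 before the subtraction.
pub fn product_minus_one_separate(eps: f64) -> f64 {
    let p = (1.0 + eps) * (1.0 - eps); // 1 - eps^2, rounded
    p - 1.0
}

/// One rounding: the product and subtraction are fused.
pub fn product_minus_one_fused(eps: f64) -> f64 {
    (1.0 + eps).mul_add(1.0 - eps, -1.0)
}

/// Returns (separate, fused) for eps = 2^-28, which is exactly representable.
pub fn cancellation_demo() -> (f64, f64) {
    let eps = 1.0 / (1u64 << 28) as f64;
    (product_minus_one_separate(eps), product_minus_one_fused(eps))
}
```

Here `cancellation_demo()` returns `0.0` for the separate form and `-2⁻⁵⁶` for the fused form, which is the exact answer.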

Example demonstrates:
- Basic FMA usage
- Polynomial evaluation (Horner's method)
- Dot product accumulation
- Accuracy comparison
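Two of the patterns listed above, Horner's method and dot-product accumulation, reduce to a chain of fused multiply-adds. The following is a minimal scalar sketch, not the PR's example file: `horner` and `dot` are hypothetical names, and slices of `f64` stand in for SIMD vectors.

```rust
/// Evaluates coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ... with Horner's method.
/// Each step is one fused multiply-add: acc = acc * x + c.
pub fn horner(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc.mul_add(x, c))
}

/// Accumulates a dot product, fusing each multiply with the running sum.
/// Pairs beyond the shorter slice are ignored (behaviour of `zip`).
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}
```

For instance, `horner(&[1.0, 2.0, 3.0], 2.0)` evaluates 1 + 2x + 3x² at x = 2 and gives `17.0`, and `dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` gives `32.0`.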

Use cases:
- Neural networks (dot products, matrix multiply)
- Scientific computing (polynomial evaluation, numerical stability)
- Graphics (lighting calculations, transformations)
- Physics simulations (force calculations, integration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
ARM NEON (notably on 32-bit ARMv7) flushes subnormal values to zero (FTZ) in SIMD operations.
Updated test to accept either the correct subnormal result or zero.
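One way such a tolerant check could look is sketched below. This is an assumption about the shape of the assertion, not the code from the commit; `matches_or_flushed` is a hypothetical helper, and it uses the stable `f32::is_subnormal`.

```rust
/// Accepts the exact expected value, or zero when the expected value is
/// subnormal (to tolerate targets that flush subnormals to zero).
pub fn matches_or_flushed(actual: f32, expected: f32) -> bool {
    actual == expected || (expected.is_subnormal() && actual == 0.0)
}
```

A normal expected value is still compared exactly, so the relaxation applies only to subnormal results.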
@GrigoryEvko GrigoryEvko changed the title Add mul_add() and mul_sub() fused multiply-add operations Add mul_sub() fused multiply-add operation Nov 16, 2025
StdFloat already provides mul_add. This PR now only adds mul_sub.
@GrigoryEvko
Author

Maybe move mul_add from std_float::StdFloat to core_simd/src/simd/num/float.rs, as I accidentally tried?
