Add mul_sub() fused multiply-add operation #492

GrigoryEvko · 2025-11-16T20:56:34Z

This PR adds and mul_sub() methods to the SimdFloat trait. This operation compute (self * a) - b respectively, using a single fused multiply-add instruction where available.

Implements fused multiply-add (FMA) operations for SimdFloat, the rust-lang#1 most critical missing feature in portable-simd based on analysis of PyTorch's SIMD implementation. Methods: - mul_add(a, b) - computes (self * a) + b with single rounding - mul_sub(a, b) - computes (self * a) - b with single rounding Benefits: - Improved accuracy: single rounding error vs two separate roundings - Better performance: 2 operations in 1 instruction on modern CPUs - Universal hardware support: FMA3 (x86), NEON vfma (ARM), RISC-V F extension Implementation: - Delegates to core::intrinsics::simd::simd_fma LLVM intrinsic - Zero-cost abstraction with #[inline] - mul_sub implemented as mul_add(a, -b) Testing (14 tests): - 3 accuracy tests proving FMA superiority: * Catastrophic cancellation: (1+ε)(1-ε) - 1 * Discriminant calculation: b² - 4ac (quadratic formula) * Polynomial evaluation with Horner's method - Basic operations (f32x4, f64x4, mul_add, mul_sub) - Special values (infinity, NaN, MAX, MIN, subnormals) - Size variations (f32x2, f32x8) - Negative values Example demonstrates: - Basic FMA usage - Polynomial evaluation (Horner's method) - Dot product accumulation - Accuracy comparison Use cases: - Neural networks (dot products, matrix multiply) - Scientific computing (polynomial evaluation, numerical stability) - Graphics (lighting calculations, transformations) - Physics simulations (force calculations, integration) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ARM NEON uses flush-to-zero (FTZ) for subnormal values in SIMD operations. Updated test to accept either the correct subnormal result or zero.

StdFloat already provides mul_add. This PR now only adds mul_sub.

GrigoryEvko · 2025-11-16T21:28:07Z

Maybe move mul_add from std_float::StdFloat to core_simd/src/simd/num/float.rs as I accidentaly tried?

Both mul_add and mul_sub now live in StdFloat for consistency.

GrigoryEvko and others added 2 commits November 16, 2025 23:49

Fix subnormal value test for ARM NEON FTZ mode

f963261

ARM NEON uses flush-to-zero (FTZ) for subnormal values in SIMD operations. Updated test to accept either the correct subnormal result or zero.

GrigoryEvko changed the title ~~Add mul_add() and mul_sub() fused multiply-add operations~~ Add mul_sub() fused multiply-add operation Nov 16, 2025

GrigoryEvko added 2 commits November 17, 2025 00:25

Remove duplicate mul_add, keep only mul_sub

01683d0

StdFloat already provides mul_add. This PR now only adds mul_sub.

Fix duplicate StdFloat import

1ad5819

GrigoryEvko added 4 commits November 17, 2025 00:32

Move mul_sub to StdFloat trait

50d79e3

Both mul_add and mul_sub now live in StdFloat for consistency.

Move FMA tests and examples to std_float crate

fbe8195

Fix mul_sub to use simd_fneg intrinsic

1acc2a5

Fix clippy warnings and simd_neg usage

cc2430c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add mul_sub() fused multiply-add operation #492

Add mul_sub() fused multiply-add operation #492

GrigoryEvko commented Nov 16, 2025 •

edited

Loading

Uh oh!

GrigoryEvko commented Nov 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Add mul_sub() fused multiply-add operation #492

Are you sure you want to change the base?

Add mul_sub() fused multiply-add operation #492

Conversation

GrigoryEvko commented Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GrigoryEvko commented Nov 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

GrigoryEvko commented Nov 16, 2025 •

edited

Loading