This issue is a proposal for how to systematically deal with the tradeoff between precise semantics and performance. This tradeoff affects rounding behavior, especially in multiply-add combinations, and also NaN handling.
The general principle is as follows: the short, ergonomic name is a method with relaxed semantics, optimized for performance. Guarantees are explicitly weaker than, say, the Rust core language.
When applications require it, we will also add a _precise version, which is guaranteed to match Rust semantics, but may have degraded performance.
Here are more specific details:
mul_add may in general be implemented as a * b + c or as fused multiply-add. We'll see the former in x86 versions lower than x86_64-v3, and also in WASM SIMD (until relaxed_simd lands). On aarch64, I'm still researching the situation; both vmla (multiply-add) and vfma (fused multiply-add) instructions exist. The former may be higher performance in some implementations, in which case that will be the choice for mul_add, but of course the latter for mul_add_precise.
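To make the rounding difference concrete, here is a minimal scalar sketch using only the Rust standard library. The function names and the inputs are hypothetical and are not part of the proposed API; the sketch only shows that the two permitted implementations of a relaxed mul_add can return different results for the same inputs.

```rust
/// Unfused form: the product is rounded to f32, then the sum is rounded again.
fn mul_add_unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Fused form: `f32::mul_add` computes a * b + c with a single rounding.
fn mul_add_fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// Hypothetical inputs chosen so that the two forms disagree.
fn demo_inputs() -> (f32, f32, f32) {
    let eps = 1.0_f32 / 4096.0; // 2^-12, exactly representable
    let a = 1.0 + eps;
    // The exact product a * a is 1 + 2^-11 + 2^-24. The 2^-24 term is lost
    // when the product is rounded to f32, but survives in the fused form.
    let c = -(1.0 + 2.0 * eps);
    (a, a, c)
}
```

With these inputs the unfused form returns 0.0 and the fused form returns 2^-24, so a caller of the relaxed mul_add would have to tolerate either value, while mul_add_precise would always give the fused one.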
min and max may have different NaN handling. I believe this will affect mostly Intel. The _precise variants will be polyfilled by a compare and select combination on Intel, and vminnm/vmaxnm on aarch64. If needed, we can also implement minimum and maximum, which are vmin and vmax respectively on aarch64.
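As a rough illustration of the NaN difference, the sketch below models the behavior in scalar Rust. Both function names are hypothetical. min_relaxed_model assumes the x86 MINPS convention of returning the second operand whenever the comparison fails, and min_precise_model is one possible compare and select polyfill, not necessarily the one the library would use. Signed zeros are not addressed here.

```rust
/// Scalar model of an x86-style relaxed min: if the comparison is false
/// (which includes the case where either operand is NaN), return `b`.
fn min_relaxed_model(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Scalar model of a compare-and-select polyfill that matches `f32::min`
/// for NaN inputs: when exactly one operand is NaN, return the other one.
fn min_precise_model(a: f32, b: f32) -> f32 {
    let m = min_relaxed_model(a, b);
    // If `b` is NaN, `m` is NaN as well, so select `a` instead.
    if b.is_nan() { a } else { m }
}
```

In this model min_relaxed_model(1.0, NaN) returns NaN, whereas min_precise_model(1.0, NaN) and f32::min both return 1.0. With the operands swapped, all three return 1.0, which is why the relaxed result depends on argument order.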
round will typically be implemented with round-ties-even semantics, but that's not guaranteed. If we need more precision, round_ties_even will guarantee ties-even semantics, and, if needed, round_precise will guarantee the same semantics as Rust's f32::round, i.e. ties rounded away from 0.0.
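The two tie-breaking rules can be compared directly with the scalar standard library methods, as in the sketch below. The wrapper names are hypothetical, and f32::round_ties_even assumes a reasonably recent Rust toolchain (it was stabilized in 1.77). The last helper expresses what a caller of the relaxed round could rely on under this proposal: the result matches one of the two rules, without a guarantee of which.

```rust
/// Ties away from zero, matching Rust's `f32::round`.
fn round_away(x: f32) -> f32 {
    x.round()
}

/// Ties to the nearest even integer, matching `f32::round_ties_even`.
fn round_even(x: f32) -> f32 {
    x.round_ties_even()
}

/// Checks that `got` is an acceptable result of a relaxed round of `x`,
/// i.e. it agrees with at least one of the two tie-breaking rules.
fn relaxed_round_ok(x: f32, got: f32) -> bool {
    got == round_away(x) || got == round_even(x)
}
```

The rules only disagree on exact ties whose neighbor toward zero is even: 2.5 rounds to 3.0 away from zero but to 2.0 under ties-even, while 1.5 rounds to 2.0 under both.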