-
Notifications
You must be signed in to change notification settings - Fork 1k
Implement full-range i256::to_f64
to eliminate ±∞ saturation for Decimal256 → Float64 casts
#7986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@scovich |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
However -- we may want to consider more testing at 54-bit precision boundary cases, to ensure that converting high and low parts to float independently and then combining them does not introduce any weird rounding effects.
fn to_f64(&self) -> Option<f64> { | ||
let mag = if let Some(u) = self.checked_abs() { | ||
let (low, high) = u.to_parts(); | ||
(high as f64) * 2_f64.powi(128) + (low as f64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking this through:
- If
high
is zero (no significant bits), then conversion to f64 is exact (tho useless)- ... and we return the value of
low
, converted to f64
- ... and we return the value of
- If
high
has1..=53
significant bits, then conversion to f64 is exact (no rounding)- ... and scaling is also exact
- ... and adding
low
(already converted to f64) will round as needed
- If
high
has54..
significant bits, then conversion to f64 will use the 54th bit to round- ... tho scaling is still exact
- ... and it doesn't matter what value
low
takes, because it's so small that adding it doesn't change the answer
A bit expensive, but I think it covers all the cases with no weird rounding effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we could do better by manually twiddling bits, but that's probably a good follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit twiddling that I think would work:
- Define
i256::leading_zeros()
that follows the semantics of all the otherleading_zeros
for integral typesimpl i256 { pub fn leading_zeros(&self) -> u32 { match self.high { 0 => 128 + self.low.leading_zeros(), _ => self.high.leading_zeros(), } } }
- Define a notion of "redundant leading sign bits" in terms of leading zeros:
fn redundant_leading_sign_bits_i256(n: i256) -> u32 { let mask = n >> 255; // all ones or all zeros (n ^ mask).leading_zeros() - 1; // we only need one sign bit }
- Shift out all redundant leading sign bits when converting to f64:
fn i256_to_f64(n: i256) -> f64 { let k = redundant_leading_sign_bits_i256(n); let n = n << k; // left-justify (no redundant sign bits) let n = (n.high >> 64) as i64; // throw away the lower 192 bits (n as f64) * f64::powi(2.0, 192-k) // convert to f64 and scale it }
The above should work for both positive and negative values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking in
arrow-cast/src/cast/mod.rs
Outdated
/// let result = decimal256_to_f64(val); | ||
/// assert_eq!(result, 123456789.0); | ||
/// ``` | ||
pub fn decimal256_to_f64(val: i256) -> f64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: This function is badly-named. It doesn't convert the decimal to f64. Rather, it converts the decimal's unscaled value to f64, and the caller must then apply the appropriate scalaing as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I believe this function is newly-added after the most recent arrow release. So it's not yet in the wild. We should just remove it, and revert the call site back to calling ToPrimitive::to_f64
directly, like it originally did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @scovich
Implemented
Co-authored-by: Ryan Johnson <[email protected]>
…ove tests to bigint - Removed the standalone `decimal256_to_f64` function in favor of directly using the `to_f64()` method on `i256`. - Updated cast implementation to call `x.to_f64().expect("All i256 values fit in f64")` inline. - Added tests for `i256::to_f64()` conversion covering typical values, large positive, and large negative values. - Ensured all `i256` to `f64` conversions are handled with expectation on fit, improving code clarity and removing redundant wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which issue does this PR close?
Closes #7985
Rationale for this change
The existing Decimal256 → Float64 conversion was changed to saturate out-of-range values to
±INFINITY
(PR #7887) in order to avoid panics. However, every 256-bit signed integer actually fits within the exponent range of an IEEE-754f64
(±2¹⁰²³), so we can always produce a finitef64
, only sacrificing mantissa precision.By overriding
i256::to_f64
to split the full 256-bit magnitude into high/low 128-bit halves, recombine asand reapply the sign (special-casing i256::MIN), we:
Eliminate both panics and infinite results
Match Rust’s built-in (i128) as f64 rounding (ties-to-even)
Simplify casting logic—no saturating helpers or extra flags required
What changes are included in this PR?
Added full-range fn to_f64(&self) -> Option for i256, using checked_abs() + to_parts() + recombination
Removed fallback through 64-bit to_i64()/to_u64() and .unwrap()
Replaced the old decimal256_to_f64 saturating helper with a thin wrapper around the new i256::to_f64() (always returns Some)
Updated Decimal256 → Float64 cast sites to call the new helper
Tests
Reworked “overflow” tests to assert finite & correctly signed results for i256::MAX and i256::MIN
Added typical-value tests; removed expectations of ∞/-∞
Are there any user-facing changes?
Behavior change:
Very large or small Decimal256 values no longer become +∞/-∞.
They now map to very large—but finite—f64 values (rounded to nearest mantissa).
API impact:
No public API signatures changed.
Conversion remains lossy by design; users relying on saturation-to-infinity will observe different (more faithful) behavior.