@kosiew kosiew commented Jul 24, 2025

Which issue does this PR close?

Closes #7985


Rationale for this change

The existing Decimal256 → Float64 conversion was changed to saturate out-of-range values to ±INFINITY (PR #7887) in order to avoid panics. However, every 256-bit signed integer has a magnitude of at most 2²⁵⁵, which is far below the largest finite IEEE-754 f64 (just under 2¹⁰²⁴), so we can always produce a finite f64, sacrificing only mantissa precision.
By overriding i256::to_f64 to split the full 256-bit magnitude into high/low 128-bit halves, recombine them as

(high as f64) * 2^128 + (low as f64)

and reapply the sign (special-casing i256::MIN), we:

  • Eliminate both panics and infinite results

  • Match Rust’s built-in (i128) as f64 rounding (ties-to-even)

  • Simplify casting logic—no saturating helpers or extra flags required
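The split-and-recombine idea can be sketched without the arrow crate. This is a minimal illustration, not the PR's code: `Wide256`, its fields, and `checked_neg` are hypothetical stand-ins for `i256` and its `checked_abs()`/`to_parts()` helpers, and the layout (unsigned low half, signed high half, two's complement) is an assumption.

```rust
/// Hypothetical stand-in for a 256-bit two's-complement integer:
/// an unsigned low half and a signed high half.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Wide256 {
    low: u128,
    high: i128,
}

impl Wide256 {
    const MAX: Self = Self { low: u128::MAX, high: i128::MAX };
    const MIN: Self = Self { low: 0, high: i128::MIN };

    /// Two's-complement negation. Returns `None` only for `MIN`, whose
    /// magnitude (2^255) has no positive counterpart in 256 bits.
    fn checked_neg(self) -> Option<Self> {
        if self == Self::MIN {
            return None;
        }
        // -x == !x + 1, with the carry propagated from the low half.
        let (low, carry) = (!self.low).overflowing_add(1);
        let high = (!self.high).wrapping_add(carry as i128);
        Some(Self { low, high })
    }

    /// Always finite: the magnitude never exceeds 2^255.
    fn to_f64(self) -> f64 {
        let negative = self.high < 0;
        let magnitude = if negative { self.checked_neg() } else { Some(self) };
        let mag = match magnitude {
            // `high` is non-negative here, so both halves convert directly.
            Some(m) => (m.high as f64) * 2f64.powi(128) + (m.low as f64),
            // Special case: |MIN| is exactly 2^255.
            None => 2f64.powi(255),
        };
        if negative { -mag } else { mag }
    }
}
```

Under these assumptions, `Wide256::MAX.to_f64()` and `Wide256::MIN.to_f64()` come out as 2^255 and -2^255 (roughly ±5.79e76), both finite.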

What changes are included in this PR?

  • Added full-range fn to_f64(&self) -> Option<f64> for i256, using checked_abs() + to_parts() + recombination

  • Removed fallback through 64-bit to_i64()/to_u64() and .unwrap()

  • Replaced the old decimal256_to_f64 saturating helper with a thin wrapper around the new i256::to_f64() (always returns Some)

  • Updated Decimal256 → Float64 cast sites to call the new helper

Tests

  • Reworked “overflow” tests to assert finite & correctly signed results for i256::MAX and i256::MIN

  • Added typical-value tests; removed expectations of ∞/-∞

Are there any user-facing changes?

Behavior change:

  • Decimal256 values of very large magnitude (positive or negative) no longer become +∞/-∞.

  • They now map to very large but finite f64 values, rounded to a nearby representable f64.

API impact:

No public API signatures changed.

Conversion remains lossy by design; users relying on saturation-to-infinity will observe different (more faithful) behavior.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 24, 2025

kosiew commented Jul 24, 2025

@scovich
Can you review this?

Contributor

@scovich scovich left a comment

LGTM.

However -- we may want to consider more testing at 54-bit precision boundary cases, to ensure that converting high and low parts to float independently and then combining them does not introduce any weird rounding effects.

fn to_f64(&self) -> Option<f64> {
let mag = if let Some(u) = self.checked_abs() {
let (low, high) = u.to_parts();
(high as f64) * 2_f64.powi(128) + (low as f64)
Contributor

Thinking this through:

  • If high is zero (no significant bits), then conversion to f64 is exact (tho useless)
    • ... and we return the value of low, converted to f64
  • If high has 1..=53 significant bits, then conversion to f64 is exact (no rounding)
    • ... and scaling is also exact
    • ... and adding low (already converted to f64) will round as needed
  • If high has 54.. significant bits, then conversion to f64 will use the 54th bit to round
    • ... tho scaling is still exact
    • ... and it doesn't matter what value low takes, because it's so small that adding it doesn't change the answer

A bit expensive, but I think it covers all the cases with no weird rounding effects?
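The three cases above can be probed with a few lines of Rust. This is only a sketch: `combine`, `significant_bits`, and `converts_exactly` are hypothetical helper names, and `combine` takes the two halves of a non-negative magnitude as plain `u128` values rather than going through `i256`.

```rust
/// Recombine the unsigned halves of a non-negative 256-bit magnitude,
/// mirroring the expression under review.
fn combine(high: u128, low: u128) -> f64 {
    (high as f64) * 2f64.powi(128) + (low as f64)
}

/// Number of significant bits in `x` (0 for zero).
fn significant_bits(x: u128) -> u32 {
    128 - x.leading_zeros()
}

/// True when `x` fits in the 53-bit f64 significand, i.e. the span from
/// its highest to its lowest set bit is at most 53 bits wide.
fn converts_exactly(x: u128) -> bool {
    x == 0 || significant_bits(x) - x.trailing_zeros() <= 53
}
```

With `high = 0` the result is just `low as f64`. With `high = (1 << 52) | 1` (53 bits) the conversion of `high` is exact and only the final addition rounds. With `high = (1 << 53) + 1` (54 bits) the conversion of `high` already rounds to 2^53, and `combine(high, 0)` equals `combine(high, u128::MAX)`. Note that in this last case the value is rounded twice, so the result appears to be able to land one unit in the last place away from the single correctly rounded value: the exact value for `low = u128::MAX` is 2^181 + 2^129 - 1, whose nearest f64 is 2^181 + 2^129, while the expression yields 2^181. That seems worth covering in the boundary tests suggested earlier.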

Contributor

I suspect we could do better by manually twiddling bits, but that's probably a good follow-up.

Contributor

Bit twiddling that I think would work:

  1. Define i256::leading_zeros() that follows the semantics of all the other leading_zeros for integral types
    impl i256 {
        pub fn leading_zeros(&self) -> u32 {
            match self.high {
                0 => 128 + self.low.leading_zeros(),
                _ => self.high.leading_zeros(),
            }
        }
    }
  2. Define a notion of "redundant leading sign bits" in terms of leading zeros:
    fn redundant_leading_sign_bits_i256(n: i256) -> u32 {
        let mask = n >> 255; // all ones or all zeros
        (n ^ mask).leading_zeros() - 1 // we only need one sign bit
    }
  3. Shift out all redundant leading sign bits when converting to f64:
    fn i256_to_f64(n: i256) -> f64 {
        let k = redundant_leading_sign_bits_i256(n);
        let n = n << k; // left-justify (no redundant sign bits)
        let n = (n.high >> 64) as i64; // throw away the lower 192 bits
        (n as f64) * f64::powi(2.0, 192 - k as i32) // convert to f64 and scale it
    }

The above should work for both positive and negative values
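The same three steps can be tried on `i128`, where the standard library provides every operation needed and `n as f64` serves as a reference. This is an analogue for illustration, not the proposed `i256` code: the function names are hypothetical, and the constants 127 and 64 replace 255 and 192.

```rust
/// Count the leading bits that merely repeat the sign bit,
/// not counting the sign bit itself.
fn redundant_leading_sign_bits(n: i128) -> u32 {
    let mask = n >> 127; // arithmetic shift: all ones if negative, else all zeros
    (n ^ mask).leading_zeros() - 1 // keep exactly one sign bit
}

/// Convert by left-justifying, keeping the top 64 bits, and rescaling.
fn i128_to_f64_bits(n: i128) -> f64 {
    let k = redundant_leading_sign_bits(n);
    let justified = n << k; // no redundant sign bits remain
    let top = (justified >> 64) as i64; // discard the lower 64 bits
    // `top as f64` rounds 64 bits down to 53; the power-of-two scale is exact.
    (top as f64) * 2f64.powi(64 - k as i32)
}
```

For values such as 0, ±1, ±5, 2^100, `i128::MAX`, and `i128::MIN` this agrees with `n as f64`. Because the lower bits are dropped before the final rounding, a value whose kept bits sit exactly on a rounding tie may round differently from `n as f64` when the discarded bits are nonzero, so a production version would probably need to fold those bits into a sticky bit.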

Contributor

/// let result = decimal256_to_f64(val);
/// assert_eq!(result, 123456789.0);
/// ```
pub fn decimal256_to_f64(val: i256) -> f64 {
Contributor

Aside: This function is badly named. It doesn't convert the decimal to f64. Rather, it converts the decimal's unscaled value to f64, and the caller must then apply the appropriate scaling as needed.
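To make the distinction concrete, here is a small sketch of the step the caller still has to perform. `decimal_to_f64` is a hypothetical helper, and `i128` stands in for the unscaled `i256` value purely to keep the example self-contained.

```rust
/// Hypothetical helper: interpret `unscaled` as a decimal with `scale`
/// fractional digits and return the nearest f64.
fn decimal_to_f64(unscaled: i128, scale: i8) -> f64 {
    // Step 1: convert the unscaled integer (what the helper under review does).
    let raw = unscaled as f64;
    // Step 2: apply the decimal scale (what the caller must do separately).
    raw / 10f64.powi(scale as i32)
}
```

For example, an unscaled value of 12345 with scale 2 represents 123.45, whereas converting only the unscaled value yields 12345.0.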

Contributor

That said, I believe this function is newly-added after the most recent arrow release. So it's not yet in the wild. We should just remove it, and revert the call site back to calling ToPrimitive::to_f64 directly, like it originally did.

Contributor Author

hi @scovich

Implemented

kosiew and others added 2 commits July 25, 2025 10:08
…ove tests to bigint

- Removed the standalone `decimal256_to_f64` function in favor of directly using the `to_f64()` method on `i256`.
- Updated cast implementation to call `x.to_f64().expect("All i256 values fit in f64")` inline.
- Added tests for `i256::to_f64()` conversion covering typical values, large positive, and large negative values.
- Made every `i256` to `f64` conversion rely on the guarantee that all values fit, which clarifies the code and removes the redundant wrapper.
Contributor

@alamb alamb left a comment

Thank you @kosiew and @scovich

@alamb alamb merged commit d634ac8 into apache:main Jul 29, 2025
26 checks passed

kosiew commented Jul 30, 2025

Thanks @scovich , @alamb for your reviews.

Successfully merging this pull request may close these issues.

Implement full-range i256::to_f64 to replace current ±∞ saturation for Decimal256 → Float64