BC1 and BC4 rounding behavior doesn't match DirectXTex #17

@RunDevelopment

Description

Hi! I'm currently implementing a new DDS decoder for the image crate and wanted to use bcdec_rs for decoding BC1-7 images. Unfortunately, bcdec_rs does not decode BC1-5 images correctly. Due to rounding errors, colors can be off by up to one for both BC1 and BC4.

Example

While the error is only a single bit, it is noticeable. In BC1, the error affects G differently than R and B. This can lead to situations where G is one more than it should be while R and B are one less than they should be, resulting in a noticeable green tint. Here's an example BC1 image that should have the color RGB=47 47 47 everywhere (this is also what image programs like Paint.NET and GIMP will show). Using bcdec_rs to decode this image will yield the color RGB=46 48 46 instead.

BC1_UNORM_SRGB-47.zip

I also want to point out that these off-by-one errors aren't rare. I magnified the error of the following image, so we can see it:

Image: (attachment)

Error: (attachment)

BC1

Before I explain the cause of the issue, I quickly want to point out that this bug is not in the source translation of bcdec: the original bcdec C library has the same bug, and bcdec_rs simply reproduces it faithfully.

The bug itself is quite simple: when calculating color2 and color3, bcdec interpolates the already-rounded 8-bit values of color0 and color1 instead of the original 5/6-bit values.

Using the 8-bit values for color0/1 is incorrect because the conversion from 5/6 bits to 8 bits introduces a small rounding error (e.g. 30 as a 5-bit value converts to exactly 246.774 in 8 bits, which is rounded to 247). That error is then passed along to color2/3, which are rounded again, compounding the error.
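To make the double rounding concrete, here is a small standalone sketch (the endpoint values and the expand5 helper are illustrative, not bcdec_rs's actual code):

```rust
// Illustrative sketch: interpolating the already-rounded 8-bit endpoints
// can differ by 1 from interpolating the raw 5-bit values.
fn expand5(x: u32) -> u32 {
    (x * 527 + 23) >> 6 // bcdec's 5-bit -> 8-bit expansion, i.e. round(x / 31 * 255)
}

fn main() {
    let (r0_5, r1_5) = (3u32, 0u32); // 5-bit red endpoints (illustrative values)

    // Double rounding: expand both endpoints to 8 bits first, then
    // interpolate color2 = 2/3*color0 + 1/3*color1 with rounding.
    let double_rounded = (2 * expand5(r0_5) + expand5(r1_5) + 1) / 3;

    // Single rounding: interpolate the raw 5-bit values, then convert to
    // 8 bits with one final rounding step.
    let single_rounded = ((2 * r0_5 + r1_5) * 351 + 61) >> 7;

    // The exact value is (2*3 + 0) / (3*31) * 255 = 16.45..., so 16 is correct.
    assert_eq!(double_rounded, 17);
    assert_eq!(single_rounded, 16);
}
```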

The fix is to do the interpolation with the original values and then convert to 8 bit. In code, this can be done like this:

let c0 = u16::from_le_bytes(compressed_block[0..2].try_into().unwrap());
let c1 = u16::from_le_bytes(compressed_block[2..4].try_into().unwrap());

// separate 565 colors
let r0_5 = (c0 >> 11) & 0x1F;
let g0_6 = (c0 >> 5) & 0x3F;
let b0_5 = c0 & 0x1F;

let r1_5 = (c1 >> 11) & 0x1F;
let g1_6 = (c1 >> 5) & 0x3F;
let b1_5 = c1 & 0x1F;

// Expand 565 ref colors to 888
let r0 = (r0_5 * 527 + 23) >> 6;
let g0 = (g0_6 * 259 + 33) >> 6;
let b0 = (b0_5 * 527 + 23) >> 6;
ref_colors[0] = [r0 as u8, g0 as u8, b0 as u8, 255u8];

let r1 = (r1_5 * 527 + 23) >> 6;
let g1 = (g1_6 * 259 + 33) >> 6;
let b1 = (b1_5 * 527 + 23) >> 6;
ref_colors[1] = [r1 as u8, g1 as u8, b1 as u8, 255u8];

if c0 > c1 || only_opaque_mode {
    // Standard BC1 mode (also BC3 color block uses ONLY this mode)
    // color_2 = 2/3*color_0 + 1/3*color_1
    // color_3 = 1/3*color_0 + 2/3*color_1
    let r = 2 * r0_5 + r1_5;
    let g = 2 * g0_6 + g1_6;
    let b = 2 * b0_5 + b1_5;
    let r = (r * 351 + 61) >> 7;
    let g = (g as u32 * 2763 + 1039) >> 11;
    let b = (b * 351 + 61) >> 7;
    ref_colors[2] = [r as u8, g as u8, b as u8, 255u8];

    let r = r0_5 + 2 * r1_5;
    let g = g0_6 + 2 * g1_6;
    let b = b0_5 + 2 * b1_5;
    let r = (r * 351 + 61) >> 7;
    let g = (g as u32 * 2763 + 1039) >> 11;
    let b = (b * 351 + 61) >> 7;
    ref_colors[3] = [r as u8, g as u8, b as u8, 255u8];
} else {
    // Quite rare BC1A mode
    // color_2 = 1/2*color_0 + 1/2*color_1;
    // color_3 = 0;
    let r = r0_5 + r1_5;
    let g = g0_6 + g1_6;
    let b = b0_5 + b1_5;
    let r = (r * 1053 + 125) >> 8;
    let g = (g as u32 * 4145 + 1019) >> 11;
    let b = (b * 1053 + 125) >> 8;
    ref_colors[2] = [r as u8, g as u8, b as u8, 255u8];

    ref_colors[3] = [0u8; 4];
}

In case you're curious about the crazy conversions like (r * 351 + 61) >> 7: they use the same trick as bcdec's 5/6-bit to 8-bit conversion. E.g. (r * 351 + 61) >> 7 is equivalent to (r as f64 / (3.0 * 31.0) * 255.0).round() for values 0 <= r <= 3*31 (and likewise (g * 2763 + 1039) >> 11 covers the 6-bit range 0 <= g <= 3*63). I got all of these constants using a brute-force script that verified each one correctly maps its entire input range.
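A sketch of that brute-force check (the input ranges follow from interpolating 2*a + b over 5-bit and 6-bit values):

```rust
// Verify the multiply-add-shift constants against exact rounded division
// over the entire input range of each interpolation.
fn main() {
    // 5-bit channels: 2*r0 + r1 (or r0 + 2*r1) ranges over 0..=3*31.
    for r in 0u32..=93 {
        let fast = (r * 351 + 61) >> 7;
        let exact = (r as f64 / 93.0 * 255.0).round() as u32;
        assert_eq!(fast, exact, "mismatch at r = {r}");
    }
    // 6-bit green: 2*g0 + g1 ranges over 0..=3*63.
    for g in 0u32..=189 {
        let fast = (g * 2763 + 1039) >> 11;
        let exact = (g as f64 / 189.0 * 255.0).round() as u32;
        assert_eq!(fast, exact, "mismatch at g = {g}");
    }
    println!("all constants verified");
}
```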

BC4

Here the bug is simpler: the rounding is wrong. E.g.

alpha[2] = (6 * alpha[0] + alpha[1] + 1) / 7;

That + 1 should have been + 3.

Similarly:

alpha[2] = (4 * alpha[0] + alpha[1] + 1) / 5;

That + 1 should have been + 2.

Why is +1 wrong for the /7 values? Say the interpolation numerator (e.g. 6 * alpha[0] + alpha[1]) comes out to 5. Then 5/7 ≈ 0.71 should round to 1, but (5 + 1) / 7 == 0 because integer division is floor division.
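That example can be checked directly (the endpoint values here are chosen just so the numerator comes out to 5):

```rust
fn main() {
    // Endpoints chosen so the interpolation numerator 6*alpha0 + alpha1 is 5.
    let (alpha0, alpha1) = (0u32, 5u32);
    let numerator = 6 * alpha0 + alpha1; // = 5; exact 5/7 = 0.714... rounds to 1

    let wrong = (numerator + 1) / 7; // floor(6/7) = 0, off by one
    let right = (numerator + 3) / 7; // floor(8/7) = 1, correctly rounded
    assert_eq!(wrong, 0);
    assert_eq!(right, 1);
}
```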

In general, given two unsigned integers x and n, the rounded result of x/n can be computed as (x + (n >> 1)) / n. That is why the bias added to the interpolated value must be 3 for the /7 values (7 >> 1 == 3) and 2 for the /5 values (5 >> 1 == 2).
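A minimal sketch of that identity (div_round is a hypothetical helper name, not part of bcdec_rs):

```rust
// Rounded unsigned division: add half the divisor before flooring.
fn div_round(x: u32, n: u32) -> u32 {
    (x + (n >> 1)) / n
}

fn main() {
    assert_eq!(div_round(5, 7), 1);  // bias 7 >> 1 == 3: the BC4 /7 fix
    assert_eq!(div_round(7, 5), 1);  // bias 5 >> 1 == 2: the BC4 /5 fix; 1.4 rounds to 1
    assert_eq!(div_round(13, 5), 3); // 2.6 rounds up to 3
}
```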


Would you like me to make a PR?
