fix: prevent integer overflow in band area calculation #1752

Merged
ivan-aksamentov merged 6 commits into `master` from `fix/alignment-overflow-panic`
Feb 27, 2026

@ivan-aksamentov
Member

Band area calculations could overflow on 32-bit WASM when aligning very long sequences against highly divergent references. This PR uses u64 for intermediate calculations and provides actionable error messages with human-readable numbers when the band area limit is exceeded.

  • Use u64 for band area to prevent overflow on 32-bit WASM
  • Add overflow checks to Band2d dimension calculation
  • Improve error message with human-readable numbers
  • Extract format_number_human utility for reuse

Band area calculation in `regularize_stripes()` accumulates stripe widths across all reference positions (~R × W). For pathological inputs like concatenated genomes (179kb query vs 30kb reference), this yields ~3.7 billion, which is on the order of `u32::MAX` (~4.3B), and the accumulation overflows on 32-bit WASM. Debug builds panic with "attempt to add with overflow" before the `align.rs:80` guard can reject the input with a proper error message.

- Use `saturating_add()` to cap the sum at `usize::MAX` (~4.3B on 32-bit targets), which still exceeds `max_band_area` (500M), so the downstream check still rejects the input with an informative message
- Reproducer: GISAID sequence EPI_ISL_20374993 (6 concatenated SARS-CoV-2 genomes)
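
The difference between panicking, checked, and saturating accumulation can be sketched as follows. This is a minimal illustration, not the project's code: the `Stripe` type and both function names are hypothetical, and `u32` stands in for `usize` on a 32-bit WASM target so the behavior is the same on any host. It assumes `end >= begin` for every stripe.

```rust
/// Hypothetical stand-in for a band stripe: a half-open range of query
/// positions [begin, end) allowed at one reference position.
#[derive(Clone, Copy)]
pub struct Stripe {
    pub begin: u32,
    pub end: u32,
}

/// Sums stripe widths with saturating arithmetic: the result is capped at
/// `u32::MAX` instead of panicking (debug) or wrapping (release).
pub fn band_area_saturating(stripes: &[Stripe]) -> u32 {
    stripes
        .iter()
        .fold(0u32, |area, s| area.saturating_add(s.end - s.begin))
}

/// Same sum with checked arithmetic: returns `None` at the point where a
/// plain `+` would panic in a debug build.
pub fn band_area_checked(stripes: &[Stripe]) -> Option<u32> {
    stripes
        .iter()
        .try_fold(0u32, |area, s| area.checked_add(s.end - s.begin))
}
```

With 30,000 stripes of width 150,000 each, the checked sum reports overflow while the saturating sum returns `u32::MAX`, which is still far above a 500M limit, so a later comparison against the limit rejects the input.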

Fixes: #1749

Defense-in-depth fix for `calculate_dimensions()`, which has the same accumulation pattern as `seed_alignment.rs` (fixed in 9d95d0a). In normal operation this path is protected by the band area check in `align.rs:80`, but it could be reached through future code changes, direct `Band2d::new()` calls, or edge cases with many narrow stripes.

- Use `saturating_add()` for consistency; on overflow, the saturated size makes the allocation fail with OOM instead of silently corrupting memory through wrapped indices

Ref: #1749

Band area calculation accumulates stripe widths across all reference positions (R × W). For pathological inputs like concatenated genomes (e.g., 179kb query vs 30kb reference), this yields ~3.7 billion, which is on the order of `u32::MAX` (~4.3B), and the accumulation overflows on 32-bit WASM, where `usize` is 32 bits wide.

The previous fix (9d95d0a) used `saturating_add()` to prevent the panic, but saturated values produce misleading numbers in error messages. Using `u64` ensures an accurate calculation on both 32-bit (WASM) and 64-bit platforms, without saturation artifacts.

- Change `max_band_area` parameter from `usize` to `u64`
- Change `create_alignment_band()` return type from `(Vec<Stripe>, usize)` to `(Vec<Stripe>, u64)`
- Accumulate band area as `u64` in `regularize_stripes()`
- Update JSON schemas to reflect `uint64` format
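
A minimal sketch of the `u64` accumulation described above, under stated assumptions: the `Stripe` fields, `band_area()`, and `check_band_area()` are hypothetical names for illustration and are not taken from the codebase. It assumes `end >= begin` for every stripe.

```rust
/// Hypothetical stripe type; the real `Stripe` in the codebase may differ.
#[derive(Clone, Copy)]
pub struct Stripe {
    pub begin: usize,
    pub end: usize,
}

/// Accumulates the band area as `u64`, so the sum stays exact even on
/// targets where `usize` is only 32 bits wide.
pub fn band_area(stripes: &[Stripe]) -> u64 {
    stripes
        .iter()
        .map(|s| (s.end - s.begin) as u64)
        .sum()
}

/// Compares the exact area against the limit. Because nothing saturates,
/// the number reported in the error is the true band area.
pub fn check_band_area(band_area: u64, max_band_area: u64) -> Result<(), String> {
    if band_area > max_band_area {
        return Err(format!(
            "band area {band_area} exceeds limit {max_band_area}"
        ));
    }
    Ok(())
}
```

Each individual stripe width still fits in `usize`; only the running total needs the wider type, which is why widening at the point of accumulation is enough.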

Ref: #1749

The previous error message ("Alignment matrix size X exceeds maximum value Y") was uninformative for non-technical users who don't understand banded Smith-Waterman alignment.

The new message:

- Formats large numbers for readability (3.7B instead of 3704350009, 179,151 instead of 179151)
- States observable facts: query and reference sequence lengths
- Differentiates between two failure modes based on length ratio:
  - Query >1.5× reference: likely concatenated sequences or assembly scaffolds
  - Similar lengths: likely structural rearrangements or wrong reference
- Lists possible causes as hypotheses, not assertions
- Preserves technical details for advanced users

Example output:
```
Alignment band area (3.7B) exceeds limit (500M). Query sequence length (179,151 nt) is significantly larger than reference (29,903 nt). Possible reasons: concatenated sequences, assembly scaffolds, or wrong reference sequence.
```
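
The ratio-based distinction between the two failure modes could be sketched like this. The enum, the function names, and the exact wording of the second hint are assumptions made for illustration; only the 1.5× threshold and the listed causes come from the description above.

```rust
/// Hypothetical classification of the two failure modes described above.
#[derive(Debug, PartialEq, Eq)]
pub enum LikelyCause {
    /// Query is more than 1.5x the reference length.
    QueryMuchLonger,
    /// Query and reference lengths are comparable.
    SimilarLengths,
}

/// Classifies by length ratio. `qry_len > 1.5 * ref_len` is evaluated as
/// `2 * qry_len > 3 * ref_len` in `u64`, which avoids floating point and
/// cannot overflow for 32-bit lengths.
pub fn classify(qry_len: usize, ref_len: usize) -> LikelyCause {
    if 2 * (qry_len as u64) > 3 * (ref_len as u64) {
        LikelyCause::QueryMuchLonger
    } else {
        LikelyCause::SimilarLengths
    }
}

/// Lists possible reasons as hypotheses. The wording is illustrative only.
pub fn possible_reasons(cause: &LikelyCause) -> &'static str {
    match cause {
        LikelyCause::QueryMuchLonger => {
            "concatenated sequences, assembly scaffolds, or wrong reference sequence"
        }
        LikelyCause::SimilarLengths => {
            "structural rearrangements or wrong reference sequence"
        }
    }
}
```

For the reproducer lengths (179,151 nt query, 29,903 nt reference), `classify` returns `QueryMuchLonger`, which matches the example output shown above.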

Ref: #1749

Add configurable `HumanFormat` with builder pattern supporting:
- Grouping styles (Standard, Indian, None)
- Custom separator character
- Compact notation (K/M/B/T suffixes)
- Configurable threshold and decimal places
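
A rough sketch of what such a builder might look like. Only the names `HumanFormat` and the grouping styles come from the description above; every method name, field, and default here is an assumption, and the actual utility in the PR may differ in API and rounding behavior.

```rust
/// Digit grouping styles named in the description.
#[derive(Clone, Copy)]
pub enum Grouping {
    Standard, // 1,234,567
    Indian,   // 12,34,567
    None,     // 1234567
}

/// Hypothetical builder; fields and defaults are illustrative.
pub struct HumanFormat {
    grouping: Grouping,
    separator: char,
    compact_threshold: Option<u64>,
    decimals: usize,
}

impl HumanFormat {
    pub fn new() -> Self {
        Self { grouping: Grouping::Standard, separator: ',', compact_threshold: None, decimals: 1 }
    }
    pub fn grouping(mut self, grouping: Grouping) -> Self { self.grouping = grouping; self }
    pub fn separator(mut self, separator: char) -> Self { self.separator = separator; self }
    /// Use K/M/B/T suffixes for values at or above `threshold`.
    pub fn compact_from(mut self, threshold: u64) -> Self { self.compact_threshold = Some(threshold); self }
    pub fn decimals(mut self, decimals: usize) -> Self { self.decimals = decimals; self }

    pub fn format(&self, n: u64) -> String {
        match self.compact_threshold {
            Some(threshold) if n >= threshold => self.compact(n),
            _ => self.grouped(n),
        }
    }

    fn compact(&self, n: u64) -> String {
        const UNITS: [(u64, &str); 4] =
            [(1_000_000_000_000, "T"), (1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")];
        // Largest unit that fits; values below 1,000 get no suffix.
        let (scale, suffix) = UNITS.iter().copied().find(|(s, _)| n >= *s).unwrap_or((1, ""));
        let mut text = format!("{:.*}", self.decimals, n as f64 / scale as f64);
        if text.contains('.') {
            // Drop trailing zeros and a dangling decimal point: "500.0" -> "500".
            text = text.trim_end_matches('0').trim_end_matches('.').to_string();
        }
        format!("{text}{suffix}")
    }

    fn grouped(&self, n: u64) -> String {
        let digits = n.to_string();
        let len = digits.len();
        let mut out = String::with_capacity(len + len / 2);
        for (i, ch) in digits.chars().enumerate() {
            let remaining = len - i; // digits left, including this one
            let boundary = match self.grouping {
                Grouping::None => false,
                Grouping::Standard => remaining % 3 == 0,
                // Last three digits form one group, then groups of two.
                Grouping::Indian => remaining >= 3 && (remaining - 3) % 2 == 0,
            };
            if i > 0 && boundary {
                out.push(self.separator);
            }
            out.push(ch);
        }
        out
    }
}
```

With `HumanFormat::new().compact_from(1_000_000)`, this sketch renders 3704350009 as `3.7B`, 500000000 as `500M`, and 179151 as `179,151`, matching the numbers in the example error message.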
Base automatically changed from fix/web-results-table-row-index to master February 27, 2026 11:51
@ivan-aksamentov merged commit 27c513c into master on Feb 27, 2026
19 checks passed
@ivan-aksamentov deleted the fix/alignment-overflow-panic branch on February 27, 2026 11:51