fix: prevent integer overflow in band area calculation #1752
Merged
ivan-aksamentov merged 6 commits into master on Feb 27, 2026
Band area calculation in `regularize_stripes()` accumulates stripe widths across all reference positions (~R × W). For pathological inputs like concatenated genomes (179kb query vs 30kb reference), this yields ~3.6 billion, exceeding `u32::MAX` and causing overflow on 32-bit WASM. Debug builds panic with "attempt to add with overflow" before the `align.rs:80` guard can reject it with a proper error message.

- Use `saturating_add()` to cap at `usize::MAX` (~4.3B), which still exceeds `max_band_area` (500M), so the downstream check correctly rejects with an informative message
- Reproducer: GISAID sequence EPI_ISL_20374993 (6 concatenated SARS-CoV-2 genomes)

Fixes: #1749
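To illustrate the saturating accumulation described above, here is a minimal sketch. It is not the project's code: the `Stripe` fields and the function name `band_area_saturating` are assumptions made for illustration, and `u32` stands in for `usize` on 32-bit WASM so the behaviour can be reproduced on any platform.

```rust
/// Hypothetical stripe: half-open range [begin, end) of query positions
/// covered for one reference position. `u32` mimics `usize` on wasm32.
#[derive(Clone, Copy)]
pub struct Stripe {
    pub begin: u32,
    pub end: u32,
}

/// Sum stripe widths, clamping at `u32::MAX` instead of overflowing.
/// With a plain `+`, debug builds would panic and release builds would wrap.
pub fn band_area_saturating(stripes: &[Stripe]) -> u32 {
    stripes.iter().fold(0u32, |area, stripe| {
        // An inverted stripe contributes zero width instead of underflowing.
        let width = stripe.end.saturating_sub(stripe.begin);
        area.saturating_add(width)
    })
}
```

A saturated result is still larger than a 500M limit, so a downstream `area > max_band_area` check would reject the input, but the value it reports is the cap rather than the true area.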
Defense-in-depth fix for `calculate_dimensions()` which has the same accumulation pattern as `seed_alignment.rs` (fixed in 9d95d0a). In normal operation this is protected by the band_area check in `align.rs:80`, but could be reached through future code changes, direct `Band2d::new()` calls, or edge cases with many narrow stripes.

- Use `saturating_add()` for consistency; overflow causes OOM on allocation rather than silent memory corruption via wrapped indices

Ref: #1749
Band area calculation accumulates stripe widths across all reference positions (R × W). For pathological inputs like concatenated genomes (e.g., 179kb query vs 30kb reference), this yields ~3.7 billion, which exceeds `u32::MAX` (~4.3B) and causes overflow on 32-bit WASM.

The previous fix (9d95d0a) used `saturating_add()` to prevent the panic, but saturated values produce misleading error messages. Using `u64` ensures accurate calculation on both 32-bit (WASM) and 64-bit platforms without saturation artifacts.

- Change `max_band_area` parameter from `usize` to `u64`
- Change `create_alignment_band()` return type from `(Vec<Stripe>, usize)` to `(Vec<Stripe>, u64)`
- Accumulate band area as `u64` in `regularize_stripes()`
- Update JSON schemas to reflect `uint64` format

Ref: #1749
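The `u64` accumulation can be sketched as follows. This is an illustration under assumptions, not the merged code: the `Stripe` layout, the helper names `band_area` and `check_band_area`, and the error text are hypothetical. Only the idea of summing into `u64` and comparing against a `u64` `max_band_area` comes from the commit message.

```rust
/// Hypothetical stripe: half-open range [begin, end) of query positions.
#[derive(Clone, Copy)]
pub struct Stripe {
    pub begin: usize,
    pub end: usize,
}

/// Accumulate the band area in `u64`, so the total is exact even where
/// `usize` is only 32 bits wide (e.g. wasm32).
pub fn band_area(stripes: &[Stripe]) -> u64 {
    stripes
        .iter()
        // Widening `usize` to `u64` is lossless on 32-bit and 64-bit targets.
        .map(|stripe| stripe.end.saturating_sub(stripe.begin) as u64)
        .sum()
}

/// Reject bands above the limit, reporting the true (unsaturated) area.
pub fn check_band_area(stripes: &[Stripe], max_band_area: u64) -> Result<u64, String> {
    let area = band_area(stripes);
    if area > max_band_area {
        Err(format!("band area {} exceeds limit {}", area, max_band_area))
    } else {
        Ok(area)
    }
}
```

Because the sum is never clamped, the error path can report the actual area rather than a saturation cap.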
The previous error message ("Alignment matrix size X exceeds maximum value Y") was uninformative for non-technical users who don't understand banded Smith-Waterman alignment.
The new message:
- Formats large numbers for readability (3.7B instead of 3704350009, 179,151 instead of 179151)
- States observable facts: query and reference sequence lengths
- Differentiates between two failure modes based on length ratio:
  - Query >1.5× reference: likely concatenated sequences or assembly scaffolds
  - Similar lengths: likely structural rearrangements or wrong reference
- Lists possible causes as hypotheses, not assertions
- Preserves technical details for advanced users
Example output:
```
Alignment band area (3.7B) exceeds limit (500M). Query sequence length (179,151 nt) is significantly larger than reference (29,903 nt). Possible reasons: concatenated sequences, assembly scaffolds, or wrong reference sequence.
```
Ref: #1749
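The branching between the two failure modes might look roughly like the sketch below. The function name `band_area_error` and its signature are assumptions, and the wording of the similar-length branch is invented for illustration, since only the long-query message appears in the example output. Numbers are printed unformatted here to keep the sketch focused on the ratio check.

```rust
/// Build an error message for an oversized alignment band.
/// Hypothetical helper; only the 1.5x ratio rule and the long-query wording
/// are taken from the commit description.
pub fn band_area_error(band_area: u64, max_band_area: u64, qry_len: usize, ref_len: usize) -> String {
    let head = format!(
        "Alignment band area ({}) exceeds limit ({}).",
        band_area, max_band_area
    );

    // qry_len > 1.5 * ref_len, written in integer arithmetic to avoid floats.
    let query_much_longer = (qry_len as u128) * 2 > (ref_len as u128) * 3;

    if query_much_longer {
        format!(
            "{} Query sequence length ({} nt) is significantly larger than reference ({} nt). \
             Possible reasons: concatenated sequences, assembly scaffolds, or wrong reference sequence.",
            head, qry_len, ref_len
        )
    } else {
        // Assumed wording for the similar-length case.
        format!(
            "{} Query ({} nt) and reference ({} nt) have similar lengths. \
             Possible reasons: structural rearrangements or wrong reference sequence.",
            head, qry_len, ref_len
        )
    }
}
```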
Add configurable HumanFormat with builder pattern supporting:

- Grouping styles (Standard, Indian, None)
- Custom separator character
- Compact notation (K/M/B/T suffixes)
- Configurable threshold and decimal places
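A builder along these lines could be sketched as below. Only the type name `HumanFormat` and the feature list come from the commit message. The method names (`grouping`, `separator`, `compact`, `decimals`, `format`), the `Grouping` enum, the field layout, and the defaults are all assumptions, so the real API may differ.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Grouping {
    Standard, // 1,234,567
    Indian,   // 12,34,567
    None,     // 1234567
}

#[derive(Clone, Debug)]
pub struct HumanFormat {
    grouping: Grouping,
    separator: char,
    compact_threshold: Option<u64>, // use K/M/B/T at or above this value
    decimals: usize,                // decimal places in compact notation
}

impl HumanFormat {
    pub fn new() -> Self {
        Self { grouping: Grouping::Standard, separator: ',', compact_threshold: None, decimals: 1 }
    }
    pub fn grouping(mut self, grouping: Grouping) -> Self { self.grouping = grouping; self }
    pub fn separator(mut self, separator: char) -> Self { self.separator = separator; self }
    pub fn compact(mut self, threshold: u64) -> Self { self.compact_threshold = Some(threshold); self }
    pub fn decimals(mut self, decimals: usize) -> Self { self.decimals = decimals; self }

    pub fn format(&self, n: u64) -> String {
        match self.compact_threshold {
            Some(threshold) if n >= threshold => self.format_compact(n),
            _ => self.format_grouped(n),
        }
    }

    fn format_compact(&self, n: u64) -> String {
        const UNITS: [(u64, &str); 4] =
            [(1_000_000_000_000, "T"), (1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")];
        for &(scale, suffix) in UNITS.iter() {
            if n >= scale {
                let value = n as f64 / scale as f64;
                return format!("{:.*}{}", self.decimals, value, suffix);
            }
        }
        n.to_string() // below 1,000 there is no suffix to apply
    }

    fn format_grouped(&self, n: u64) -> String {
        let digits = n.to_string();
        let len = digits.len();
        let mut out = String::with_capacity(len + len / 2);
        for (i, ch) in digits.chars().enumerate() {
            let remaining = len - i; // digits left, including this one
            if i > 0 && self.is_group_boundary(remaining) {
                out.push(self.separator);
            }
            out.push(ch);
        }
        out
    }

    fn is_group_boundary(&self, remaining: usize) -> bool {
        match self.grouping {
            Grouping::Standard => remaining % 3 == 0,
            // Last group has three digits, all earlier groups have two.
            Grouping::Indian => remaining == 3 || (remaining > 3 && (remaining - 3) % 2 == 0),
            Grouping::None => false,
        }
    }
}
```

For example, `HumanFormat::new().compact(1_000_000).format(3_704_350_009)` would yield `3.7B` in this sketch, while `HumanFormat::new().format(179_151)` would yield `179,151`, matching the style of the example error message. This sketch does not trim trailing zeros or handle values that round up to the next unit.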
Band area calculations could overflow on 32-bit WASM when aligning very long sequences against highly divergent references. This PR uses u64 for intermediate calculations and provides actionable error messages with human-readable numbers when the band area limit is exceeded.
`format_number_human` utility for reuse