Commit 9bcf7dc
feat(qc): recombination: F: Label Switching
## Recombination Detection: Strategy F - Label Switching
### Scientific Motivation
Recombination occurs when a virus incorporates genetic material from two or
more parental lineages. Each lineage accumulates characteristic mutations
over time - these "signature mutations" serve as molecular markers that
distinguish lineages from one another.
When a recombination event occurs, different genomic regions inherit
mutations from different parental lineages. This creates a distinctive
pattern: the sequence carries mutations characteristic of lineage A in one
region, and mutations characteristic of lineage B in another region.
The label switching strategy exploits this by leveraging the mutation label
map (nucMutLabelMap) - a curated mapping of nucleotide positions to lineage
labels. When private mutations are detected, they inherit labels from this
map. In a non-recombinant sequence, most labeled mutations should belong to
a single lineage (or closely related lineages). In a recombinant, mutations
from different lineages cluster in different genomic regions, creating
detectable "label switches" as you traverse the genome.
### Mechanism
The algorithm proceeds as follows:
1. **Label grouping**: Collect all labeled private substitutions from
`PrivateNucMutations.labeled_substitutions`. Group them by their primary
label (first label in the labels array), storing genomic positions for
each label.
2. **Minimum labels check**: If fewer than `minLabels` distinct labels are
present, return zero score (insufficient signal for recombination).
3. **Centroid calculation**: For each label, compute the centroid (mean
position) of all mutations carrying that label. This represents the
"center of mass" of each lineage's contribution.
4. **Switch counting**: Sort labels by their centroid position. The number
of switches equals `numLabels - 1`, representing transitions between
lineage-dominated regions as you traverse the genome from 5' to 3'.
5. **Scoring**: `score = numSwitches * weight`
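
The five steps above can be sketched in Rust roughly as follows. This is a minimal illustration, not the code from `qc_rule_recombinants.rs`: the `LabeledSub` struct, the `label_switching_score` function, and their signatures are hypothetical stand-ins, and the sketch returns a zero score where the real rule may return `None`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a labeled private substitution:
/// a genomic position plus the labels looked up in `nucMutLabelMap`.
pub struct LabeledSub {
    pub pos: usize,
    pub labels: Vec<String>,
}

/// Returns the labels ordered by centroid, together with the score.
/// Assumed signature, for illustration only.
pub fn label_switching_score(
    subs: &[LabeledSub],
    min_labels: usize,
    weight: f64,
) -> (Vec<(String, f64)>, f64) {
    // Step 1: group positions by primary (first) label.
    let mut by_label: BTreeMap<&str, Vec<usize>> = BTreeMap::new();
    for sub in subs {
        if let Some(primary) = sub.labels.first() {
            by_label.entry(primary.as_str()).or_default().push(sub.pos);
        }
    }

    // Step 2: too few distinct labels means no recombination signal.
    if by_label.len() < min_labels {
        return (Vec::new(), 0.0);
    }

    // Step 3: centroid (mean position) of each label's mutations.
    let mut centroids: Vec<(String, f64)> = by_label
        .iter()
        .map(|(label, positions)| {
            let mean = positions.iter().sum::<usize>() as f64 / positions.len() as f64;
            (label.to_string(), mean)
        })
        .collect();

    // Step 4: order labels from 5' to 3' by centroid; switches = labels - 1.
    centroids.sort_by(|a, b| a.1.total_cmp(&b.1));
    let num_switches = centroids.len().saturating_sub(1);

    // Step 5: score = numSwitches * weight.
    (centroids, num_switches as f64 * weight)
}
```

For example, two `Alpha` mutations at positions 100 and 300 and one `Beta` mutation at position 5000 give centroids of 200 and 5000, one switch, and a score of 50 with the default weight.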
### Configuration
Required in `pathogen.json`:
```json
{
  "mutLabels": {
    "nucMutLabelMap": {
      "A123T": ["Alpha"],
      "G456C": ["Beta"],
      ...
    }
  },
  "qc": {
    "recombinants": {
      "enabled": true,
      "scoreWeight": 100.0,
      "labelSwitching": {
        "enabled": true,
        "weight": 50.0,
        "minLabels": 2
      }
    }
  }
}
```
Parameters:
- `enabled`: Activate label switching detection
- `weight`: Score contribution per label switch (default: 50.0)
- `minLabels`: Minimum distinct labels required to trigger detection
(default: 2)
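
To show how the three parameters interact, here is a hedged Rust sketch of a config struct carrying the documented defaults. `LabelSwitchingConfig` and `score_for` are hypothetical names, not the actual `QcRecombConfigLabelSwitching` definition, and the default for `enabled` is an assumption because the commit message does not state it. The `None` and zero-score cases follow the unit tests listed under Implementation Summary.

```rust
/// Hypothetical mirror of the `labelSwitching` block in `pathogen.json`.
#[derive(Debug, Clone, PartialEq)]
pub struct LabelSwitchingConfig {
    pub enabled: bool,
    pub weight: f64,
    pub min_labels: usize,
}

impl Default for LabelSwitchingConfig {
    fn default() -> Self {
        Self {
            enabled: false, // assumption: this default is not stated in the commit message
            weight: 50.0,   // documented default
            min_labels: 2,  // documented default
        }
    }
}

impl LabelSwitchingConfig {
    /// Score for a sequence with `num_labels` distinct primary labels.
    /// `None` means the strategy produced no result at all.
    pub fn score_for(&self, num_labels: usize) -> Option<f64> {
        if !self.enabled || num_labels == 0 {
            return None; // disabled, or no labeled mutations
        }
        if num_labels < self.min_labels {
            return Some(0.0); // below minLabels: insufficient signal
        }
        Some(num_labels.saturating_sub(1) as f64 * self.weight)
    }
}
```

With the strategy enabled and the other defaults unchanged, one label yields `Some(0.0)`, two labels yield `Some(50.0)`, and three labels yield `Some(100.0)`.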
### Advantages
- Leverages existing lineage annotation infrastructure (mutLabels)
- Biologically interpretable - directly identifies which lineages
contributed to the recombinant
- Does not require spatial parameters or segment definitions
- Robust to mutation density variations across the genome
- Works with any pathogen that has curated lineage-defining mutations
### Limitations
- Requires a well-curated `nucMutLabelMap` with lineage-specific mutations
- Effectiveness depends on quality and completeness of label annotations
- Cannot detect recombination between unlabeled or identically-labeled
lineages
- Uses only the first label when mutations have multiple labels
- Centroid-based ordering may miss complex recombination patterns with
interleaved regions
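
The last limitation can be made concrete with a small Rust sketch. Because the switch count is `numLabels - 1`, an interleaved A-B-A mosaic scores the same as a simple A-B recombinant. The `positional_switches` function below is a hypothetical alternative shown only for contrast; it is not part of this commit.

```rust
use std::collections::BTreeSet;

/// Switch count as described under Mechanism:
/// number of distinct primary labels minus one.
pub fn centroid_switches(muts: &[(usize, &str)]) -> usize {
    let labels: BTreeSet<&str> = muts.iter().map(|&(_, label)| label).collect();
    labels.len().saturating_sub(1)
}

/// Hypothetical alternative, for contrast only: walk the mutations in
/// genomic order and count every change of label between neighbours.
pub fn positional_switches(muts: &[(usize, &str)]) -> usize {
    let mut sorted = muts.to_vec();
    sorted.sort_by_key(|&(pos, _)| pos);
    sorted.windows(2).filter(|w| w[0].1 != w[1].1).count()
}
```

For mutations labeled A, A, B, B, A, A at positions 100, 200, 5000, 5100, 9000, 9100, `centroid_switches` reports 1 while `positional_switches` reports 2, so the second breakpoint is invisible to the centroid-based count.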
### Comparison to Other Strategies
Unlike Strategy A (weighted threshold), which only counts mutations, label
switching considers the identity and spatial distribution of labeled
mutations. Unlike Strategy B (spatial uniformity), which measures general
non-uniformity, this strategy specifically identifies which lineages
contribute to different regions.
Choose label switching when:
- Your pathogen has well-characterized lineage-defining mutations
- You want to identify the parental lineages, not just detect recombination
- The labeled mutation set has good genome-wide coverage
Choose other strategies when:
- No mutation label map is available (A, B, C, D)
- Recombination involves unlabeled variants (A, B, C, D)
- Multiple ancestral references are available (E)
### Implementation Summary
Files modified:
- `packages/nextclade/src/qc/qc_config.rs` - Added QcRecombConfigLabelSwitching config struct
- `packages/nextclade/src/qc/qc_rule_recombinants.rs` - Implemented strategy_label_switching function
- `packages/nextclade/src/qc/qc_recomb_utils.rs` - Added shared utilities module
- `packages/nextclade/src/qc/qc_run.rs` - Integrated recombinants rule
- `packages/nextclade/src/qc/mod.rs` - Registered new modules
- `packages/nextclade-web/src/helpers/formatQCRecombinants.ts` - Added UI formatting
- `packages/nextclade-web/src/components/Results/ListOfQcIsuues.tsx` - Display integration
- `packages/nextclade-schemas/*.schema.{json,yaml}` - Updated JSON schemas
Test dataset:
- `data/recomb/enpen/enterovirus/ev-d68/` - EV-D68 dataset with label
switching configuration enabled for testing
Unit tests added for:
- Disabled config returns None
- Empty labeled mutations returns None
- Single label below minLabels returns zero score
- Two labels returns one switch
- Three labels returns two switches
- Multiple labels per mutation uses first label only
### Future Work
- Support weighted label switches based on centroid separation distance
- Consider secondary labels for mutations with multiple lineage assignments
- Add visualization of label distribution across genome
- Integrate with tree-based lineage assignment for validation
Co-Authored-By: Claude <noreply@anthropic.com>

1 parent 901e78d commit 9bcf7dc
File tree
26 files changed (+24189, -7 lines)
- data/recomb/enpen/enterovirus/ev-d68
- packages
  - nextclade-schemas
  - nextclade-web/src
    - components
      - DevTools
      - Results
    - helpers
  - nextclade/src/qc