Applying canu's trio binning function using assembly's pseudohaplotypes

Hi Canu team,

I’m trying to assign individual long reads to each parental (pseudo-)haplotype without parental data.

Standard approaches based on small variants (e.g., `whatshap haplotag`) don’t seem to work well for complex or highly unbalanced structural variation. Intuitively, a read-binning strategy using haplotype-specific unique k-mers should perform better across all variant types (SNVs + SVs).
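To make the idea concrete, here is a minimal Python sketch of what I mean by binning reads with haplotype-specific k-mers. It is a toy illustration only and not a description of how `splitHaplotype` is implemented. The function names, the tiny sequences, `K = 5`, and the simple "more hits wins" rule are all assumptions made for the example.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse-complement an upper-case ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Collect canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def bin_read(read: str, hap1_only: Set[str], hap2_only: Set[str], k: int) -> str:
    """Assign a read to the haplotype whose specific k-mers it shares more of (toy rule)."""
    read_kmers = canonical_kmers(read, k)
    hits1 = len(read_kmers & hap1_only)
    hits2 = len(read_kmers & hap2_only)
    if hits1 > hits2:
        return "hap1"
    if hits2 > hits1:
        return "hap2"
    return "unknown"


# Hypothetical toy haplotypes that differ by a single base (index 11: C vs A).
K = 5
hap1 = "ACGTACGTTAGCCGATAGGCTA"
hap2 = "ACGTACGTTAGACGATAGGCTA"

hap1_kmers = canonical_kmers(hap1, K)
hap2_kmers = canonical_kmers(hap2, K)
hap1_only = hap1_kmers - hap2_kmers  # k-mers seen only in hap1
hap2_only = hap2_kmers - hap1_kmers  # k-mers seen only in hap2

read_a = hap1[6:18]           # spans the variant, forward strand of hap1
read_b = revcomp(hap2[6:18])  # spans the variant, reverse strand of hap2
read_c = hap1[0:9]            # shared region only, carries no specific k-mer

for name, read in [("read_a", read_a), ("read_b", read_b), ("read_c", read_c)]:
    print(name, bin_read(read, hap1_only, hap2_only, K))
```

Because the k-mers are canonical, a read is binned the same way regardless of strand, and a read that covers no haplotype-specific k-mer stays unassigned.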

I’m wondering whether `splitHaplotype` is appropriate for this use case, and if so, whether you have recommendations for parameter tuning or database construction.

So far I’ve tried a couple of variations of the commands below. `hap1.fa` and `hap2.fa` are pseudohaplotypes generated by hifiasm for a diploid species, and the reads are PacBio HiFi.

At this stage, I’m not very concerned about haplotype switch errors in the assemblies.

```
# Count 63-mers in each pseudohaplotype assembly
meryl count k=63 hap1.fa output hap1.k63.meryl
meryl count k=63 hap2.fa output hap2.k63.meryl

# Keep only the k-mers present in one haplotype and absent from the other
meryl difference hap1.k63.meryl hap2.k63.meryl output hap1.k63.only.meryl
meryl difference hap2.k63.meryl hap1.k63.meryl output hap2.k63.only.meryl

# Bin the HiFi reads using the haplotype-specific k-mer databases
splitHaplotype -R hifi.fq.gz \
  -H hap1.k63.only.meryl 1 canu.k63.only.hap1.fq.gz \
  -H hap2.k63.only.meryl 1 canu.k63.only.hap2.fq.gz \
  -A canu.k63.only.unk.fq.gz
```

Both haplotype output files (`hap1` and `hap2`) contain reads, but each appears to be an approximately 50:50 mixture of reads from the two haplotypes.

Any guidance would be greatly appreciated. Thank you!

Applying canu's trio binning function using assembly's pseudohaplotypes #2387

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Applying canu's trio binning function using assembly's pseudohaplotypes #2387

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions