Hi Canu team,
I’m trying to assign individual long reads to each parental (pseudo-)haplotype without parental data.
Standard approaches based on small variants (e.g., whatshap haplotag) don’t seem to work well for complex or highly unbalanced structural variation. Intuitively, a read-binning strategy using haplotype-specific unique k-mers should perform better across all variant types (SNVs + SVs).
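To make the idea concrete, here is a minimal toy sketch in Python of the binning logic I have in mind. It is not meant to reflect how splitHaplotype is implemented: the function names (`canonical_kmers`, `bin_read`) and the simple "more hits wins" rule are my own assumptions for illustration, and a real run would use k=63 with meryl databases rather than in-memory sets.

```python
# Toy illustration of read binning with haplotype-specific k-mers.
# All names here are hypothetical; this is not splitHaplotype's code.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Return the reverse complement of an ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers (the lexicographic minimum of
    each k-mer and its reverse complement), skipping k-mers with non-ACGT."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in COMPLEMENT for base in kmer):
            continue
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def bin_read(read, hap1_only, hap2_only, k):
    """Assign a read to the haplotype whose specific k-mers it hits more
    often; ties (including zero hits on both sides) go to 'unknown'."""
    kmers = canonical_kmers(read, k)
    n1 = len(kmers & hap1_only)
    n2 = len(kmers & hap2_only)
    if n1 > n2:
        return "hap1"
    if n2 > n1:
        return "hap2"
    return "unknown"


# Two toy haplotypes differing by a single base (index 10: G vs A).
k = 5
hap1 = "ACGTTGCAAGGCTTAACCGT"
hap2 = "ACGTTGCAAGACTTAACCGT"

# Same idea as the two `meryl difference` databases: k-mers seen in one
# haplotype but not in the other.
hap1_only = canonical_kmers(hap1, k) - canonical_kmers(hap2, k)
hap2_only = canonical_kmers(hap2, k) - canonical_kmers(hap1, k)

print(bin_read(hap1[4:16], hap1_only, hap2_only, k))           # spans the variant
print(bin_read(revcomp(hap2[4:16]), hap1_only, hap2_only, k))  # reverse strand
print(bin_read(hap1[0:8], hap1_only, hap2_only, k))            # shared flank only
```

With real data I would expect most reads overlapping heterozygous sites to be dominated by k-mers from one set, and reads from homozygous regions to end up unassigned.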
I’m wondering whether splitHaplotype is appropriate for this use case, and if so, whether you have recommendations for parameter tuning or database construction.
So far I've tried running a couple of variations of the following commands. hap1.fa and hap2.fa are pseudohaplotypes generated by hifiasm for a diploid species.
At this stage, I’m not very concerned about haplotype switch errors in the assemblies.
Reads are PacBio HiFi.
meryl count k=63 hap1.fa output hap1.k63.meryl
meryl count k=63 hap2.fa output hap2.k63.meryl
meryl difference hap1.k63.meryl hap2.k63.meryl output hap1.k63.only.meryl
meryl difference hap2.k63.meryl hap1.k63.meryl output hap2.k63.only.meryl
splitHaplotype -R hifi.fq.gz \
-H hap1.k63.only.meryl 1 canu.k63.only.hap1.fq.gz \
-H hap2.k63.only.meryl 1 canu.k63.only.hap2.fq.gz \
-A canu.k63.only.unk.fq.gz
The output files (hap1 and hap2) both contain reads, but they appear to be approximately 50:50 mixtures of both haplotypes.
Any guidance would be greatly appreciated. Thank you!