Description
Hi,
Thank you for developing this tool. Really enjoyed your preprint.
Query
Is there a way to export the query k-mer positions for each reference position in the output VCF files, or to expose them as a temporary file? The reference contig and position are present in the VCF files as expected, but the query positional information is not.
Use case
I've been using kbo on assemblies in a pairwise manner, where each genome serves as both the reference and the query against every other genome.
Example:
---Genome_dir
┕-----Genome_1.fasta
┕-----Genome_2.fasta
┕-----Genome_3.fasta
However, when I swap which genome is the reference and which is the query, I get different results for the same pair:
| Ref | Query | SNPs | Indels |
|---|---|---|---|
| Genome_1 | Genome_2 | 79 | 10 |
| Genome_2 | Genome_1 | 86 | 11 |
I understand why this is occurring, but I'd really like to be able to see which query positions each reference position corresponds to, so that I can count SNPs common to both comparisons only once and get the true number of coalesced pairwise SNPs (SNPs shared by both comparisons, plus SNPs found only with Genome_1 as the reference, plus SNPs found only with Genome_2 as the reference). I'd guess this would be more accurate than taking the mean of the two counts.
In these cases, the majority of these SNPs will overlap, but I cannot determine which ones do without the query positional information. Since I am using assemblies and not raw reads, this should be possible, right?
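To make the calculation concrete, here is a minimal sketch of what I have in mind, assuming the query positions were available. Everything in it is hypothetical: `coalesce_snps`, the `(contig, position)` tuples, and the `ref2_to_ref1` mapping are my own placeholders, not part of kbo or its VCF output. The mapping is exactly the piece I am missing.

```python
def coalesce_snps(snps_ref1, snps_ref2, ref2_to_ref1):
    """Count SNPs from two reciprocal comparisons without double counting.

    snps_ref1: set of (contig, pos) SNP sites called with Genome_1 as reference.
    snps_ref2: set of (contig, pos) SNP sites called with Genome_2 as reference.
    ref2_to_ref1: dict mapping Genome_2 (contig, pos) to the matching
        Genome_1 (contig, pos). Hypothetical input: this is the query
        positional information that the VCF does not currently contain.
    """
    # Translate Genome_2 sites into Genome_1 coordinates where possible.
    translated = {ref2_to_ref1[site] for site in snps_ref2 if site in ref2_to_ref1}
    # Sites with no counterpart in Genome_1 cannot be shared by definition.
    unmapped = {site for site in snps_ref2 if site not in ref2_to_ref1}

    shared = snps_ref1 & translated
    only_ref1 = snps_ref1 - translated
    only_ref2 = (translated - snps_ref1) | unmapped

    return {
        "shared": len(shared),
        "only_ref1": len(only_ref1),
        "only_ref2": len(only_ref2),
        "total": len(shared) + len(only_ref1) + len(only_ref2),
    }


# Toy data, not real results.
snps_g1 = {("c1", 10), ("c1", 25), ("c1", 40)}
snps_g2 = {("k1", 12), ("k1", 27), ("k1", 90)}
mapping = {
    ("k1", 12): ("c1", 10),
    ("k1", 27): ("c1", 25),
    ("k1", 90): ("c1", 88),
}
result = coalesce_snps(snps_g1, snps_g2, mapping)
print(result)
```

In this toy case two sites are shared, one is specific to each direction, and the coalesced total is 4 rather than the 3 reported by either comparison alone.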
Thanks