Hello @arangrhie! First of all, thank you for all your work and support!
I'm implementing meryl/merqury in a pipeline to evaluate a set of public genome assemblies of neglected parasites. While some of them used newer sequencing technologies, a good portion relied on rather low-quality reads, especially early Illumina and PacBio reads, and even some now-discontinued technologies such as 454 and IonTorrent. As many scientists in our community still use these assemblies, we need to evaluate them as well. I know that you do not recommend running meryl/merqury on such low-quality read libraries. However, I noticed that the tolerable collision rate parameter in the best k script is set to the same value as the error rate generally expected from Illumina sequencing, that is 0.001, equivalent to Q30. So I was wondering whether, if this association holds, I could increase the tolerable collision rate for lower-quality read libraries and keep running meryl/merqury without major issues. Does this make sense, or am I misunderstanding the meaning of the collision rate completely?
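To make the question concrete, here is a minimal Python sketch of how I understand the calculation. It assumes the best k script evaluates k = log4(G(1 - p)/p), with G the genome size and p the tolerable collision rate, as I read it in the Merqury paper; I have not checked it against the script line by line. The function names and the 1 Gbp genome size are hypothetical, chosen only for illustration.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability (Q30 -> 0.001)."""
    return 10 ** (-q / 10)


def best_k(genome_size: int, collision_rate: float = 0.001) -> float:
    """Suggested k under the ASSUMED formula k = log4(G * (1 - p) / p).

    genome_size    -- genome size in bp (G)
    collision_rate -- tolerable collision rate (p), strictly between 0 and 1
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    if not 0 < collision_rate < 1:
        raise ValueError("collision_rate must be between 0 and 1 (exclusive)")
    return math.log(genome_size * (1 - collision_rate) / collision_rate, 4)


# Hypothetical 1 Gbp genome, only to illustrate the trend.
GENOME_SIZE = 1_000_000_000

for q in (30, 20, 10):
    p = phred_to_error(q)
    print(f"Q{q}: p = {p:g}, suggested k = {best_k(GENOME_SIZE, p):.2f}")
```

If that formula is right, raising the tolerable collision rate only lowers the suggested k, so part of my question is whether a smaller k is actually a meaningful way to accommodate noisier reads.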
Thank you in advance!