Simple mapping of original read IDs (Headers) to final DADA2 ASVs for validation purposes

I am validating the accuracy of the DADA2 pipeline using simulated microbial community data. With simulated data, the "ground truth" identity of every read is encoded in its original FASTQ header (the sequence ID). What I need is a simple, clean mapping from each original input read to the final Amplicon Sequence Variant (ASV) that DADA2 assigns it to, for every read that passes trimming and filtering and enters denoising. This mapping is essential for computing read-level performance metrics such as the Adjusted Rand Index (ARI), which compares the output clustering (the ASV) against the known true identity (the header) read by read. The validation concerns only clustering accuracy (how sequences are grouped) and does not depend on downstream taxonomic assignment.
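To make the intended comparison concrete, here is a minimal sketch of the read-level ARI calculation, assuming the mapping is already available as a plain read-ID-to-ASV table. The read IDs, labels, and helper names (`adjusted_rand_index`, `ari_from_mappings`) are hypothetical illustrations and not part of DADA2; the ARI follows the standard pair-counting definition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")

    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Contingency counts: reads sharing both a true label and an ASV.
    cell_counts = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


def ari_from_mappings(read_to_truth, read_to_asv):
    """Compare only reads present in both tables (i.e. reads that survived)."""
    shared = sorted(set(read_to_truth) & set(read_to_asv))
    return adjusted_rand_index(
        [read_to_truth[r] for r in shared],
        [read_to_asv[r] for r in shared],
    )


# Hypothetical example: true source parsed from headers vs. assigned ASV.
read_to_truth = {
    "read1": "taxonA", "read2": "taxonA",
    "read3": "taxonB", "read4": "taxonB",
    "read5": "taxonC", "read6": "taxonC",
}
read_to_asv = {
    "read1": "ASV1", "read2": "ASV1",
    "read3": "ASV2", "read4": "ASV2",
    "read5": "ASV2", "read6": "ASV3",
}

ari = ari_from_mappings(read_to_truth, read_to_asv)
print(f"ARI = {ari:.4f}")
```

Reads that were discarded before the final ASV table simply drop out of the comparison here, because only IDs present in both tables are scored; treating them as a separate "unassigned" cluster would be an alternative design choice.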

Is there an existing, documented method or an internal DADA2 utility function that can provide this direct mapping table after the denoising, merging, and chimera-removal steps are complete?

Thanks. 

Simple mapping of original read IDs (Headers) to final DADA2 ASVs for validation purposes #2168
