Handling ITS reads with variable lengths: Should ASVs be normalized to avoid redundancy?

Hello all,

I am working with ITS amplicon data, which naturally contains fragments of variable length. In my dataset, some ASVs exactly match part of another ASV: they are shorter versions of the same sequence. This raised a question about how DADA2 handles ITS length variability.
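To make the observation concrete, here is a minimal Python sketch of how such length-only redundancy could be flagged. It is not DADA2 code: the function name `find_contained_asvs` and the toy sequences are hypothetical, and it assumes that "shorter version" means an exact substring of a longer ASV.

```python
def find_contained_asvs(asvs):
    """Map each ASV that is an exact substring of a longer ASV to those longer ASVs."""
    # Deduplicate, then sort longest first so candidate parents come before shorter ASVs.
    ordered = sorted(set(asvs), key=len, reverse=True)
    contained = {}
    for i, short in enumerate(ordered):
        # Only strictly longer sequences can contain a shorter version.
        parents = [
            longer for longer in ordered[:i]
            if len(longer) > len(short) and short in longer
        ]
        if parents:
            contained[short] = parents
    return contained


# Toy sequences, invented purely for illustration (not real ITS data).
asvs = [
    "ACGTTGCAAGGCT",
    "ACGTTGCAAG",      # shorter version of the first sequence
    "ACGTTGCTAGGCT",   # differs from the first by one base: a distinct variant
]

redundant = find_contained_asvs(asvs)
for short, parents in redundant.items():
    print(f"{short} is contained in {len(parents)} longer ASV(s)")
```

Only exact containment is flagged here, so ASVs that differ by even a single base stay separate, which is the distinction that matters for strain-level work.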

Given this, I would like to ask:

1. Should ITS sequences be length-standardized (trimmed or padded) to avoid generating redundant ASVs that are biologically the same but differ only in length?
2. Or does DADA2 internally handle such cases so that shorter fragments do not artificially inflate ASV diversity?

In our case, the samples consist almost exclusively of _Saccharomyces cerevisiae_, so diversity is extremely low. Our goal is to detect very small differences (ideally down to the strain level). We are concerned that ITS fragments of different lengths might reduce our ability to resolve this fine-scale variation.

Any guidance on best practices for ITS processing with DADA2 in such low-diversity, strain-focused datasets would be greatly appreciated.

Thank you very much!


Handling ITS reads with variable lengths: Should ASVs be normalized to avoid redundancy? #2166
