sense and antisense reads #2646
Replies: 1 comment
Just to cite ChatGPT (after she repeatedly and convincingly informed me it was "A"):

ChatGPT said: The answer is (B).

How to unpack this: For each read (or properly paired set of reads), STAR looks at whether this RNA fragment aligns to the same strand as the annotated gene (→ sense) or the opposite strand (→ antisense). Columns 3 and 4 then report counts of reads that are sense vs. antisense with respect to the gene annotation, not with respect to read1 vs. read2.

Illumina read 1 vs. read 2 orientation is irrelevant to STAR's sense/antisense count: The physical fact that read1 and read2 come from opposite strands of the same DNA fragment is just part of sequencing technology. What STAR counts is: “Does this fragment come from an RNA that matches the gene’s annotated strand (sense) or the opposite (antisense)?”

This is why featureCounts gives similar sense vs. antisense results: featureCounts, when run in stranded mode, also classifies reads based on the transcribed strand vs. gene annotation. So both STAR and featureCounts are essentially giving you counts of transcripts, not counts of individual read orientations.

Why is the STAR manual wording confusing? But since STAR aligns reads to the genome and looks at gene annotation, the only way to define "antisense" is relative to the biological feature.

Technical check: The actual RNA strand is the reverse complement of the gene’s annotated strand. So to see sense transcription, STAR will report it in one column (say column 4), and antisense (opposite-strand transcription) in the other (column 3). Which column is sense or antisense depends on the strandedness, but the principle stays the same: it is about the direction of the original RNA.
Dear Alex and All,
I am sorry for bringing up this issue once again, but there is still a lot of confusion among users (and the AIs I have consulted). Here is a sentence from the STAR manual:
"Note, that if you have stranded data and choose one of the columns 3 or 4, the other column (4 or 3) will give you the count of antisense reads".
What exactly does it mean? Does it mean:
(A) "the other column gives you the count of antisense reads, members of the same pair of Illumina paired reads, read from the same transcript" (for example, in the case of a reverse-stranded library, column 4 is the number of read2's and column 3 is the number of read1's),
or
(B) "the other column gives you the number of antisense reads (antisense pairs of reads, in the case of paired-end libraries) read from a transcript transcribed from the same feature area but in the antisense direction" (i.e., transcribed from the coding strand, not the template strand, relative to the annotated feature)?
I initially assumed it was (B), but given the way this sentence is worded ("the count of antisense reads", not "the count of antisense transcripts"), it undoubtedly means (A), and several human and AI experts convinced me it is indeed (A).
However, if I use featureCounts instead of just analyzing STAR's ReadsPerGene.out.tab, I get very similar counts of sense and antisense transcripts.
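For anyone who wants to check this on their own data, here is a minimal Python sketch of how the column totals could be compared. It assumes the usual ReadsPerGene.out.tab layout (tab-separated; gene ID, then the unstranded, column 3 and column 4 counts; summary rows whose names start with `N_`). The function name `strand_totals` and the numbers in `example` are made up for illustration and are not real STAR output.

```python
import csv
import io


def strand_totals(text):
    """Sum count columns 2-4 over gene rows, skipping the N_* summary rows."""
    totals = {"unstranded": 0, "col3": 0, "col4": 0}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # N_unmapped, N_multimapping, N_noFeature, N_ambiguous
        totals["unstranded"] += int(row[1])
        totals["col3"] += int(row[2])
        totals["col4"] += int(row[3])
    return totals


# Hypothetical file contents (invented numbers, for illustration only).
example = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t10\t400\t30\n"
    "N_ambiguous\t5\t1\t2\n"
    "geneA\t500\t12\t488\n"
    "geneB\t300\t9\t291\n"
)

totals = strand_totals(example)
# Share of stranded counts that fall in column 4.
col4_share = totals["col4"] / (totals["col3"] + totals["col4"])
print(totals, round(col4_share, 3))
```

If the two columns counted the two mates of the same pair, as in (A), one would expect their totals to be roughly equal; strongly skewed totals would instead be what (B) predicts for a stranded library. This is only a sanity check under the assumptions above, not a statement of how STAR is implemented.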
Could someone please tell me - is it A or B??
Thank you so much in advance!
L.Yampolsky