feat: make --gtf and --bed mutually exclusive; GTF now runs all analyses
Allow users to provide either a GTF or BED annotation file (but not both).
GTF mode runs everything (dupRadar, featureCounts, and all 7 RSeQC tools)
by deriving transcript-level structure from the GTF. BED mode runs only
RSeQC tools, skipping dupRadar/featureCounts with a warning.
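The mode selection described above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the project's actual code: `Annotation`, `Plan`, `select_annotation`, and `plan_for` are hypothetical names, the error strings are invented, and treating "neither flag given" as an error is an assumption based on the quickstart wording.

```rust
use std::path::PathBuf;

/// Annotation source chosen on the command line (hypothetical type).
#[derive(Debug, PartialEq)]
enum Annotation {
    Gtf(PathBuf),
    Bed(PathBuf),
}

/// Which analysis groups run for a given annotation source (hypothetical type).
#[derive(Debug, PartialEq)]
struct Plan {
    dupradar: bool,
    feature_counts: bool,
    rseqc: bool,
}

/// Enforce that exactly one of `--gtf` / `--bed` was supplied.
/// Rejecting the "neither" case is an assumption of this sketch.
fn select_annotation(gtf: Option<PathBuf>, bed: Option<PathBuf>) -> Result<Annotation, String> {
    match (gtf, bed) {
        (Some(_), Some(_)) => Err("--gtf and --bed are mutually exclusive".to_string()),
        (Some(path), None) => Ok(Annotation::Gtf(path)),
        (None, Some(path)) => Ok(Annotation::Bed(path)),
        (None, None) => Err("one of --gtf or --bed is required".to_string()),
    }
}

/// GTF mode runs everything; BED mode runs only the RSeQC tools.
fn plan_for(annotation: &Annotation) -> Plan {
    match annotation {
        Annotation::Gtf(_) => Plan { dupradar: true, feature_counts: true, rseqc: true },
        Annotation::Bed(_) => Plan { dupradar: false, feature_counts: false, rseqc: true },
    }
}
```

Modelling the choice as an enum rather than two optional paths means later code cannot accidentally observe both annotations at once.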
Key changes:
- Extend GTF parser with Transcript struct (exon blocks + CDS range)
- Add from_genes() constructors to all 5 annotation-requiring RSeQC tools
- Extract shared junction/intron helpers into rseqc::common module
- Pre-build all RSeQC data structures once (no more per-BAM BED re-parsing)
- Fix inner_distance first-transcript-per-chromosome RSeQC bug
- Update all documentation to reflect new CLI semantics
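To illustrate what "exon blocks + CDS range" can provide, here is a minimal sketch of a transcript record from which BED12-style blocks and introns are derived. It is not the commit's `Transcript` struct: the field names, method names, and the `from_gtf` constructor are assumptions made for illustration. It relies only on the standard conventions that GTF intervals are 1-based and inclusive while BED intervals are 0-based and half-open.

```rust
/// Hypothetical transcript model using 0-based, half-open coordinates.
#[derive(Debug, Clone, PartialEq)]
struct Transcript {
    /// Exon blocks as (start, end), sorted by start.
    exons: Vec<(u64, u64)>,
    /// Optional coding range as (cds_start, cds_end).
    cds: Option<(u64, u64)>,
}

/// Convert a GTF interval (1-based, inclusive) to 0-based half-open.
/// Assumes `start >= 1`, as the GTF format requires.
fn gtf_to_half_open(start: u64, end: u64) -> (u64, u64) {
    (start - 1, end)
}

impl Transcript {
    /// Build a transcript from GTF-style exon and CDS intervals.
    fn from_gtf(exons_1based: &[(u64, u64)], cds_1based: Option<(u64, u64)>) -> Self {
        let mut exons: Vec<(u64, u64)> = exons_1based
            .iter()
            .map(|&(s, e)| gtf_to_half_open(s, e))
            .collect();
        exons.sort();
        let cds = cds_1based.map(|(s, e)| gtf_to_half_open(s, e));
        Transcript { exons, cds }
    }

    /// Introns are the gaps between consecutive exons.
    fn introns(&self) -> Vec<(u64, u64)> {
        self.exons.windows(2).map(|w| (w[0].1, w[1].0)).collect()
    }

    /// BED12 blockSizes and blockStarts (starts are relative to the transcript start).
    fn bed12_blocks(&self) -> (Vec<u64>, Vec<u64>) {
        let tx_start = match self.exons.first() {
            Some(first) => first.0,
            None => return (Vec::new(), Vec::new()),
        };
        let sizes: Vec<u64> = self.exons.iter().map(|&(s, e)| e - s).collect();
        let starts: Vec<u64> = self.exons.iter().map(|&(s, _)| s - tx_start).collect();
        (sizes, starts)
    }
}
```

With exons, introns, and a CDS range available per transcript, junction and read-distribution style analyses have the same inputs they would otherwise read from a BED12 line.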
README.md: 11 additions & 16 deletions
@@ -29,7 +29,7 @@ All tools accept SAM/BAM/CRAM input and support processing multiple files in a s
### `rustqc rna` -- RNA-Seq quality control

-Performs dupRadar-equivalent duplicate rate analysis, featureCounts-compatible read counting, and 7 RSeQC-equivalent QC analyses in a single pass. Given a duplicate-marked BAM, GTF annotation, and optional BED12 gene model, it computes per-gene duplication rates, fits a logistic regression model, generates diagnostic plots, produces gene-level count files with biotype summaries, and runs comprehensive QC metrics (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance).
+Performs dupRadar-equivalent duplicate rate analysis, featureCounts-compatible read counting, and 7 RSeQC-equivalent QC analyses in a single pass. Given a duplicate-marked BAM and either a GTF annotation or a BED12 gene model, it computes per-gene duplication rates, fits a logistic regression model, generates diagnostic plots, produces gene-level count files with biotype summaries, and runs comprehensive QC metrics (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance).

| Feature | dupRadar (R) | RustQC |
|---------|-------------|--------|
@@ -56,7 +56,7 @@ The `rustqc rna` command also includes reimplementations of 7 popular [RSeQC](ht
| junction_saturation | `junction_saturation.py` | Saturation analysis of detected splice junctions |
| inner_distance | `inner_distance.py` | Inner distance distribution for paired-end reads |

-All RSeQC tools run by default. Five of the seven tools require a BED12 gene model file (`--bed`); if `--bed` is not provided, those tools are skipped with a warning. Individual tools can be disabled via the [configuration file](#configuration).
+All RSeQC tools run by default when annotation is provided via `--gtf` or `--bed`. With a GTF file, all 7 tools run alongside dupRadar and featureCounts. With a BED file only, all 7 RSeQC tools run but dupRadar and featureCounts are skipped (they require a GTF). Individual tools can be disabled via the [configuration file](#configuration).

## Density scatter plots
@@ -140,15 +140,16 @@ The binary will be at `target/release/rustqc`.
| `<INPUT>...` | One or more duplicate-marked alignment files (SAM/BAM/CRAM). Duplicates must be flagged (SAM flag 0x400), not removed. BAM/CRAM files should be sorted and indexed for parallel processing. |
-| `--gtf <GTF>` | Path to a GTF gene annotation file (e.g., from Ensembl or UCSC). |
+| `--gtf <GTF>` | Path to a GTF gene annotation file. Runs all analyses (dupRadar + featureCounts + all 7 RSeQC tools). Mutually exclusive with `--bed`. |
+| `--bed <BED>` | Path to a BED12 gene model file. Runs RSeQC tools only (dupRadar and featureCounts are skipped). Mutually exclusive with `--gtf`. |
The RSeQC tools are integrated into `rustqc rna` and run automatically. To enable the tools that require a BED12 gene model (infer_experiment, read_distribution, junction_annotation, junction_saturation, inner_distance), pass `--bed`:
-
-```bash
-# Run everything: dupRadar + featureCounts + all 7 RSeQC tools
The 7 RSeQC tools are integrated into `rustqc rna` and run automatically with either `--gtf` or `--bed`. With a GTF file, all analyses run (dupRadar + featureCounts + all 7 RSeQC tools). With a BED file, only the 7 RSeQC tools run.
Five tools (infer_experiment, read_distribution, junction_annotation, junction_saturation, inner_distance) require a BED12 gene model file via `--bed`. If `--bed` is omitted, these tools are skipped with a warning while the remaining tools still run. Individual tools can be disabled via the YAML configuration file.
+
When a GTF file is provided via `--gtf`, all 7 tools run automatically — transcript-level structure is extracted from the GTF. Alternatively, a BED12 gene model file can be provided via `--bed` (mutually exclusive with `--gtf`), which runs only the RSeQC tools. Individual tools can be disabled via the YAML configuration file.
docs/src/content/docs/getting-started/quickstart.md: 14 additions & 16 deletions
@@ -10,8 +10,7 @@ This guide walks you through a basic RustQC analysis from start to finish.
The `rustqc rna` command runs all analyses in a single pass. It requires:
- A **duplicate-marked** alignment file (BAM, SAM, or CRAM). Duplicates must be flagged with SAM flag 0x400 by a tool like [Picard MarkDuplicates](https://broadinstitute.github.io/picard/), [samblaster](https://github.com/GregoryFaust/samblaster), or [sambamba](https://github.com/biod/sambamba).
-- A **GTF annotation** file (for dupRadar, featureCounts, and gene-level analyses).
-- Optionally, a **BED12 gene model** file (`--bed`) for the 5 RSeQC tools that require it (infer_experiment, read_distribution, junction_annotation, junction_saturation, inner_distance). If omitted, these tools are skipped with a warning.
+- Either a **GTF annotation** file (`--gtf`) or a **BED12 gene model** file (`--bed`). With a GTF, all analyses run (dupRadar, featureCounts, and all 7 RSeQC tools). With a BED file, only the 7 RSeQC tools run. The two flags are mutually exclusive.

## RNA-seq duplicate analysis
@@ -50,13 +49,12 @@ integrated into the `rustqc rna` command. They run automatically alongside the
dupRadar and featureCounts analyses:
```bash
-# Run everything: dupRadar + featureCounts + all 7 RSeQC tools