Commit bcbfcc6

Add TODO for generating large Qualimap reference output
1 parent 4cd211b commit bcbfcc6

1 file changed: +76 -0 lines changed

TODO-run-qualimap.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# TODO: Generate large Qualimap reference output

## Problem

The large Qualimap reference output (`benchmark/qualimap/large/`) is missing.
Qualimap's RNA-seq mode name-sorts the entire BAM file, which requires ~50GB of
temporary disk space for the 10GB GM12878 BAM (185M reads). This filled the disk
during the first attempt.

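The name-sort happens because the input is not already ordered by read name. As a minimal sketch, the helper below reads SAM header text on stdin and prints the sort order recorded in the `@HD` line (the `SO:` tag defined by the SAM specification). The function name `sam_sort_order` is hypothetical, and piping a header into it with `samtools view -H` assumes samtools is available, which is not listed in this file's prerequisites.

```bash
# Sketch only: `sam_sort_order` is a hypothetical helper, not part of Qualimap or this repo.
# Reads SAM header text on stdin and prints the SO value from the @HD line,
# or "unknown" when no @HD line or SO tag is present.
sam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}
```

Assuming samtools is installed, `samtools view -H benchmark/input/large/GM12878_REP1.markdup.sorted.bam | sam_sort_order` would show whether the BAM is sorted by `coordinate` or `queryname`. Only a `queryname` result would mean the file is already ordered by read name.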
## Prerequisites

- ~60GB free disk space
- Docker installed
- The Qualimap container: `quay.io/biocontainers/qualimap:2.3--hdfd78af_0`

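Since the first attempt failed by filling the disk, a preflight check may be worth running before starting. The sketch below is one way to do it. `require_free_gb` is a hypothetical helper name, and which filesystem receives Qualimap's temporary files (the bind-mounted `benchmark/` directory or Docker's own storage) is not stated here, so the directory to check is an assumption.

```bash
# Sketch only: `require_free_gb` is a hypothetical helper, not part of this repo.
# Fails (returns 1) when the filesystem holding DIR has fewer than NEED_GB gigabytes free.
require_free_gb() {
  local dir="$1"
  local need_gb="$2"
  local avail_kb avail_gb
  # df -P forces single-line POSIX output; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  if [ "$avail_gb" -lt "$need_gb" ]; then
    echo "Only ${avail_gb}GB free under ${dir}; need ${need_gb}GB" >&2
    return 1
  fi
  echo "${avail_gb}GB free under ${dir}: OK"
}
```

For example, `require_free_gb benchmark 60 || exit 1` could be run from the repository root before step 2.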
## Steps

1. **Decompress the GTF** (Qualimap cannot read a gzipped GTF):

```bash
# -k keeps genes.gtf.gz alongside the decompressed genes.gtf
gunzip -k benchmark/input/large/genes.gtf.gz
```

2. **Run Qualimap**:

```bash
mkdir -p benchmark/qualimap/large
# /data inside the container is ./benchmark on the host, so -outdir below
# resolves to benchmark/qualimap_tmp_large on the host.
docker run --rm \
  -v "$(pwd)/benchmark:/data" \
  quay.io/biocontainers/qualimap:2.3--hdfd78af_0 \
  qualimap rnaseq \
  -bam /data/input/large/GM12878_REP1.markdup.sorted.bam \
  -gtf /data/input/large/genes.gtf \
  -outdir /data/qualimap_tmp_large \
  --java-mem-size=8G \
  -pe
```

Expect this to take 1-2 hours; the name-sort phase is the bottleneck.

3. **Copy the key output files**:

```bash
cp benchmark/qualimap_tmp_large/rnaseq_qc_results.txt benchmark/qualimap/large/
cp "benchmark/qualimap_tmp_large/raw_data_qualimapReport/coverage_profile_along_genes_(total).txt" benchmark/qualimap/large/
```

4. **Clean up**:

```bash
rm -rf benchmark/qualimap_tmp_large
rm -f benchmark/input/large/genes.gtf
```

5. **Verify** by comparing with the RustQC output:

```bash
diff benchmark/qualimap/large/rnaseq_qc_results.txt \
  benchmark/RustQC/large/qualimap/rnaseq_qc_results.txt
diff benchmark/qualimap/large/coverage_profile_along_genes_\(total\).txt \
  benchmark/RustQC/large/qualimap/coverage_profile_along_genes_\(total\).txt
```

6. **Commit**:

```bash
git add benchmark/qualimap/large/
git commit -m "Add large Qualimap reference output"
git push
```

7. **Delete this file** once done.
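
The comparison in step 5 can also be wrapped in a small helper that checks every file and reports a one-line status for each, rather than printing full diffs. This is a sketch under assumptions: `compare_outputs` is a hypothetical name, and it tests for byte-identical files with `cmp`, which is stricter than a numeric comparison would be.

```bash
# Sketch only: `compare_outputs` is a hypothetical helper, not part of this repo.
# Usage: compare_outputs REF_DIR NEW_DIR FILE...
# Prints one status line per file and returns 1 if any file is missing or differs.
compare_outputs() {
  local ref_dir="$1"
  local new_dir="$2"
  shift 2
  local rc=0
  local name
  for name in "$@"; do
    if [ ! -f "$ref_dir/$name" ] || [ ! -f "$new_dir/$name" ]; then
      echo "MISSING: $name"
      rc=1
    elif cmp -s "$ref_dir/$name" "$new_dir/$name"; then
      echo "IDENTICAL: $name"
    else
      echo "DIFFERS: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

For the two files above, the call would be `compare_outputs benchmark/qualimap/large benchmark/RustQC/large/qualimap rnaseq_qc_results.txt "coverage_profile_along_genes_(total).txt"`. Any `DIFFERS` line can then be inspected with the `diff` commands from step 5.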

## Small dataset reference

Already generated and committed at `benchmark/qualimap/small/`.
