| 1 | +# TODO: Generate large Qualimap reference output |
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +The large Qualimap reference output (`benchmark/qualimap/large/`) is missing. |
| 6 | +Qualimap's RNA-seq mode name-sorts the entire BAM file, which needs ~50 GB of |
| 7 | +temporary disk space for the 10 GB GM12878 BAM (185M reads). The first attempt |
| 8 | +failed because this filled the disk. |
| 9 | + |
| 10 | +## Prerequisites |
| 11 | + |
| 12 | +- ~60 GB free disk space (the ~50 GB of name-sort temporary files plus headroom) |
| 13 | +- Docker installed |
| 14 | +- The Qualimap container: `quay.io/biocontainers/qualimap:2.3--hdfd78af_0` |
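
Because the first attempt filled the disk, it may be worth checking free space before starting the run. The following is a minimal sketch and not part of the repository: `free_gb` is a hypothetical helper, and the 60 GB threshold comes from the prerequisite above. It assumes the temporary files land on the same filesystem as the checked directory, which may not hold if Docker keeps container data elsewhere.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: warn if less than REQUIRED_GB is free.
# Assumption: the directory passed in is on the filesystem that will fill up.

# Print the free space, in whole GB, of the filesystem containing $1.
free_gb() {
  # -P forces POSIX single-line output; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

REQUIRED_GB=60
available=$(free_gb .)
if [ "$available" -lt "$REQUIRED_GB" ]; then
  echo "Only ${available} GB free; ~${REQUIRED_GB} GB recommended." >&2
else
  echo "${available} GB free; enough to proceed."
fi
```

Run it from the repository root, or pass a different directory to `free_gb` if the temporary files are written somewhere else.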
| 15 | + |
| 16 | +## Steps |
| 17 | + |
| 18 | +1. **Decompress the GTF** (Qualimap can't read gzipped GTF): |
| 19 | + |
| 20 | + ```bash |
| 21 | + gunzip -k benchmark/input/large/genes.gtf.gz |
| 22 | + ``` |
| 23 | + |
| 24 | +2. **Run Qualimap**: |
| 25 | + |
| 26 | + ```bash |
| 27 | + mkdir -p benchmark/qualimap/large |
| 28 | + docker run --rm \ |
| 29 | + -v "$(pwd)/benchmark:/data" \ |
| 30 | + quay.io/biocontainers/qualimap:2.3--hdfd78af_0 \ |
| 31 | + qualimap rnaseq \ |
| 32 | + -bam /data/input/large/GM12878_REP1.markdup.sorted.bam \ |
| 33 | + -gtf /data/input/large/genes.gtf \ |
| 34 | + -outdir /data/qualimap_tmp_large \ |
| 35 | + --java-mem-size=8G \ |
| 36 | + -pe |
| 37 | + ``` |
| 38 | + |
| 39 | + Expect this to take 1-2 hours; the name-sort phase is the bottleneck. |
| 40 | + |
| 41 | +3. **Copy the key output files**: |
| 42 | + |
| 43 | + ```bash |
| 44 | + cp benchmark/qualimap_tmp_large/rnaseq_qc_results.txt benchmark/qualimap/large/ |
| 45 | + cp "benchmark/qualimap_tmp_large/raw_data_qualimapReport/coverage_profile_along_genes_(total).txt" benchmark/qualimap/large/ |
| 46 | + ``` |
| 47 | + |
| 48 | +4. **Clean up**: |
| 49 | + |
| 50 | + ```bash |
| 51 | + rm -rf benchmark/qualimap_tmp_large |
| 52 | + rm -f benchmark/input/large/genes.gtf |
| 53 | + ``` |
| 54 | + |
| 55 | +5. **Verify** by comparing with the RustQC output: |
| 56 | + |
| 57 | + ```bash |
| 58 | + diff benchmark/qualimap/large/rnaseq_qc_results.txt \ |
| 59 | + benchmark/RustQC/large/qualimap/rnaseq_qc_results.txt |
| 60 | + diff benchmark/qualimap/large/coverage_profile_along_genes_\(total\).txt \ |
| 61 | + benchmark/RustQC/large/qualimap/coverage_profile_along_genes_\(total\).txt |
| 62 | + ``` |
| 63 | + |
| 64 | +6. **Commit**: |
| 65 | + |
| 66 | + ```bash |
| 67 | + git add benchmark/qualimap/large/ |
| 68 | + git commit -m "Add large Qualimap reference output" |
| 69 | + git push |
| 70 | + ``` |
| 71 | + |
| 72 | +7. **Delete this file** once done. |
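
The verification in step 5 can also be wrapped in a small function that prints one status line per file instead of full diffs. This is only a sketch: `compare_outputs` is a hypothetical helper, and the two file names are the ones copied in step 3.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report, per file, whether two output directories agree.
# File names are taken from the copy step above.

compare_outputs() {
  local ref_dir=$1 new_dir=$2 status=0 name
  for name in "rnaseq_qc_results.txt" "coverage_profile_along_genes_(total).txt"; do
    if [ ! -s "$ref_dir/$name" ]; then
      # Reference file is absent or empty.
      echo "MISSING  $name"
      status=1
    elif diff -q "$ref_dir/$name" "$new_dir/$name" >/dev/null 2>&1; then
      echo "SAME     $name"
    else
      # Also reached when the file is missing from new_dir.
      echo "DIFFERS  $name"
      status=1
    fi
  done
  return "$status"
}
```

A possible call is `compare_outputs benchmark/qualimap/large benchmark/RustQC/large/qualimap`. A `DIFFERS` line only says the files are not byte-identical, so run the plain `diff` from step 5 to see what actually differs.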
| 73 | + |
| 74 | +## Small dataset reference |
| 75 | + |
| 76 | +Already generated and committed at `benchmark/qualimap/small/`. |