sprokopec edited this page Jun 25, 2021 · 30 revisions

PughLab pipeline-suite

Installation and Dependencies

This is a collection of pipelines to be used for NGS (both DNA and RNA) analyses, from alignment to variant calling.

Start by creating a clone of the repository:

cd /path/to/some/directory
git clone https://github.com/pughlab/pipeline-suite/

Additionally, the report generation portion of this tool requires the BoutrosLab.plotting.general (BPG) plotting package for R: https://CRAN.R-project.org/package=BoutrosLab.plotting.general

Overview

Configuration

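The pipeline-suite runs using parameters provided in YAML format. There are two types of required configuration files: a data configuration file describing the input fastq files (passed with -d) and a tool configuration file (passed with -t).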


For tool-specific details on RNA-Seq configuration, see the RNA-Seq page.

For tool-specific details on DNA-Seq configuration, see the DNA-Seq page.

Running a pipeline

  1. Run FASTQC to verify fastq quality:
module load perl

perl /path/to/collect_fastqc_metrics.pl \
-d /path/to/fastq_config.yaml \
-t /path/to/fastqc_tool_config.yaml \
-c slurm \
{optional: --rna, --dry-run }

Be sure to run FASTQC to verify fastq quality before running downstream pipelines. In particular, check that read length is consistent, that GC content is similar across samples (typically between 40% and 60%), and that files are unique (no duplicated md5sums).
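The uniqueness check can also be done by hand before launching anything. The following is a minimal sketch and is not part of pipeline-suite: the function name `find_duplicate_fastqs` is hypothetical, the file extensions are assumptions about how the fastq files are named, and it relies on GNU `md5sum` and `uniq`.

```shell
# Hypothetical helper (not part of pipeline-suite): print every fastq file
# whose md5sum matches that of at least one other file in the directory tree.
# Prints nothing when all files are unique. Requires GNU md5sum and uniq.
find_duplicate_fastqs() {
    # $1 = directory containing the fastq files (assumed extensions below)
    find "$1" -type f \
        \( -name '*.fastq.gz' -o -name '*.fq.gz' -o -name '*.fastq' -o -name '*.fq' \) \
        -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate   # compare only the 32-character checksum
}
```

Calling `find_duplicate_fastqs /path/to/fastq/directory` prints groups of files with identical checksums, with a blank line between groups; empty output means no duplicates were found.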

  2. Prepare interval files (i.e., for WXS):
For WXS or targeted-sequencing panels, provide a bed file containing the target regions (listing at minimum the chromosome, start and end positions). The variant calling pipelines MuTect and Mutect2 add 100bp of padding to each region provided. For consistency, this padding must be added manually prior to variant calling with other tools (i.e., Strelka, SomaticSniper, VarDict and VarScan). The script below adds this padding and also creates the bgzipped version of the padded interval file required by Strelka.
module load perl

perl /path/to/format_interval_bed.pl \
-b /path/to/base/intervals.bed \
-r /path/to/reference.fa

Make sure you have write permission on the directory containing the intervals bed file, because the output files are written to the same directory as the original bed file.
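To make the padding step concrete, here is a minimal sketch of what adding 100bp to each region looks like. It is an illustration only and not the implementation of format_interval_bed.pl: the function name `pad_bed` is hypothetical, and this version neither clamps interval ends to chromosome lengths nor merges intervals that overlap after padding.

```shell
# Hypothetical helper (not the pipeline's own code): pad each BED interval
# by a fixed number of bases on both sides.
#   $1 = input bed file (tab-separated: chromosome, start, end, ...)
#   $2 = padding in bp (defaults to 100)
pad_bed() {
    awk -v pad="${2:-100}" 'BEGIN { FS = OFS = "\t" }
        /^(#|track|browser)/ { print; next }    # pass header lines through unchanged
        {
            $2 = ($2 > pad) ? $2 - pad : 0      # never let the start drop below 0
            $3 = $3 + pad                       # end is NOT clamped to chromosome length
            print
        }' "$1"
}
```

For example, `pad_bed /path/to/base/intervals.bed > padded.bed` turns a region `chr1 1000 2000` into `chr1 900 2100`; any additional columns are carried over untouched.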

  3. Run DNA (or RNA) pipeline:
module load perl

perl /path/to/pughlab_dnaseq_pipeline.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/dna_fastq_config.yaml \
--preprocessing \
--variant_calling \
--create_report \
-c slurm

This will generate the directory structure in the output directory (specified in /path/to/dna_pipeline_config.yaml), including a "logs/run_DNA_pipeline_TIMESTAMP/" directory containing the file "run_DNASeq_pipeline.log", which lists the individual tool commands. These commands can be run separately if "--dry-run" was set, or after a failure at any stage when you do not need to re-run the entire pipeline. (Note: re-running the entire pipeline would not regenerate files that already exist.)
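After several runs there may be more than one "run_DNA_pipeline_TIMESTAMP" directory. The sketch below shows one way to find the log from the most recent run. It is not part of pipeline-suite: the function name `latest_pipeline_log` is hypothetical, and it picks the directory by modification time because the exact TIMESTAMP format is not assumed here.

```shell
# Hypothetical helper (not part of pipeline-suite): print the path of
# run_DNASeq_pipeline.log inside the most recently modified
# logs/run_DNA_pipeline_*/ directory.
#   $1 = pipeline output directory (as set in the pipeline config)
latest_pipeline_log() {
    # ls -t sorts newest first; assumes paths contain no newline characters
    latest_dir=$(ls -td "$1"/logs/run_DNA_pipeline_*/ 2>/dev/null | head -n 1)
    if [ -z "$latest_dir" ]; then
        echo "no run_DNA_pipeline_* directory under $1/logs" >&2
        return 1
    fi
    printf '%s\n' "${latest_dir}run_DNASeq_pipeline.log"
}
```

The log can then be opened with, for example, `less "$(latest_pipeline_log /path/to/output/directory)"` to copy out the commands for the stage that needs to be re-run.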
