wf-module-dorado

Snakemake workflow module to run ONT Dorado basecalling, demultiplexing and post-processing for experiments organized as one directory per experiment.

This repository provides a reproducible Snakemake pipeline. It is primarily intended to be consumed by other workflows as a module, although it can also run independently. The pipeline:

downloads the Dorado basecaller,
runs Dorado basecalling across experiment POD5 files,
optionally demultiplexes multiplexed runs,
converts basecalled BAMs to FASTQ,
runs pychopper for unstranded kits,
organizes outputs per-experiment and per-sample under basecall/ and samples/ respectively.
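
The conditional stages above can be sketched as a small function. This is only an illustration of the order described in the list, not the module's actual rule graph; plan_steps and the stage labels are hypothetical names chosen for this example.

```python
from typing import List


def plan_steps(multiplexed: bool, stranded: bool) -> List[str]:
    """Return the ordered stages for one sample (labels are illustrative)."""
    steps = ["download_dorado", "basecall"]
    if multiplexed:
        # Demultiplexing only applies when several samples share a run.
        steps.append("demultiplex")
    steps.append("bam_to_fastq")
    if not stranded:
        # pychopper is run only for unstranded kits.
        steps.append("pychopper")
    steps.append("organize_outputs")
    return steps


if __name__ == "__main__":
    print(plan_steps(multiplexed=False, stranded=False))
```

A stranded, non-multiplexed sample skips both optional stages and goes straight from basecalling to FASTQ conversion.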

Requirements

Snakemake (>=6 recommended)
A GPU-equipped machine for Dorado basecalling.

The following rules require minimal resources and can be specified as localrules:

localrules: run_all, rename_final_stranded_fastq, get_dorado, demux_get_bam,
            pychopper_merge_trimmed_rescued, get_basecalled_bam_for_sample

How to use in other workflows

In the consuming workflow, add a section to the config that contains all the required parameters from this workflow's config file.

In the consuming config.yml:

# Sample data
DORADO: {
  DOWNLOADS_DIR: "downloads",
  EXP_DIR: "experiments",
  BASECALL_DIR: "basecall",
  SAMPLES_DIR: "samples",
  DELETE_INTERMEDIATES: False,
  BIN_VERSION: 'dorado-1.1.1-linux-x64',
  DORADO_RESOURCES: {
    gpu: 2,
    gpu_model: "[gpua100|gpuv100x]",
  },
  DEFAULT_MODEL: 'hac',
  SAMPLE_DATA: {
    'header': ['sample_id', 'experiment_id', 'kit', 'stranded', 'barcode'],
    'data': [
        ['sample1', 'exp1', 'SQK-LSK114', 'false', ''],
    ]
  }
}
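
Before handing the section to the module, it can help to check that the expected keys are present. The sketch below is a hypothetical helper for the consuming workflow; the set of keys treated as required is an assumption based on the example above (the module itself may supply defaults for some of them).

```python
# Assumption: these keys are treated as required for this example only.
REQUIRED_KEYS = {
    "DOWNLOADS_DIR",
    "EXP_DIR",
    "BASECALL_DIR",
    "SAMPLES_DIR",
    "BIN_VERSION",
    "DEFAULT_MODEL",
    "SAMPLE_DATA",
}


def missing_dorado_keys(config: dict) -> list:
    """Return the sorted names of expected keys absent from config['DORADO']."""
    section = config.get("DORADO", {})
    return sorted(REQUIRED_KEYS - set(section))


if __name__ == "__main__":
    example = {"DORADO": {"EXP_DIR": "experiments", "SAMPLES_DIR": "samples"}}
    missing = missing_dorado_keys(example)
    if missing:
        print("Missing DORADO keys: " + ", ".join(missing))
```

In a Snakefile, the same function could be called on the global config object before the module statement, raising an error if the returned list is not empty.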

Then in a consuming snakefile:

module dorado_basecall:
    snakefile:
        github("maragkakislab/wf-module-dorado", path="workflow/Snakefile", tag="v1.0.3")
    config:
        config["DORADO"]

use rule * from dorado_basecall as dorado_*

rule run_all:
    input:
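        # Basecalled BAM (sample1 is the sample_id from the config above;
        # SAMPLES_DIR must be defined in the consuming snakefile)
        SAMPLES_DIR + "/sample1/basecall/calls.bam",
        # Basecalled and stranded FASTQ
        SAMPLES_DIR + "/sample1/fastq/reads.final.fastq.gz",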
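
When many samples are configured, the targets can be generated instead of listed by hand. A minimal sketch, assuming the per-sample path layout shown in the rule above; sample_targets is a hypothetical helper, not part of the module.

```python
from typing import Iterable, List


def sample_targets(samples_dir: str, sample_ids: Iterable[str]) -> List[str]:
    """Build the basecalled BAM and final FASTQ paths for each sample."""
    targets = []
    for sample_id in sample_ids:
        # Basecalled BAM
        targets.append(f"{samples_dir}/{sample_id}/basecall/calls.bam")
        # Basecalled and stranded FASTQ
        targets.append(f"{samples_dir}/{sample_id}/fastq/reads.final.fastq.gz")
    return targets


if __name__ == "__main__":
    for path in sample_targets("samples", ["s1", "s2"]):
        print(path)
```

The returned list could be used directly as the input of a target rule in the consuming snakefile.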

Configuration

Important keys found in config/config.yml (examples):

DOWNLOADS_DIR - where Dorado tarballs / binaries are downloaded (default downloads).
EXP_DIR - directory containing one subdirectory per experiment. Each experiment directory should contain POD5 files and the EXP_DIR_TRIGGER_FILE (default origin.txt) which acts as a trigger for the workflow.
BASECALL_DIR - where per-experiment basecalling outputs are stored.
SAMPLES_DIR - where per-sample outputs are placed.
BIN_VERSION - Dorado binary version to use.
DELETE_INTERMEDIATES - delete intermediate files if True (default False).
DORADO_RESOURCES - default GPU resources for Dorado (optional).
DEFAULT_MODEL, CUSTOM_MODELS - model selection used in the basecall rule.
SAMPLE_DATA - table describing samples; workflow/rules/common.smk parses this into samples and experiments objects. If multiple samples share the same experiment_id, they are assumed to be multiplexed and each must be given a unique barcode.
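
The SAMPLE_DATA convention can be illustrated with a short parser. This is a sketch of the idea only, not the code in workflow/rules/common.smk; parse_sample_data is a hypothetical name, and the barcode strings in the usage example are placeholders rather than values confirmed by the module.

```python
from collections import defaultdict


def parse_sample_data(sample_data: dict):
    """Turn the header/data table into sample dicts grouped by experiment."""
    header = sample_data["header"]
    samples = [dict(zip(header, row)) for row in sample_data["data"]]

    experiments = defaultdict(list)
    for sample in samples:
        experiments[sample["experiment_id"]].append(sample)

    # Samples sharing an experiment are treated as multiplexed, so each one
    # needs a non-empty barcode that is unique within that experiment.
    for exp_id, members in experiments.items():
        if len(members) > 1:
            barcodes = [m["barcode"] for m in members]
            if "" in barcodes or len(set(barcodes)) != len(barcodes):
                raise ValueError(f"experiment {exp_id}: barcodes must be unique")
    return samples, dict(experiments)


if __name__ == "__main__":
    table = {
        "header": ["sample_id", "experiment_id", "kit", "stranded", "barcode"],
        "data": [
            ["sample1", "exp1", "SQK-LSK114", "false", ""],
            ["sample2", "exp2", "KIT-PLACEHOLDER", "false", "bc-a"],
            ["sample3", "exp2", "KIT-PLACEHOLDER", "false", "bc-b"],
        ],
    }
    samples, experiments = parse_sample_data(table)
    print(sorted(experiments))
```

Here exp1 holds a single sample and needs no barcode, while exp2 is treated as multiplexed because two samples share it.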
