Nextstrain repository for Mycobacterium tuberculosis

This workflow performs the following steps:

Fetches metadata for M. tuberculosis samples with Illumina shotgun sequence data from NCBI SRA using the DuckDB CLI
Subsamples the metadata across time and geography
Downloads FASTQ files for the subsampled samples from NCBI SRA using fasterq-dump
Assigns lineages and identifies drug resistance variants for each sample using tb-profiler
Creates a multi-sample FASTA alignment using snippy, with low-confidence regions masked following Marin et al. 2022
Creates a multi-sample VCF of informative sites using a custom script
Performs phylogenetic reconstruction using IQ-TREE
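
As an illustration of the subsampling step, here is a minimal Python sketch that caps the number of samples kept per collection year and country. It is not the workflow's actual implementation: the function name `subsample`, the field names `accession`, `date`, and `country`, and the per-group cap are all assumptions made for this example.

```python
import random
from collections import defaultdict


def subsample(records, max_per_group, seed=0):
    """Keep at most `max_per_group` records per (year, country) group."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    groups = defaultdict(list)
    for record in records:
        # Group on collection year and country; both field names are hypothetical.
        year = record["date"][:4]
        groups[(year, record["country"])].append(record)

    selected = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > max_per_group:
            # Randomly thin out over-represented groups.
            members = rng.sample(members, max_per_group)
        selected.extend(members)
    return selected


# Made-up metadata rows for illustration only.
metadata = [
    {"accession": "sample_a", "date": "2015-03-01", "country": "Peru"},
    {"accession": "sample_b", "date": "2015-07-19", "country": "Peru"},
    {"accession": "sample_c", "date": "2015-11-02", "country": "Peru"},
    {"accession": "sample_d", "date": "2019-01-15", "country": "India"},
]
subset = subsample(metadata, max_per_group=2)
print(len(subset))
```

Capping each group rather than sampling uniformly keeps heavily sequenced years or countries from dominating the tree, which is the usual motivation for subsampling across time and geography.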

The results of running this workflow are publicly visible at nextstrain.org/tb/global.

Installation

This workflow requires installation of the Nextstrain CLI and Docker.

Usage

NOTE: Running this workflow will most likely require more compute resources than are available on a typical local computer.

nextstrain build --image ghcr.io/nextstrain/tb:latest .

Storing tb-profiler and snippy results

For SRA samples that have already been analyzed in previous runs of this workflow, results of tb-profiler and snippy analyses are stored in an S3 bucket:

tb-profiler

s3://nextstrain-data/files/workflows/tb/data/tbprofiler/results/{sample}.results.json.zst

snippy

s3://nextstrain-data/files/workflows/tb/data/snippy/{sample}/snps.aligned.fa.zst
s3://nextstrain-data/files/workflows/tb/data/snippy/{sample}/snps.vcf.zst
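
To make the path templates above concrete, the following Python sketch expands them for a single sample. The helper name `cached_result_uris` and the accession `SRR0000000` are placeholders invented for this example; only the path patterns come from this README.

```python
S3_PREFIX = "s3://nextstrain-data/files/workflows/tb/data"


def cached_result_uris(sample):
    """Return the S3 URIs where cached results for `sample` are expected."""
    return {
        "tbprofiler": [
            f"{S3_PREFIX}/tbprofiler/results/{sample}.results.json.zst",
        ],
        "snippy": [
            f"{S3_PREFIX}/snippy/{sample}/snps.aligned.fa.zst",
            f"{S3_PREFIX}/snippy/{sample}/snps.vcf.zst",
        ],
    }


# Placeholder accession used only to show the expanded paths.
uris = cached_result_uris("SRR0000000")
for tool, paths in uris.items():
    for path in paths:
        print(tool, path)
```

The sketch only builds URI strings; it does not check whether the objects exist in the bucket.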

Delete these result files from the S3 bucket whenever a change to the workflow would affect their contents. Examples include changes to the parameters used in the tb-profiler or snippy steps, updates to the tb-profiler or snippy installations, and new sequence quality filtering steps that run before tb-profiler or snippy.

Repo history

The current Nextstrain GitHub repo differs substantially from the original version of the repo.

The original version was created to perform phylogenetic analyses for a subset of the data from Lee et al. 2015, but with geographic location randomized for each sample. The code and VCF file for that workflow are still available in a separate GitHub repo. That repo is used in the Nextstrain tutorial for creating a phylogenetic workflow with VCF input.

In addition, a phylogenetic tree was previously available at nextstrain.org/tb/global that was generated using a separate workflow and a different dataset, which included global TB sequences. The code for that analysis is no longer available, but the tree is still available on nextstrain.org.

One of the main differences between the current and original workflows is that the current workflow starts from raw sequence data from the NCBI SRA rather than from a VCF file. This requires extra steps in the workflow, including:

Ingest sequence data from NCBI SRA
Perform genotyping using snippy
Create a VCF file for phylogenetic analysis

Other major differences include:

Assign lineages and identify drug resistance variants using tb-profiler
Automate all analyses to enable continually updated global genomic surveillance

