AMR Analysis

Example Usage of the hAMRinization pipeline

Introduction

About the Examples

This example shows both local and Terra-based runs using the example sequence and additional information. The sequence is an Escherichia coli genome from CAMRA, available from NCBI. The runs use the hAMRinization image built from the Dockerfile in the GitHub repository. The image can also be pulled from DockerHub:

"docker pull thclarke/harmonization:latest"

hAMRinization can be run either on Terra, using the workflow below, or locally using Cromwell. In both cases, we recommend troubleshooting with the provided input files and comparing your results to the provided output.

Usage

Terra.bio

Below is the workflow to analyze the example data on Terra from loading the data and metadata to running the complete AMR analysis.

Setup

First, sign in or sign up for Terra.bio: click on the bars in the top left, and then select the sign-in option (red box).

Select a project and workspace, and you are ready to upload the metadata and assembly information.

Upload data

First upload the test assembly, then the metadata table (see below). Select the load TSV button (red box) to upload the tab-separated file with the data:

entity:test_id	Assembly	Organism	Genus	Species
SAMN45204645	GCF_048590205.1_ASM4859020v1_genomic.fasta.gz	Escherichia coli	Escherichia	coli
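
If the table is typed by hand, tabs are easily replaced by spaces. As a minimal sketch (not part of the pipeline), the same load table can be generated with Python's csv module so that the separators are real tabs. The function names and the idea of generating the file programmatically are assumptions for illustration; the column names and values are copied from the example above.

```python
import csv
import io

# Column names and values copied from the example load table above.
HEADER = ["entity:test_id", "Assembly", "Organism", "Genus", "Species"]
ROWS = [
    ["SAMN45204645", "GCF_048590205.1_ASM4859020v1_genomic.fasta.gz",
     "Escherichia coli", "Escherichia", "coli"],
]


def build_load_table(header, rows):
    """Return the load table as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def write_load_table(path, header=HEADER, rows=ROWS):
    """Write the load table to path (the file name is up to the user)."""
    with open(path, "w", newline="") as handle:
        handle.write(build_load_table(header, rows))
```

Calling `write_load_table("test_table.tsv")` (a hypothetical file name) produces a file that can be selected with the load TSV button.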

Upload

Running hAMRinization

The WDL from the Git repository is available in Dockstore. It can be found by searching the WDLs for CAMRA and selecting CAMRA-AMR_Detection-WDL:

Then send the WDL from Dockstore to Terra by selecting Terra on the left (red box):

Dockstore-search

The WDL is then in Terra Workflows (as shown in the red box):

Workflow

It can be selected and submitted from the submission page, using the variables in the table together with the variables in the sample JSON:

Run_Wdl

Output Files

Multiple AMR programs are run and their results are then combined by hAMRinization. The results appear in the table and can be downloaded. They can be compared to the final files in the Git repository.

Local Run

A local run:
- uses Docker as a container for the different AMR programs. Docker is also required when running with Cromwell.
- can use Cromwell from the Broad Institute as a wrapper to run hAMRinization.
- uses the WDL workflow available in this Git repository.

The description below uses example input data: a CAMRA FASTA assembly and a JSON file, both of which are used in the Cromwell run.

Cromwell Run

An example using the full AMR workflow:

1. Download the latest Cromwell jar file.
2. Create a directory to store the Cromwell output. The output will be stored in DIR/cromwell-executions/TASK_NAME, where DIR is the Cromwell output directory.
3. Download the CARD databases and unzip the files using bunzip2 (the downloads are .tar.bz2 archives, so the resulting tar files also need to be extracted):
   A. Homolog and variant data (https://card.mcmaster.ca/download/6/prevalence-v4.0.2.tar.bz2)
   - nucleotide_fasta_protein_variant_model_variants.fasta
   - protein_fasta_protein_variant_model_variants.fasta
   - nucleotide_fasta_protein_homolog_model_variants.fasta
   - protein_fasta_protein_homolog_model_variants.fasta
   B. ARO database (https://card.mcmaster.ca/download/5/ontology-v4.0.1.tar.bz2)
   - aro.obo

   Once the two databases are downloaded and unzipped, the resulting directory will look like this: card_databases

4. Download the input files from the Git repository:
   - JSON
   - FASTA
   Then edit the JSON to add your BV-BRC username and password, the location of the example files, and the location of the CARD downloads listed above.
5. Set the current directory to the Cromwell output directory and run Cromwell using the workflow from the Git repository and the example JSON file:

cd $OUTPUT_DIR && java -jar cromwell-90.jar run WORKFLOW.wdl -i harm-SAMN45204645.json
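
A run can fail late if one of the CARD files is missing, so it may help to check the downloads before starting Cromwell. The following is a minimal sketch, not part of the repository. It assumes all five CARD files have been placed in one flat directory, which is an assumption about the layout; the helper names `missing_card_files` and `cromwell_command` are hypothetical. The file names and the Cromwell arguments are taken from the steps above.

```python
from pathlib import Path

# File names taken from the CARD download list above.
CARD_FILES = [
    "nucleotide_fasta_protein_variant_model_variants.fasta",
    "protein_fasta_protein_variant_model_variants.fasta",
    "nucleotide_fasta_protein_homolog_model_variants.fasta",
    "protein_fasta_protein_homolog_model_variants.fasta",
    "aro.obo",
]


def missing_card_files(card_dir):
    """Return the expected CARD files that are not present in card_dir.

    Assumes a flat directory layout (an assumption, not a requirement
    stated by the pipeline).
    """
    base = Path(card_dir)
    return [name for name in CARD_FILES if not (base / name).is_file()]


def cromwell_command(jar, wdl, inputs_json):
    """Build the argument list for the Cromwell run shown above."""
    return ["java", "-jar", jar, "run", wdl, "-i", inputs_json]
```

If `missing_card_files` returns an empty list, the command from `cromwell_command("cromwell-90.jar", "WORKFLOW.wdl", "harm-SAMN45204645.json")` can be passed to `subprocess.run(..., cwd=output_dir, check=True)`, which mirrors the `cd $OUTPUT_DIR && java -jar ...` line.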

Docker Run

To run the hAMRinization Docker image directly, you first need to run all the other AMR programs and then run the harmonizer. We therefore recommend running Cromwell, which runs all the additional AMR programs automatically.

The harmonizer help lists the files needed. The hamronize_output_file is the output from the summarize command.

docker run -v ~/Documents/AMR_Test/:/seq/ thclarke/harmonization python3 /opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py -h

usage: AMR_Term_Consolidation.py [-h] hamronize_output_file ontology_file assembly_file database_prot_homolog_file database_prot_variant_file database_nucl_homolog_file database_nucl_variant_file

This will print out the output to the hamronize_output_file.
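
The script takes seven positional arguments, and their order is easy to get wrong. As a rough sketch, the command can be assembled from named values. The image name, script path, and `/seq/` mount point come from the command above, and the argument order comes from the usage message. The helper `consolidation_command` is hypothetical, and the sketch assumes all input files sit in the single mounted host directory.

```python
# Positional arguments in the order given by the usage message above.
ARGUMENT_ORDER = [
    "hamronize_output_file",
    "ontology_file",
    "assembly_file",
    "database_prot_homolog_file",
    "database_prot_variant_file",
    "database_nucl_homolog_file",
    "database_nucl_variant_file",
]


def consolidation_command(host_dir, files, mount_point="/seq"):
    """Build the docker run argument list for AMR_Term_Consolidation.py.

    files maps each name in ARGUMENT_ORDER to a file name inside host_dir.
    All files are assumed to live in that one directory (an assumption).
    """
    missing = [name for name in ARGUMENT_ORDER if name not in files]
    if missing:
        raise ValueError("missing arguments: " + ", ".join(missing))
    command = [
        "docker", "run", "-v", f"{host_dir}:{mount_point}/",
        "thclarke/harmonization", "python3",
        "/opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py",
    ]
    # Paths are given as seen from inside the container.
    command += [f"{mount_point}/{files[name]}" for name in ARGUMENT_ORDER]
    return command
```

When the list is passed to `subprocess.run`, no shell is involved, so a host path such as `~/Documents/AMR_Test` should be expanded first with `os.path.expanduser`.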

Output

The results are two files, HARMONIZED_TERMS.tsv and CONSOLIDATED_TERMS.tsv, which can be compared to the example files.

CONSOLIDATED_TERMS.tsv

The CONSOLIDATED terms are in tabular format, with each gene locus as a row. This file is a summary of the Harmonized file. The columns of the table have information showing:

- Each genomic locus, including the starting base pair, the stopping base pair, and the contig from which it derives (columns 1-3).
- The evidence for the locus being AMR, including whether it comes from one or multiple files (column 4), the number of matches (column 5), and the number of matches with a matching or similar name (column 6).
- The results from the blastx match to the locus, including the percent identity of the match (column 7), the gene name from the match (column 8), where the match derives from (column 9), and the reference number of the match (column 10). In cases where the match was to the CARD database, an Antibiotic Resistance Ontology (ARO) term is given.
- Additional CARD information when the locus maps to a CARD gene, including the type of resistance given by the gene (column 11), fuller information on the definition (column 12), any parental ARO terms (column 13), and any homologous terms (column 15).

start	stop	contig	match_type	hits	agreeing_hits	pident	gene_name	database	ref_accession	name_space	definition
2	775	NZ_JBMAZT010000086.1	Multiple hits	5	4	100	Detection_Model:Protein	CARD	ARO:3004089	antibiotic_resistance	"ANT(3'')-IIa is a aminoglycoside nucleotidyltransferase identified in Acinetobacter spp. via horizontal gene transfer mechanisms." [PMID:28152054]
69	593	NZ_JBMAZT010000041.1	Multiple hits	4	4	100	Detection_Model:Protein	CARD	ARO:3002895	antibiotic_resistance	"SAT-2 is a plasmid-mediated streptothricin acetyltransferase, which confers resistance to streptothricin, a nucleoside antibiotic. Originally described from an E. coli plasmid sequence by Heim et al., 1989." [PMID:2157196, PMID:2550905]
2243	3043	NZ_JBMAZT010000046.1	Multiple hits	5	5	100	aph(3'')-Ib	CARD	ARO:3002639	antibiotic_resistance	"APH(3'')-Ib is an aminoglycoside phosphotransferase encoded by plasmids, transposons, integrative conjugative elements and chromosomes in Enterobacteriaceae and Pseudomonas spp." [PMID:2653965]
3046	3882	NZ_JBMAZT010000046.1	Multiple hits	5	5	100	Detection_Model:Protein	CARD	ARO:3002660	antibiotic_resistance	"APH(6)-Id is an aminoglycoside phosphotransferase encoded by plasmids, integrative conjugative elements and chromosomal genomic islands in K. pneumoniae, Salmonella spp., E. coli, Shigella flexneri, Providencia alcalifaciens, Pseudomonas spp., V. cholerae, Edwardsiella tarda, Pasteurella multocida and Aeromonas bestiarum." [PMID:2653965, PMID:18413319, PMID:19465049, PMID:15722395]
85554	86858	NZ_JBMAZT010000005.1	Single Hit	1	1	98.16	(Bla)AmpC1_Ecoli	argannot	FN649414:2765051-2766355
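
As a minimal sketch of how this table can be inspected, the snippet below reads it with Python's csv module and lists loci where the tools did not all agree on the gene name (hits greater than agreeing_hits). The sample is a trimmed copy of two rows from the example above. The function names are hypothetical, and quoting is disabled because the definition column contains literal double quotes.

```python
import csv
import io

HEADER = ["start", "stop", "contig", "match_type", "hits", "agreeing_hits",
          "pident", "gene_name", "database", "ref_accession", "name_space",
          "definition"]
# Two rows copied from the example above; the second has no CARD columns.
ROWS = [
    ["2", "775", "NZ_JBMAZT010000086.1", "Multiple hits", "5", "4", "100",
     "Detection_Model:Protein", "CARD", "ARO:3004089", "antibiotic_resistance",
     "\"ANT(3'')-IIa is a aminoglycoside nucleotidyltransferase identified in "
     "Acinetobacter spp. via horizontal gene transfer mechanisms.\" "
     "[PMID:28152054]"],
    ["85554", "86858", "NZ_JBMAZT010000005.1", "Single Hit", "1", "1", "98.16",
     "(Bla)AmpC1_Ecoli", "argannot", "FN649414:2765051-2766355"],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def read_consolidated(handle):
    """Read CONSOLIDATED_TERMS.tsv rows as dictionaries keyed by column name.

    QUOTE_NONE keeps the double quotes in the definition column as plain text.
    Columns missing from a short row are returned as None.
    """
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def disagreeing_loci(rows):
    """Return loci where some hits disagree on the name (hits > agreeing_hits)."""
    return [row for row in rows if int(row["hits"]) > int(row["agreeing_hits"])]


if __name__ == "__main__":
    for row in disagreeing_loci(read_consolidated(io.StringIO(SAMPLE))):
        print(row["contig"], row["start"], row["stop"], row["ref_accession"])
```

For a real run, replace `io.StringIO(SAMPLE)` with `open("CONSOLIDATED_TERMS.tsv", newline="")`.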

HARMONIZED_TERMS.tsv

The HARMONIZED terms are in tabular format, with each gene locus from each AMR program as a row. Thus it is possible for a locus to be represented multiple times. The columns of the table have information showing:

- The evidence for the locus being AMR, including the group that the match is part of (column 1), where overlapping matches are clustered together; the program from which it was found and the match, in an array format (column 2); the gene name from the match (column 3); the percent match (column 4); and the reference of the match (column 5).
- The results from the CARD database BLAST match to the locus, including the type of match (column 6; blastx if the gene is a nucleotide sequence, and blastp if it is a protein), the name of the match (column 7), the ARO term from the match (column 8), and the percent identity of the match (column 9).
- Finally, additional information about the genomic context of the match, including the start of the match (column 10), the end of the match (column 11), and the contig from which the match derives (column 12).

loci_groups	permutations	gene_symbol	sequence_identity	reference_accession	card_match_type	card_match_name	card_match_id	pident	input_gene_start	input_gene_stop	input_sequence_id
0	('abricate', 'argannot')	(AGly)aadA1-pm	99.49	JQ690540:7968-8798	card_blastx_homolog	Detection_Model:Protein	ARO:3004089	100	1	778	NZ_JBMAZT010000086.1
1	('abricate', 'argannot')	(AGly)sat-2A	100	X51546:518-1042	card_blastp_homolog	Detection_Model:Protein	ARO:3002895	100	69	593	NZ_JBMAZT010000041.1
2	('abricate', 'argannot')	(AGly)strA	99.88	AB366441:22458-23261	card_blastp_homolog	Detection_Model:Protein	ARO:3002639	100	2243	3046	NZ_JBMAZT010000046.1
3	('abricate', 'argannot')	(AGly)strB	100	FJ474091:264-1100	card_blastp_homolog	Detection_Model:Protein	ARO:3002660	100	3046	3882	NZ_JBMAZT010000046.1
4	('abricate', 'argannot')	(Bla)AmpC1_Ecoli	98.16	FN649414:2765051-2766355			85554	86858	NZ_JBMAZT010000005.1
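
Because a locus can appear on several rows, it is often convenient to collect rows by their loci_groups value. The snippet below is a minimal sketch using two rows copied from the example above. The helper name `group_by_locus` is hypothetical, and parsing the permutations column with `ast.literal_eval` assumes the field is always written as a Python tuple literal, as it is in the example.

```python
import ast
import csv
import io
from collections import defaultdict

HEADER = ["loci_groups", "permutations", "gene_symbol", "sequence_identity",
          "reference_accession", "card_match_type", "card_match_name",
          "card_match_id", "pident", "input_gene_start", "input_gene_stop",
          "input_sequence_id"]
# Two rows copied from the example table above.
ROWS = [
    ["0", "('abricate', 'argannot')", "(AGly)aadA1-pm", "99.49",
     "JQ690540:7968-8798", "card_blastx_homolog", "Detection_Model:Protein",
     "ARO:3004089", "100", "1", "778", "NZ_JBMAZT010000086.1"],
    ["2", "('abricate', 'argannot')", "(AGly)strA", "99.88",
     "AB366441:22458-23261", "card_blastp_homolog", "Detection_Model:Protein",
     "ARO:3002639", "100", "2243", "3046", "NZ_JBMAZT010000046.1"],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def group_by_locus(handle):
    """Group HARMONIZED_TERMS.tsv rows by the integer loci_groups column."""
    groups = defaultdict(list)
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: the field is a Python tuple literal, e.g. ('a', 'b').
        row["permutations"] = ast.literal_eval(row["permutations"])
        groups[int(row["loci_groups"])].append(row)
    return dict(groups)


if __name__ == "__main__":
    for group_id, rows in sorted(group_by_locus(io.StringIO(SAMPLE)).items()):
        symbols = ", ".join(row["gene_symbol"] for row in rows)
        print(group_id, symbols)
```

For a real run, replace `io.StringIO(SAMPLE)` with `open("HARMONIZED_TERMS.tsv", newline="")`.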

CAMRA Website | Terra Website
