AMR Analysis
This example shows both local and Terra-based runs using the example sequence and additional information. The sequence is an Escherichia coli genome from CAMRA, available in NCBI. The runs use the HAMRinization image built from the Dockerfile in the GitHub repository. It can also be pulled from DockerHub:
`docker pull thclarke/harmonization:latest`
HAMRinization can be run either on Terra, using the workflow below, or locally using Cromwell. In both cases, we recommend troubleshooting with the provided input files and comparing your results against the provided output.
Below is the workflow to analyze the example data on Terra from loading the data and metadata to running the complete AMR analysis.
First, sign in or sign up at Terra.bio: click on the bars in the top left, then select the sign-in option (red box).

Select a project and workspace, and you are ready to upload the metadata and assembly information.
First upload the test assembly, then the metadata table. Select the load TSV button (red box) to upload the tab-separated file with the data below:
| entity:test_id | Assembly | Organism | Genus | Species |
|---|---|---|---|---|
| SAMN45204645 | GCF_048590205.1_ASM4859020v1_genomic.fasta.gz | Escherichia coli | Escherichia | coli |
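
The metadata table above can be produced as tab-separated text for the load TSV button. This is a minimal sketch using Python's standard `csv` module; the column names and values are copied from the table, and the Assembly value is used as-is (how Terra should locate the uploaded assembly is not covered here).

```python
import csv
import io

# Column names and values are copied from the metadata table above.
HEADER = ["entity:test_id", "Assembly", "Organism", "Genus", "Species"]
ROWS = [
    ["SAMN45204645", "GCF_048590205.1_ASM4859020v1_genomic.fasta.gz",
     "Escherichia coli", "Escherichia", "coli"],
]


def metadata_tsv(header, rows):
    """Return the table as tab-separated text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the TSV; redirect to a file of your choice before uploading.
    print(metadata_tsv(HEADER, ROWS), end="")
```

Redirecting the printed text to a file gives a TSV that can be selected in the upload dialog.
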

The WDL from the Git repository is available on Dockstore. It can be found by searching WDLs for CAMRA and selecting CAMRA-AMR_Detection-WDL:

Then import the WDL from Dockstore into Terra by selecting Terra on the left (red box):

The WDL is then in Terra Workflows (as shown in the red box):

It can be selected and submitted from the submission page using the variables in the table and the variables in the sample JSON:

Multiple AMR programs will be run and their results combined using the Harmonizer. The results are written to the table and can be downloaded. These results can be compared to the final files in the Git repository.
A local run:
- uses Docker as a container for the different AMR programs. Docker is also required when running with Cromwell.
- can use Cromwell from the Broad Institute as a wrapper to run the Harmonizer.
- uses the WDL workflow available in this Git repository.

The example input data include a CAMRA FASTA assembly and a JSON file, both of which are used in the Cromwell run.
An example using the full AMR workflow:
- Download the latest Cromwell jar file.
- Create a directory to store the Cromwell output. The output will be stored in DIR/cromwell-executions/TASK_NAME, where DIR is the Cromwell output directory.
- Download the CARD databases and unzip the files using bunzip2:
  - A. Homolog and variant data (https://card.mcmaster.ca/download/6/prevalence-v4.0.2.tar.bz2)
    - nucleotide_fasta_protein_variant_model_variants.fasta
    - protein_fasta_protein_variant_model_variants.fasta
    - nucleotide_fasta_protein_homolog_model_variants.fasta
    - protein_fasta_protein_homolog_model_variants.fasta
  - B. ARO database (https://card.mcmaster.ca/download/5/ontology-v4.0.1.tar.bz2)
    - aro.obo

  Once the two databases are downloaded and unzipped, the resulting directory will look like this:
- Download the input files.
- Set the current directory to the Cromwell output directory and run Cromwell using the workflow from the Git repository and the example JSON file.
cd $OUTPUT_DIR && java -jar cromwell-90.jar run WORKFLOW.wdl -i harm-SAMN45204645.json
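
Before launching Cromwell, it can help to confirm that the CARD files listed above are in place. The sketch below is one way to do that and to assemble the same command line as an argument list. It assumes all five files sit together in one flat directory, which the text above does not confirm; `card_db` in the usage example is a hypothetical directory name.

```python
from pathlib import Path

# File names are the ones listed in the CARD download step above.
CARD_FILES = [
    "nucleotide_fasta_protein_variant_model_variants.fasta",
    "protein_fasta_protein_variant_model_variants.fasta",
    "nucleotide_fasta_protein_homolog_model_variants.fasta",
    "protein_fasta_protein_homolog_model_variants.fasta",
    "aro.obo",
]


def missing_card_files(card_dir):
    """Return the expected CARD files that are not present in card_dir."""
    card_dir = Path(card_dir)
    return [name for name in CARD_FILES if not (card_dir / name).is_file()]


def cromwell_command(jar, workflow, inputs_json):
    """Build the argument list for the Cromwell run shown above."""
    return ["java", "-jar", str(jar), "run", str(workflow), "-i", str(inputs_json)]


if __name__ == "__main__":
    # "card_db" is a hypothetical directory name.
    for name in missing_card_files("card_db"):
        print("missing:", name)
    print(" ".join(cromwell_command(
        "cromwell-90.jar", "WORKFLOW.wdl", "harm-SAMN45204645.json")))
```

The argument list can be passed to `subprocess.run` from the Cromwell output directory once no files are reported missing.
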
To run the HAMRinization Docker image directly, you must first run all the other AMR programs and then run the Harmonizer. We therefore recommend running Cromwell, which automatically runs all the additional AMR programs.
The Harmonizer help lists the files needed. The hamronize_output_file is the output of the summarize command.
docker run -v ~/Documents/AMR_Test/:/seq/ thclarke/harmonization python3 /opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py -h

usage: AMR_Term_Consolidation.py [-h] hamronize_output_file ontology_file assembly_file database_prot_homolog_file database_prot_variant_file database_nucl_homolog_file database_nucl_variant_file
This will print out the output to the hamronize_output_file.
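
Because the script takes seven positional arguments, their order is easy to get wrong. The sketch below builds the `docker run` argument list in the order given by the usage message. It assumes every input file lives in the host directory mounted at `/seq/`; the file names in the usage example, and the mapping of CARD files to arguments, are assumptions rather than something the help text states.

```python
import os

# Positional argument order copied from the usage message above.
ARGUMENT_ORDER = [
    "hamronize_output_file",
    "ontology_file",
    "assembly_file",
    "database_prot_homolog_file",
    "database_prot_variant_file",
    "database_nucl_homolog_file",
    "database_nucl_variant_file",
]

IMAGE = "thclarke/harmonization"
SCRIPT = "/opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py"


def consolidation_command(host_dir, files):
    """Build a docker run argument list.

    `files` maps each argument name to a file name inside host_dir,
    which is mounted in the container at /seq/.
    """
    missing = [name for name in ARGUMENT_ORDER if name not in files]
    if missing:
        raise ValueError("missing arguments: " + ", ".join(missing))
    mount = os.path.expanduser(host_dir) + ":/seq/"
    paths = ["/seq/" + files[name] for name in ARGUMENT_ORDER]
    return ["docker", "run", "-v", mount, IMAGE, "python3", SCRIPT] + paths


if __name__ == "__main__":
    # All file names below are assumptions made for illustration.
    example = {
        "hamronize_output_file": "hamronize_summary.tsv",
        "ontology_file": "aro.obo",
        "assembly_file": "assembly.fasta",
        "database_prot_homolog_file": "protein_fasta_protein_homolog_model_variants.fasta",
        "database_prot_variant_file": "protein_fasta_protein_variant_model_variants.fasta",
        "database_nucl_homolog_file": "nucleotide_fasta_protein_homolog_model_variants.fasta",
        "database_nucl_variant_file": "nucleotide_fasta_protein_variant_model_variants.fasta",
    }
    print(" ".join(consolidation_command("~/Documents/AMR_Test/", example)))
```

Using named keys keeps each file attached to its argument, and a missing key raises an error before Docker is started.
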
The results are two files, HARMONIZED_TERMS.tsv and CONSOLIDATED_TERMS.tsv, which can be compared to the provided example files.
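
One way to make that comparison is to diff the rows of a produced file against the corresponding example file while ignoring row order. The sketch below does this with the standard `csv` module. It assumes both files are plain tab-separated text; the in-memory inputs in the usage example are made up for illustration.

```python
import csv
import io


def read_rows(handle):
    """Read tab-separated rows as tuples, leaving quote characters untouched."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [tuple(row) for row in reader]


def compare_tsv(produced, expected):
    """Return (rows only in produced, rows only in expected), ignoring order."""
    produced_rows = set(read_rows(produced))
    expected_rows = set(read_rows(expected))
    return (sorted(produced_rows - expected_rows),
            sorted(expected_rows - produced_rows))


if __name__ == "__main__":
    # Made-up in-memory data; with real files, pass open(path, newline="").
    produced = io.StringIO("start\tstop\n2\t775\n69\t593\n")
    expected = io.StringIO("start\tstop\n69\t593\n2\t775\n")
    extra, lacking = compare_tsv(produced, expected)
    print("only in produced:", extra)
    print("only in expected:", lacking)
```

Two empty lists mean the files contain the same rows.
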
The CONSOLIDATED terms are in tabular format, with each gene locus as a row. This file is a summary of the HARMONIZED file. The columns of the table show:
- Each genomic locus, including the starting base pair, the stopping base pair, and the contig from which it derives (columns 1-3).
- The evidence for the locus being AMR, including whether it is supported by a single hit or multiple hits (column 4), the number of matches (column 5), and the number of matches with a matching or similar name (column 6).
- The results from the blastx match to the locus, including the percent identity of the match (column 7), the gene name from the match (column 8), the database the match derives from (column 9), and the reference accession of the match (column 10). When the match is to the CARD database, an Antibiotic Resistance Ontology (ARO) term is given.
- Additional CARD information when the locus maps to a CARD gene, including the type of resistance given by the gene (column 11), fuller information on the definition (column 12), any parental ARO terms (column 13), and any homologous terms (column 15).
| start | stop | contig | match_type | hits | agreeing_hits | pident | gene_name | database | ref_accession | name_space | definition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 775 | NZ_JBMAZT010000086.1 | Multiple hits | 5 | 4 | 100 | Detection_Model:Protein | CARD | ARO:3004089 | antibiotic_resistance | "ANT(3'')-IIa is a aminoglycoside nucleotidyltransferase identified in Acinetobacter spp. via horizontal gene transfer mechanisms." [PMID:28152054] |
| 69 | 593 | NZ_JBMAZT010000041.1 | Multiple hits | 4 | 4 | 100 | Detection_Model:Protein | CARD | ARO:3002895 | antibiotic_resistance | "SAT-2 is a plasmid-mediated streptothricin acetyltransferase, which confers resistance to streptothricin, a nucleoside antibiotic. Originally described from an E. coli plasmid sequence by Heim et al., 1989." [PMID:2157196, PMID:2550905] |
| 2243 | 3043 | NZ_JBMAZT010000046.1 | Multiple hits | 5 | 5 | 100 | aph(3'')-Ib | CARD | ARO:3002639 | antibiotic_resistance | "APH(3'')-Ib is an aminoglycoside phosphotransferase encoded by plasmids, transposons, integrative conjugative elements and chromosomes in Enterobacteriaceae and Pseudomonas spp." [PMID:2653965] |
| 3046 | 3882 | NZ_JBMAZT010000046.1 | Multiple hits | 5 | 5 | 100 | Detection_Model:Protein | CARD | ARO:3002660 | antibiotic_resistance | "APH(6)-Id is an aminoglycoside phosphotransferase encoded by plasmids, integrative conjugative elements and chromosomal genomic islands in K. pneumoniae, Salmonella spp., E. coli, Shigella flexneri, Providencia alcalifaciens, Pseudomonas spp., V. cholerae, Edwardsiella tarda, Pasteurella multocida and Aeromonas bestiarum." [PMID:2653965, PMID:18413319, PMID:19465049, PMID:15722395] |
| 85554 | 86858 | NZ_JBMAZT010000005.1 | Single Hit | 1 | 1 | 98.16 | (Bla)AmpC1_Ecoli | argannot | FN649414:2765051-2766355 | | |
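
The evidence columns lend themselves to simple filtering. The sketch below selects loci that are supported by more than one hit and where every hit agrees on the name (`hits` equals `agreeing_hits`). It assumes CONSOLIDATED_TERMS.tsv is tab-separated with a header row using the column names shown in the table; the sample rows are copied from the table with the trailing columns omitted.

```python
import csv
import io


def load_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    # QUOTE_NONE keeps the quotes in the definition column as literal text.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def fully_agreeing_loci(rows):
    """Keep loci with more than one hit where all hits agree on the name."""
    selected = []
    for row in rows:
        hits = int(row["hits"])
        agreeing = int(row["agreeing_hits"])
        if hits > 1 and hits == agreeing:
            selected.append(row)
    return selected


# Three rows copied from the table above (first ten columns only).
SAMPLE = (
    "start\tstop\tcontig\tmatch_type\thits\tagreeing_hits\tpident\tgene_name\tdatabase\tref_accession\n"
    "2\t775\tNZ_JBMAZT010000086.1\tMultiple hits\t5\t4\t100\tDetection_Model:Protein\tCARD\tARO:3004089\n"
    "69\t593\tNZ_JBMAZT010000041.1\tMultiple hits\t4\t4\t100\tDetection_Model:Protein\tCARD\tARO:3002895\n"
    "85554\t86858\tNZ_JBMAZT010000005.1\tSingle Hit\t1\t1\t98.16\t(Bla)AmpC1_Ecoli\targannot\tFN649414:2765051-2766355\n"
)

if __name__ == "__main__":
    for row in fully_agreeing_loci(load_tsv(io.StringIO(SAMPLE))):
        print(row["contig"], row["start"], row["stop"], row["ref_accession"])
```

With a real file, pass `open("CONSOLIDATED_TERMS.tsv", newline="")` to `load_tsv` instead of the in-memory sample.
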
The HARMONIZED terms are in tabular format, with each gene locus from each AMR program as a row. Thus it is possible for a locus to be represented multiple times. The columns of the table show:
- The evidence for the locus being AMR, including the group that the match is part of (column 1), where overlapping matches are clustered together; the program from which the match was found and the database it matched, in array format (column 2); the gene name from the match (column 3); the percent match (column 4); and the reference of the match (column 5).
- The results from the CARD database BLAST match to the locus, including the type of match (column 6; blastx if the gene is a nucleotide sequence, blastp if it is a protein), the name of the match (column 7), the ARO term from the match (column 8), and the percent identity of the match (column 9).
- Finally, additional information about the genomic context of the match, including the start of the match (column 10), the end of the match (column 11), and the contig from which the match derives (column 12).
| loci_groups | permutations | gene_symbol | sequence_identity | reference_accession | card_match_type | card_match_name | card_match_id | pident | input_gene_start | input_gene_stop | input_sequence_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ('abricate', 'argannot') | (AGly)aadA1-pm | 99.49 | JQ690540:7968-8798 | card_blastx_homolog | Detection_Model:Protein | ARO:3004089 | 100 | 1 | 778 | NZ_JBMAZT010000086.1 |
| 1 | ('abricate', 'argannot') | (AGly)sat-2A | 100 | X51546:518-1042 | card_blastp_homolog | Detection_Model:Protein | ARO:3002895 | 100 | 69 | 593 | NZ_JBMAZT010000041.1 |
| 2 | ('abricate', 'argannot') | (AGly)strA | 99.88 | AB366441:22458-23261 | card_blastp_homolog | Detection_Model:Protein | ARO:3002639 | 100 | 2243 | 3046 | NZ_JBMAZT010000046.1 |
| 3 | ('abricate', 'argannot') | (AGly)strB | 100 | FJ474091:264-1100 | card_blastp_homolog | Detection_Model:Protein | ARO:3002660 | 100 | 3046 | 3882 | NZ_JBMAZT010000046.1 |
| 4 | ('abricate', 'argannot') | (Bla)AmpC1_Ecoli | 98.16 | FN649414:2765051-2766355 | | | | | 85554 | 86858 | NZ_JBMAZT010000005.1 |
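
Since a locus can appear on several rows, it is often convenient to regroup the HARMONIZED rows by `loci_groups`. The sketch below does that, parses the array-formatted `permutations` column, and flags rows without a CARD match. It assumes HARMONIZED_TERMS.tsv is tab-separated with the header names shown in the table, and that a missing CARD match appears as empty fields; the sample rows are copied from the table.

```python
import ast
import csv
import io
from collections import defaultdict


def load_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def group_by_locus(rows):
    """Collect rows under their loci_groups value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["loci_groups"]].append(row)
    return dict(groups)


def parse_permutation(text):
    """Turn a value such as "('abricate', 'argannot')" into a tuple."""
    return ast.literal_eval(text)


def lacks_card_match(row):
    """True when the row has no ARO term in card_match_id."""
    return not (row.get("card_match_id") or "").strip()


# Two rows copied from the table above.
SAMPLE = (
    "loci_groups\tpermutations\tgene_symbol\tsequence_identity\treference_accession\t"
    "card_match_type\tcard_match_name\tcard_match_id\tpident\t"
    "input_gene_start\tinput_gene_stop\tinput_sequence_id\n"
    "0\t('abricate', 'argannot')\t(AGly)aadA1-pm\t99.49\tJQ690540:7968-8798\t"
    "card_blastx_homolog\tDetection_Model:Protein\tARO:3004089\t100\t"
    "1\t778\tNZ_JBMAZT010000086.1\n"
    "4\t('abricate', 'argannot')\t(Bla)AmpC1_Ecoli\t98.16\tFN649414:2765051-2766355\t"
    "\t\t\t\t85554\t86858\tNZ_JBMAZT010000005.1\n"
)

if __name__ == "__main__":
    for group, members in group_by_locus(load_tsv(io.StringIO(SAMPLE))).items():
        for row in members:
            program, database = parse_permutation(row["permutations"])
            note = "no CARD match" if lacks_card_match(row) else row["card_match_id"]
            print(group, program, database, row["gene_symbol"], note)
```

Unpacking `permutations` into two values relies on every entry having exactly two elements, as in the rows shown here.
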