
AMR Analysis

thclarke edited this page Dec 2, 2025 · 20 revisions

Example Usage of the hAMRinization pipeline

Table of Contents

  1. Introduction
    1. About the Examples
  2. Usage
    1. Terra.bio
      1. Setup
      2. Upload data
      3. Running hAMRinization
      4. Output Files
    2. Local Run
      1. Cromwell Run
        1. Input files
          1. JSON
          2. FASTA
      2. Docker Run
    3. Output
      1. CONSOLIDATED_TERMS.tsv
      2. HARMONIZED_TERMS.tsv

Introduction

About the Examples

This example shows both local and Terra-based runs using the example sequence and its accompanying metadata. The sequence is an Escherichia coli genome from CAMRA, available in NCBI. The runs use the hAMRinization image built from the Dockerfile in the GitHub repository. The image can also be pulled from Docker Hub:

docker pull thclarke/harmonization:latest

hAMRinization can be run either on Terra, using the workflow below, or locally using Cromwell. In both cases, we recommend troubleshooting by first running the provided input files and comparing your results with the provided output.

Usage

Terra.bio

Below is the workflow for analyzing the example data on Terra, from loading the data and metadata to running the complete AMR analysis.

Setup

First, sign in or sign up for Terra.bio: click the menu bars in the top left, then select Sign In (red box).

Select a project and workspace. You are then ready to upload the metadata and assembly.

Upload data

First upload the test assembly, then upload the metadata table shown below. Select the load TSV button (red box) to upload the tab-separated file containing the data:

| entity:test_id | Assembly | Organism | Genus | Species |
|---|---|---|---|---|
| SAMN45204645 | GCF_048590205.1_ASM4859020v1_genomic.fasta.gz | Escherichia coli | Escherichia | coli |
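You can also generate the metadata table with a script instead of typing it by hand. The following is a minimal Python sketch that assumes the column names from the example table. The function name `terra_metadata_tsv` is hypothetical and is not part of the pipeline. Depending on your workspace, the Assembly value may need to be the full path of the uploaded assembly in the workspace bucket rather than the bare file name used here.

```python
import csv
import io

# Column names copied from the example metadata table. Terra expects the
# first column of a data-table TSV to be named "entity:<table>_id".
COLUMNS = ["entity:test_id", "Assembly", "Organism", "Genus", "Species"]

def terra_metadata_tsv(rows):
    """Return the rows (dicts keyed by COLUMNS) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# The single sample from the example table.
EXAMPLE_ROW = {
    "entity:test_id": "SAMN45204645",
    "Assembly": "GCF_048590205.1_ASM4859020v1_genomic.fasta.gz",
    "Organism": "Escherichia coli",
    "Genus": "Escherichia",
    "Species": "coli",
}
```

For example, `open("sample_metadata.tsv", "w").write(terra_metadata_tsv([EXAMPLE_ROW]))` writes a file (the name is hypothetical) that you can then select with the load TSV button.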


Running hAMRinization

The WDL from the GitHub repository is published on Dockstore. To find it, search the Dockstore WDL workflows for CAMRA and select CAMRA-AMR_Detection-WDL:


Next, import the WDL from Dockstore into Terra by selecting Terra on the left (red box):


The WDL then appears under Workflows in Terra (red box):


Select the workflow. On the submission page, fill in its inputs using the variables from the data table and the values in the sample JSON, then submit it:


Output Files

The workflow runs multiple AMR programs and then combines their results using the harmonizer. The results are written to the data table and can be downloaded. You can compare them with the final output files in the GitHub repository.

Local Run

The local run:

  • uses Docker containers for the different AMR programs. Docker is also required to run the workflow with Cromwell.
  • can use Cromwell from the Broad Institute as a wrapper to run hAMRinization.
  • uses the WDL workflow available in this GitHub repository.
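Before starting a local run, it can help to confirm that the required executables are installed. The following is a minimal Python sketch. It assumes the host needs only Docker (for the AMR program containers) and Java (for the `java -jar` Cromwell command). `missing_tools` is a hypothetical helper.

```python
import shutil

# Executables assumed to be needed on the host: Docker runs the AMR program
# containers and Java launches the Cromwell jar.
REQUIRED_TOOLS = ("docker", "java")

def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` defaults to shutil.which and can be replaced for testing.
    """
    return [tool for tool in tools if which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```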

The example input data include a CAMRA FASTA assembly and a JSON inputs file. The Cromwell run uses both.

Cromwell Run

An example using the full AMR workflow:

Once the two CARD databases are downloaded and unzipped, the resulting directory will look like this:

[Figure: card_databases directory listing]

  • Download the input files from the GitHub repository:
    • the JSON inputs file
    • the FASTA assembly
  • Edit the JSON to add your BV-BRC username and password, the location of the example files, and the location of the CARD downloads mentioned above.
  • Change to the Cromwell output directory and run Cromwell using the workflow from the repository and the example JSON file:

cd $OUTPUT_DIR && java -jar cromwell-90.jar run WORKFLOW.wdl -i harm-SAMN45204645.json
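You can script the JSON editing and Cromwell steps above. The following is a minimal Python sketch. It assumes the inputs JSON is a flat object keyed by WDL input names such as `Workflow.task.parameter`, which is the usual layout of Cromwell inputs files. The helper names are hypothetical. The key suffixes in the usage note (`bvbrc_username`, `bvbrc_password`) are also hypothetical placeholders, so check the actual key names in harm-SAMN45204645.json before using them.

```python
import json

def fill_inputs(inputs, updates):
    """Return a copy of a Cromwell inputs dict with selected values replaced.

    `updates` maps the last dot-separated component of an input name
    (for example "fasta" in the hypothetical "Workflow.task.fasta") to its
    new value.
    """
    filled = dict(inputs)
    for key in list(filled):
        suffix = key.rsplit(".", 1)[-1]
        if suffix in updates:
            filled[key] = updates[suffix]
    return filled

def rewrite_inputs_file(path, updates):
    """Apply `updates` to the inputs JSON file at `path`, editing it in place."""
    with open(path) as handle:
        inputs = json.load(handle)
    with open(path, "w") as handle:
        json.dump(fill_inputs(inputs, updates), handle, indent=2)

def cromwell_command(workflow="WORKFLOW.wdl",
                     inputs_json="harm-SAMN45204645.json",
                     jar="cromwell-90.jar"):
    """Return the argument list for `java -jar <jar> run <workflow> -i <inputs>`."""
    return ["java", "-jar", jar, "run", workflow, "-i", inputs_json]
```

For example, `rewrite_inputs_file("harm-SAMN45204645.json", {"bvbrc_username": "me", "bvbrc_password": "secret"})` followed by `subprocess.run(cromwell_command(), cwd=output_dir, check=True)` mirrors the shell command above, with `output_dir` standing in for `$OUTPUT_DIR`. Matching on the last key component keeps the updates short. The trade-off is that every input sharing that suffix is updated.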

Docker Run

To run the hAMRinization Docker image directly, you must first run all the other AMR programs and then run the harmonizer on their output. We therefore recommend using Cromwell, which runs all the additional AMR programs automatically.

The harmonizer's help message lists the required files. The hamronize_output_file is the output of the summarize command.


docker run -v ~/Documents/AMR_Test/:/seq/ thclarke/harmonization python3 /opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py -h

usage: AMR_Term_Consolidation.py [-h] hamronize_output_file ontology_file assembly_file database_prot_homolog_file database_prot_variant_file database_nucl_homolog_file database_nucl_variant_file


Running the script with these arguments instead of -h produces the output for the given hamronize_output_file.
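If you do run the harmonizer container directly, you must pass the seven positional arguments in the order shown in the usage line. The following is a minimal Python sketch that assembles the `docker run` argument list. The helper name and the file names in any call are hypothetical. `host_dir` should be an absolute path, because `subprocess` does not expand `~` the way the shell does.

```python
from pathlib import PurePosixPath

# Positional arguments, in order, from the AMR_Term_Consolidation.py usage line.
POSITIONAL_ARGS = (
    "hamronize_output_file",
    "ontology_file",
    "assembly_file",
    "database_prot_homolog_file",
    "database_prot_variant_file",
    "database_nucl_homolog_file",
    "database_nucl_variant_file",
)

SCRIPT = "/opt/AMR_Term_Consolidation/AMR_Term_Consolidation.py"

def consolidation_command(host_dir, files, image="thclarke/harmonization",
                          mount_point="/seq"):
    """Build a `docker run` argument list for AMR_Term_Consolidation.py.

    `files` maps each name in POSITIONAL_ARGS to a file path relative to
    `host_dir`, which is mounted at `mount_point` inside the container.
    """
    missing = [name for name in POSITIONAL_ARGS if name not in files]
    if missing:
        raise ValueError(f"missing arguments: {', '.join(missing)}")
    # Translate each host-relative file name into its path inside the container.
    container_paths = [str(PurePosixPath(mount_point) / files[name])
                       for name in POSITIONAL_ARGS]
    return ["docker", "run", "-v", f"{host_dir}:{mount_point}", image,
            "python3", SCRIPT, *container_paths]
```

You can pass the returned list to `subprocess.run(..., check=True)`. Checking for missing arguments up front gives a clearer error than letting the script's own argument parser fail inside the container.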

Output

The results are two files, HARMONIZED_TERMS.tsv and CONSOLIDATED_TERMS.tsv, which you can compare with the example output files.
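One way to compare your results with the example files is to treat each file as a set of rows. The following is a minimal Python sketch, and the helper names are hypothetical. It ignores row order and duplicate rows. It also compares values as text, so a difference such as `100` versus `100.0` is reported as a mismatch.

```python
import csv

def read_tsv_rows(path):
    """Read a TSV file into a set of row tuples, skipping the header line."""
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps quote characters (e.g. in definitions) verbatim.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)
        return {tuple(row) for row in reader}

def compare_rows(mine, expected):
    """Return (rows only in mine, rows only in expected), each sorted."""
    return sorted(mine - expected), sorted(expected - mine)
```

For example, `compare_rows(read_tsv_rows("CONSOLIDATED_TERMS.tsv"), read_tsv_rows("example/CONSOLIDATED_TERMS.tsv"))` returns two empty lists when the files match. The `example/` directory is a hypothetical location for the downloaded example output.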

CONSOLIDATED_TERMS.tsv

The CONSOLIDATED terms are in tabular format, with one row per gene locus. This file summarizes the HARMONIZED_TERMS.tsv file. The columns of the table contain:

  • The genomic locus: the starting base pair (column 1), the stopping base pair (column 2), and the contig from which it derives (column 3).
  • The evidence for the locus being AMR: whether it is supported by a single hit or multiple hits (column 4), the number of matches (column 5), and the number of matches with the same or a similar name (column 6).
  • The results of the blastx match to the locus: the percent identity of the match (column 7), the gene name from the match (column 8), the database the match derives from (column 9), and the reference accession of the match (column 10). When the match is to the CARD database, an Antibiotic Resistance Ontology (ARO) term is given.
  • Additional CARD information when the locus maps to a CARD gene: the type of resistance conferred by the gene (column 11), a fuller definition (column 12), any parent ARO terms (column 13), and any homologous terms (column 14).
| start | stop | contig | match_type | hits | agreeing_hits | pident | gene_name | database | ref_accession | name_space | definition |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 775 | NZ_JBMAZT010000086.1 | Multiple hits | 5 | 4 | 100 | Detection_Model:Protein | CARD | ARO:3004089 | antibiotic_resistance | "ANT(3'')-IIa is a aminoglycoside nucleotidyltransferase identified in Acinetobacter spp. via horizontal gene transfer mechanisms." [PMID:28152054] |
| 69 | 593 | NZ_JBMAZT010000041.1 | Multiple hits | 4 | 4 | 100 | Detection_Model:Protein | CARD | ARO:3002895 | antibiotic_resistance | "SAT-2 is a plasmid-mediated streptothricin acetyltransferase, which confers resistance to streptothricin, a nucleoside antibiotic. Originally described from an E. coli plasmid sequence by Heim et al., 1989." [PMID:2157196, PMID:2550905] |
| 2243 | 3043 | NZ_JBMAZT010000046.1 | Multiple hits | 5 | 5 | 100 | aph(3'')-Ib | CARD | ARO:3002639 | antibiotic_resistance | "APH(3'')-Ib is an aminoglycoside phosphotransferase encoded by plasmids, transposons, integrative conjugative elements and chromosomes in Enterobacteriaceae and Pseudomonas spp." [PMID:2653965] |
| 3046 | 3882 | NZ_JBMAZT010000046.1 | Multiple hits | 5 | 5 | 100 | Detection_Model:Protein | CARD | ARO:3002660 | antibiotic_resistance | "APH(6)-Id is an aminoglycoside phosphotransferase encoded by plasmids, integrative conjugative elements and chromosomal genomic islands in K. pneumoniae, Salmonella spp., E. coli, Shigella flexneri, Providencia alcalifaciens, Pseudomonas spp., V. cholerae, Edwardsiella tarda, Pasteurella multocida and Aeromonas bestiarum." [PMID:2653965, PMID:18413319, PMID:19465049, PMID:15722395] |
| 85554 | 86858 | NZ_JBMAZT010000005.1 | Single Hit | 1 | 1 | 98.16 | (Bla)AmpC1_Ecoli | argannot | FN649414:2765051-2766355 | | |
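You can load and filter the consolidated file with the Python standard library. The following is a minimal sketch that assumes the file has a header row with the column names shown above. The helper names are hypothetical. The 95% identity cutoff in `card_loci` is an arbitrary example value, not a pipeline default.

```python
import csv

def read_consolidated(lines):
    """Parse CONSOLIDATED_TERMS.tsv lines (or an open file) into dicts."""
    # QUOTE_NONE leaves the quoted definition text exactly as written.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Convert numeric columns so they can be filtered and sorted.
        for column in ("start", "stop", "hits", "agreeing_hits"):
            row[column] = int(row[column])
        row["pident"] = float(row["pident"])
        rows.append(row)
    return rows

def card_loci(rows, min_pident=95.0):
    """Keep loci whose match is a CARD entry at or above min_pident."""
    return [row for row in rows
            if row["database"] == "CARD" and row["pident"] >= min_pident]
```

For example, `with open("CONSOLIDATED_TERMS.tsv", newline="") as handle: rows = read_consolidated(handle)` loads the file, and `card_loci(rows)` then keeps only the CARD-backed loci.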

HARMONIZED_TERMS.tsv

The HARMONIZED terms are in tabular format, with one row per gene locus per AMR program, so a locus may appear multiple times. The columns of the table contain:

  • The evidence for the locus being AMR: the group the match belongs to, where overlapping matches are clustered together (column 1); the program that found the match and the database it matched, as a tuple such as ('abricate', 'argannot') (column 2); the gene name from the match (column 3); the percent sequence identity of the match (column 4); and the reference accession of the match (column 5).

  • The results of the CARD database BLAST match to the locus: the type of match (column 6; blastx if the gene is a nucleotide sequence and blastp if it is a protein), the CARD match name (column 7), the ARO term of the match (column 8), and the percent identity of the match (column 9).

  • Finally, the genomic context of the match: the start of the match (column 10), the end of the match (column 11), and the contig from which the match derives (column 12).

| loci_groups | permutations | gene_symbol | sequence_identity | reference_accession | card_match_type | card_match_name | card_match_id | pident | input_gene_start | input_gene_stop | input_sequence_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ('abricate', 'argannot') | (AGly)aadA1-pm | 99.49 | JQ690540:7968-8798 | card_blastx_homolog | Detection_Model:Protein | ARO:3004089 | 100 | 1 | 778 | NZ_JBMAZT010000086.1 |
| 1 | ('abricate', 'argannot') | (AGly)sat-2A | 100 | X51546:518-1042 | card_blastp_homolog | Detection_Model:Protein | ARO:3002895 | 100 | 69 | 593 | NZ_JBMAZT010000041.1 |
| 2 | ('abricate', 'argannot') | (AGly)strA | 99.88 | AB366441:22458-23261 | card_blastp_homolog | Detection_Model:Protein | ARO:3002639 | 100 | 2243 | 3046 | NZ_JBMAZT010000046.1 |
| 3 | ('abricate', 'argannot') | (AGly)strB | 100 | FJ474091:264-1100 | card_blastp_homolog | Detection_Model:Protein | ARO:3002660 | 100 | 3046 | 3882 | NZ_JBMAZT010000046.1 |
| 4 | ('abricate', 'argannot') | (Bla)AmpC1_Ecoli | 98.16 | FN649414:2765051-2766355 | | | | | 85554 | 86858 | NZ_JBMAZT010000005.1 |
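Because a locus can appear once per program in the harmonized file, it is often useful to group rows by loci_groups. The following is a minimal Python sketch. It assumes the header shown above and assumes that empty CARD columns are written as empty fields. The helper names are hypothetical.

```python
import ast
import csv
from collections import defaultdict

def read_harmonized(lines):
    """Parse HARMONIZED_TERMS.tsv lines (or an open file) into dicts."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # permutations is written as a Python tuple literal, for example
        # ('abricate', 'argannot'), so it can be decoded safely.
        row["permutations"] = ast.literal_eval(row["permutations"])
        rows.append(row)
    return rows

def group_by_locus(rows):
    """Map each loci_groups value to the rows clustered into that group."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["loci_groups"])].append(row)
    return dict(groups)

def rows_without_card_match(rows):
    """Return the rows whose CARD match ID column is empty."""
    return [row for row in rows if not row["card_match_id"]]
```

Using `ast.literal_eval` rather than `eval` decodes the tuple column without executing arbitrary code from the file.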
