Metadata for User facility Template Transformations (MUTTs)

Introduction

The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.

MUTTs User Documentation

The documentation and setup instructions in this section are meant for any user who would like to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet. The spreadsheet follows a template determined by the MUTTs JSON mapper file that is used.

Prerequisites

Python 3.12 or higher
An NMDC user account with an API access token

To create an NMDC user account, click the 'ORCID LOGIN' button at the top right corner of the NMDC site and sign in with your ORCID credentials.

Setting up your API access token

The token is required to run the examples in the Usage section below, once you have completed all the Installation steps.

Create a .env file in your working directory with the following environment variables:

echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env
echo "SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org" >> .env

To get your access token:

Visit https://data.microbiomedata.org/user
Copy your Refresh Token
Replace your_token_here in the .env file with your token
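
Before running anything, it can help to confirm that the .env file contains both variables and that the placeholder token has been replaced. The following is a minimal sketch that assumes the file uses plain KEY=VALUE lines as created above; the helper names read_env_file and missing_settings are hypothetical and are not part of the MUTTs package.

```python
from pathlib import Path

# The two variable names come from the setup steps above.
REQUIRED_KEYS = ("DATA_PORTAL_REFRESH_TOKEN", "SUBMISSION_PORTAL_BASE_URL")


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(path=".env"):
    """Return required keys that are absent, empty, or still the placeholder."""
    values = read_env_file(path)
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key] == "your_token_here"
    ]
```

Calling missing_settings(".env") from your working directory returns an empty list when both settings are filled in.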

Installation

Create a virtual environment (recommended)

python -m venv mutts-env
source mutts-env/bin/activate  # On Windows: mutts-env\Scripts\activate

Install the MUTTs package from PyPI

pip install mutts

Download any of the MUTTs JSON mapper configuration files

Note: You are not required to use the predefined JSON mapper files in this repository. You can define your own custom JSON mapper files, provided they follow a format similar to the ones defined in this repo.

Create a directory for your mapper files and download them from this repository:

mkdir input-files
cd input-files

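Download the mapper files you need from the repository's input-files directory, then return to your working directory (cd ..) so that the input-files/ paths in the Usage examples resolve: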

For EMSL: emsl_header.json
For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json

Updating to the Latest Version

To ensure you have the latest features and bug fixes, you can upgrade the MUTTs package from PyPI:

pip install --upgrade mutts

To check your currently installed version:

pip show mutts

You can also install a specific version if needed:

pip install mutts==<version>
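
The installed version can also be read from Python, which is handy in scripts or notebooks where calling pip is inconvenient. This is a small sketch using only the standard library; it assumes the distribution name is mutts, as in the pip commands above, and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mutts"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```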

Usage

Run the mutts command with the required options. To list all available options:

mutts --help

Note: In the examples below, the --submission argument takes an NMDC Submission UUID as its value. You can find this UUID in the URL of the submission page when you open it from the Submission Portal.

The URL has the following form:

https://data.microbiomedata.org/submission/<submission-uuid>/samples
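
If you prefer to paste the whole URL rather than pick out the UUID by hand, the extraction can be sketched as below. This assumes the URL shape shown above and that the identifier is a standard UUID; the function name submission_id_from_url is hypothetical and not part of MUTTs.

```python
import uuid
from urllib.parse import urlparse


def submission_id_from_url(url):
    """Return the path segment that follows 'submission' in a portal URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        candidate = segments[segments.index("submission") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no submission ID found in {url!r}") from None
    # Assumption: the ID is a standard UUID; uuid.UUID raises ValueError if not.
    return str(uuid.UUID(candidate))
```

The returned string can be passed directly as the value of --submission.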

Example 1: Generate a JGI Metagenome spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header.json \
      --output my-samples_jgi.xlsx

Example 2: Generate a JGI Metagenome v15 spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header_v15.json \
      --output my-samples_jgi_v15.xlsx

Example 3: Generate an EMSL spreadsheet

mutts --submission <submission-uuid> \
      --user-facility emsl \
      --mapper input-files/emsl_header.json \
      --header \
      --unique-field samp_name \
      --output my-samples_emsl.xlsx

Command Options

-s, --submission: Your NMDC metadata submission UUID (required)
-u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
-m, --mapper: Path to the JSON mapper file (required)
-uf, --unique-field: Field to uniquely identify records (required, typically samp_name)
-o, --output: Output Excel file path (required)
-h, --header: Include headers in output (use for EMSL, omit for JGI)
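
When generating spreadsheets for several submissions, it may be convenient to assemble the command in Python. The sketch below only builds the argument list from the options documented above and does not call any MUTTs internals; the function name build_mutts_command and the samp_name default are illustrative choices.

```python
import shlex


def build_mutts_command(submission, user_facility, mapper, output,
                        unique_field="samp_name", header=False):
    """Assemble the argument list for one mutts run from the documented options."""
    argv = [
        "mutts",
        "--submission", submission,
        "--user-facility", user_facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    if header:
        # The options list says to pass --header for EMSL and omit it for JGI.
        argv.append("--header")
    return argv


cmd = build_mutts_command(
    "<submission-uuid>", "emsl", "input-files/emsl_header.json",
    "my-samples_emsl.xlsx", header=True,
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

To actually run the command, pass the list to subprocess.run(cmd, check=True) after replacing the placeholder with a real submission UUID.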

MUTTs Developer Documentation

The documentation and setup instructions in this section are meant for developers who want to extend or improve the current capabilities of the MUTTs software.

The software consists of two main components:

JSON Mapper Configuration Files

Controls/specifies the mapping between columns from the NMDC Submission Portal and column names used in the output spreadsheets
Top-level keys indicate main headers in the output
Numbered keys add clarifying header information
The header keyword allows custom column names
The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
Examples available in input-files/

mutts CLI

Command-line application that performs the metadata conversion
Consumes mapper files and submission data as inputs
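
To make the mapper keywords more concrete, here is a rough sketch of how a mapper could be inspected. The mapper content shown is hypothetical and only illustrates the header and sub_port_mapping keywords described above; the real files in input-files/ may nest these keywords differently, so treat the layout, the fallback to the top-level key, and the function name summarize_mapper as assumptions.

```python
import json

# Hypothetical mapper content, for illustration only; the real files in
# input-files/ may nest these keywords differently.
EXAMPLE_MAPPER = json.loads("""
{
  "Sample Name": {"header": "Sample Name*", "sub_port_mapping": "samp_name"},
  "Collection Date": {"sub_port_mapping": "collection_date"}
}
""")


def summarize_mapper(mapper):
    """Return (output column, Submission Portal slot) pairs for each top-level key."""
    rows = []
    for top_key, spec in mapper.items():
        if not isinstance(spec, dict):
            continue
        # Assumption: a "header" value, when present, replaces the top-level key.
        column = spec.get("header", top_key)
        rows.append((column, spec.get("sub_port_mapping")))
    return rows


for column, slot in summarize_mapper(EXAMPLE_MAPPER):
    print(f"{column} <- {slot}")
```

To inspect a real mapper, load it with json.load and compare its structure against the files in input-files/ before relying on this summary.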

Software Requirements

Poetry
Python 3.12 or higher

Development Installation

Clone this repository

git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations

Install dependencies with Poetry

poetry install

This installs the mutts package in development mode and creates the mutts command-line tool.

Set up your .env file

cp .env.example .env  # if available, or create a new .env file

Add your NMDC API token and submission portal base URL:

DATA_PORTAL_REFRESH_TOKEN=your_token_here
SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org

Get your token from: https://data.microbiomedata.org/user

Run the CLI in development mode

poetry run mutts --help

Creating Custom Mapper Files

To create a custom mapper for a new user facility, refer to the existing examples:

emsl_header.json - EMSL configuration
jgi_mg_header.json - JGI Metagenome configuration
jgi_mt_header.json - JGI Metatranscriptome configuration
jgi_mg_header_v15.json - JGI Metagenome v15 configuration
jgi_mt_header_v15.json - JGI Metatranscriptome v15 configuration
