Metadata for User facility Template Transformations (MUTTs)
The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.
The documentation and setup instructions in this section are for anyone who wants to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet whose layout follows the template defined by the chosen MUTTs JSON mapper file.
- Python 3.12 or higher
- An NMDC user account with an API access token
To create an NMDC user account, go to the NMDC Data Portal (https://data.microbiomedata.org), click the 'ORCID LOGIN' button at the top right corner of the page, and sign in with your ORCID credentials.
Setting up your API access token
This is required for running the examples in the Usage section below (after going through all the Installation steps).
Create a .env file in your working directory with the following environment variables:
echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env
echo "SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org" >> .env
To get your access token:
- Visit https://data.microbiomedata.org/user
- Copy your Refresh Token
- Replace your_token_here in the .env file with your token
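To sanity-check the file before running anything, the sketch below parses simple KEY=VALUE lines and reports which of the two variables are missing or still hold the placeholder. It is a minimal, dependency-free illustration that assumes no quoting or multi-line values; the helper names are hypothetical, and MUTTs itself may load .env differently.

```python
from pathlib import Path

# Variables the .env file above is expected to define.
REQUIRED_KEYS = ("DATA_PORTAL_REFRESH_TOKEN", "SUBMISSION_PORTAL_BASE_URL")
PLACEHOLDER = "your_token_here"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs or tokens containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still set to the placeholder."""
    return [key for key in REQUIRED_KEYS if values.get(key, "") in ("", PLACEHOLDER)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    problems = missing_keys(values)
    if problems:
        print("Fix these entries in .env:", ", ".join(problems))
    else:
        print(".env looks complete")
```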
- Create a virtual environment (recommended)
python -m venv mutts-env
source mutts-env/bin/activate # On Windows: mutts-env\Scripts\activate
- Install the MUTTs package from PyPI
pip install mutts
- Download any of the MUTTs JSON mapper configuration files
Note: You do not have to use the predefined JSON mapper files in this repository. You can also write your own custom mapper files, as long as they follow the same format as the ones provided here.
Create a directory for your mapper files and download them from this repository:
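mkdir input-files
cd input-files
Download the mapper files you need from the repository's input-files directory (afterwards, run cd .. to return to the parent directory, since the usage examples below reference paths such as input-files/jgi_mg_header.json):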
- For EMSL: emsl_header.json
- For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
- For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json
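If you would rather script the download, here is a minimal sketch using only the standard library. It assumes the files can be fetched from raw.githubusercontent.com on a branch named main (an assumption; adjust BRANCH if the repository's default branch differs), and the helper names are hypothetical. Run it from the directory where you will later run mutts, since it writes into input-files/ relative to the current directory.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: mapper files are served from the "main" branch of this repository.
REPO = "microbiomedata/metadata-for-user-facility-template-transformations"
BRANCH = "main"

# Mapper files listed above, grouped by user facility.
MAPPER_FILES = {
    "emsl": ["emsl_header.json"],
    "jgi_mg": ["jgi_mg_header.json", "jgi_mg_header_v15.json"],
    "jgi_mt": ["jgi_mt_header.json", "jgi_mt_header_v15.json"],
}


def mapper_url(filename: str, branch: str = BRANCH) -> str:
    """Build the raw-file URL for a mapper in the repository's input-files/ directory."""
    return f"https://raw.githubusercontent.com/{REPO}/{branch}/input-files/{filename}"


def download_mappers(facility: str, dest_dir: Path = Path("input-files")) -> list[Path]:
    """Download every mapper listed for a facility into dest_dir and return the saved paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for filename in MAPPER_FILES[facility]:
        target = dest_dir / filename
        urlretrieve(mapper_url(filename), target)  # network call
        saved.append(target)
    return saved
```

For example, download_mappers("jgi_mg") would save both JGI Metagenome mappers into input-files/.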
To ensure you have the latest features and bug fixes, you can upgrade the MUTTs package from PyPI:
pip install --upgrade mutts
To check your currently installed version:
pip show mutts
You can also install a specific version if needed:
pip install mutts==<version>
Run the mutts command with the required options. To list all available options:
mutts --help
Note: The examples below use the --submission argument, which takes an NMDC Submission UUID as its value. You can find this UUID in the URL of the submission's page when you open it from the Submission Portal.
For example, in a URL of the following form, the UUID is the <submission-uuid> segment:
https://data.microbiomedata.org/submission/<submission-uuid>/samples
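If you are scripting several runs, the UUID can also be pulled out of that URL programmatically. This is a small illustrative helper (its name is hypothetical); it relies only on the /submission/<submission-uuid>/ path shape shown above and uses Python's uuid module to reject malformed values.

```python
import re
from urllib.parse import urlparse
from uuid import UUID

# Matches the path segment after /submission/, as in the example URL above.
_SUBMISSION_PATH = re.compile(r"/submission/([0-9a-fA-F-]{36})(?:/|$)")


def submission_uuid_from_url(url: str) -> str:
    """Extract and validate the submission UUID from a Submission Portal URL."""
    match = _SUBMISSION_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"No submission UUID found in {url!r}")
    # UUID() raises ValueError if the captured text is not a well-formed UUID;
    # str() returns the canonical lowercase form.
    return str(UUID(match.group(1)))
```

The returned string can be passed directly as the --submission value.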
JGI Metagenome:
mutts --submission <submission-uuid> \
--unique-field samp_name \
--user-facility jgi_mg \
--mapper input-files/jgi_mg_header.json \
--output my-samples_jgi.xlsx
JGI Metagenome with the v15 mapper:
mutts --submission <submission-uuid> \
--unique-field samp_name \
--user-facility jgi_mg \
--mapper input-files/jgi_mg_header_v15.json \
--output my-samples_jgi_v15.xlsx
EMSL (note the --header flag):
mutts --submission <submission-uuid> \
--user-facility emsl \
--mapper input-files/emsl_header.json \
--header \
--unique-field samp_name \
--output my-samples_emsl.xlsx
Options:
- -s, --submission: Your NMDC metadata submission UUID (required)
- -u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
- -m, --mapper: Path to the JSON mapper file (required)
- -uf, --unique-field: Field that uniquely identifies records (required; typically samp_name)
- -o, --output: Output Excel file path (required)
- -h, --header: Include headers in the output (use for EMSL; omit for JGI)
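The options can also be assembled from Python, for example to generate spreadsheets for several facilities in one script. The helper below is hypothetical: it only builds the argument list (adding --header for EMSL, as the option list recommends) and does not call any MUTTs Python API.

```python
import shlex

# Facilities accepted by --user-facility, per the option list above.
FACILITIES = {"emsl", "jgi_mg", "jgi_mg_lr", "jgi_mt"}


def build_mutts_command(
    submission: str,
    facility: str,
    mapper: str,
    output: str,
    unique_field: str = "samp_name",
) -> list[str]:
    """Assemble a mutts argument list; this does not run mutts."""
    if facility not in FACILITIES:
        raise ValueError(f"Unknown user facility: {facility!r}")
    args = [
        "mutts",
        "--submission", submission,
        "--user-facility", facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    # Include headers for EMSL and omit them for JGI, as recommended above.
    if facility == "emsl":
        args.append("--header")
    return args


if __name__ == "__main__":
    cmd = build_mutts_command(
        "<submission-uuid>", "emsl", "input-files/emsl_header.json", "my-samples_emsl.xlsx"
    )
    print(shlex.join(cmd))  # shell-quoted command line, for inspection only
```

To actually run it, pass the list to subprocess.run(cmd, check=True), which requires the mutts command to be on your PATH.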
The documentation and setup instructions in this section are mainly for developers who want to extend or improve the current capabilities of the MUTTs software.
The software consists of two main components:
- JSON Mapper Configuration Files
  - Specify the mapping between columns from the NMDC Submission Portal and the column names used in the output spreadsheets
  - Top-level keys indicate main headers in the output
  - Numbered keys add clarifying header information
  - The header keyword allows custom column names
  - The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
  - Examples are available in input-files/
- mutts CLI
  - Command-line application that performs the metadata conversion
  - Consumes mapper files and submission data as inputs
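To make the mapper keywords concrete, the sketch below uses a hypothetical mapper fragment and shows one plausible way such a mapping turns a Submission Portal record into a spreadsheet row. The entry shape (a top-level key per column holding numbered keys, header, and sub_port_mapping) is an assumption drawn from the descriptions above, not the exact format of the files in input-files/, and the mutts CLI's own logic may differ. Numbered keys (here "1") would supply extra header rows; this sketch ignores them.

```python
import json

# Hypothetical mapper fragment for illustration only; the real files in
# input-files/ may use a different layout.
MAPPER_JSON = """
{
  "Sample Name": {
    "1": "Required",
    "header": "Sample Name",
    "sub_port_mapping": "samp_name"
  },
  "Collection Date": {
    "1": "YYYY-MM-DD",
    "sub_port_mapping": "collection_date"
  }
}
"""


def column_names(mapper: dict) -> list[str]:
    """Use the custom 'header' name when present, else fall back to the top-level key."""
    return [entry.get("header", key) for key, entry in mapper.items()]


def record_to_row(mapper: dict, record: dict) -> list[str]:
    """Pull each column's value from the record via its sub_port_mapping slot ('' if absent)."""
    row = []
    for entry in mapper.values():
        slot = entry.get("sub_port_mapping")
        row.append("" if slot is None else str(record.get(slot, "")))
    return row


if __name__ == "__main__":
    mapper = json.loads(MAPPER_JSON)
    print(column_names(mapper))
    print(record_to_row(mapper, {"samp_name": "soil-01", "collection_date": "2023-05-01"}))
```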
- Clone this repository
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations
- Install dependencies with Poetry
poetry install
This installs the mutts package in development mode and creates the mutts command-line tool.
- Set up your .env file
cp .env.example .env # if available, or create a new .env file
Add your NMDC API token and submission portal base URL:
DATA_PORTAL_REFRESH_TOKEN=your_token_here
SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org
Get your token from: https://data.microbiomedata.org/user
- Run the CLI in development mode
poetry run mutts --help
To create a custom mapper for a new user facility, refer to the existing examples:
- emsl_header.json - EMSL configuration
- jgi_mg_header.json - JGI Metagenome configuration
- jgi_mt_header.json - JGI Metatranscriptome configuration
- jgi_mg_header_v15.json - JGI Metagenome v15 configuration
- jgi_mt_header_v15.json - JGI Metatranscriptome v15 configuration
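When writing a custom mapper, a quick check can catch typos in sub_port_mapping values before you run mutts. This hypothetical helper assumes each top-level key maps to an object that may hold a string sub_port_mapping, and it takes the set of valid Submission Portal slot names from you (for example, from the NMDC submission schema); it reports columns with no mapping and columns whose mapping is not a known slot.

```python
import json
from pathlib import Path


def check_slots(mapper: dict, known_slots: set[str]) -> dict[str, list[str]]:
    """Report columns with no sub_port_mapping and columns mapped to unknown slots."""
    report: dict[str, list[str]] = {"unmapped": [], "unknown": []}
    for column, entry in mapper.items():
        # Assumed shape: each entry is an object that may hold a sub_port_mapping string.
        slot = entry.get("sub_port_mapping") if isinstance(entry, dict) else None
        if not slot:
            report["unmapped"].append(column)
        elif slot not in known_slots:
            report["unknown"].append(column)
    return report


def check_mapper_file(path: Path, known_slots: set[str]) -> dict[str, list[str]]:
    """Load a mapper JSON file and run check_slots on it."""
    with path.open(encoding="utf-8") as handle:
        return check_slots(json.load(handle), known_slots)
```

Columns reported as unmapped are not necessarily errors, since a facility template may include columns that have no Submission Portal equivalent.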