This MetabSpace directory contains code, data, and instructions for using the "chemical characteristics vector" (CCV) approach to capture the chemical space of biomes and map them using LC-MS/MS data. The main article related to MetabSpace is "Chemical characteristics vectors map the chemical space of natural biomes from untargeted mass spectrometry data", https://doi.org/10.1186/s13321-025-01031-2
Data for the article are available on Zenodo: https://doi.org/10.5281/zenodo.14506250
The following workflow outlines the overall metabolomics data analysis approach; Steps 3 and 4 cover the "chemical characteristics vector" part.
Step 1 is LC-MS/MS peak extraction, which can be done using software such as MZmine, MS-DIAL, or the R package patRoon.
Step 2 is molecular fingerprint and compound class prediction using the SIRIUS software.
Step 3 is our approach for describing the fraction of compounds in a sample that contain a specific chemical moiety.
Step 4 illustrates how this approach makes it possible to compare the chemical space of compounds more efficiently.
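The idea behind Steps 3 and 4 can be sketched in a few lines. The example below is written in Python purely for illustration (the repository's own scripts are in R) and is not the code used for the article. It assumes each compound is represented by a binary fingerprint (1 = moiety predicted present), that the CCV is the per-moiety mean over all compounds in a sample, and that two samples are compared with cosine similarity. The function names `compute_ccv` and `cosine_similarity`, the toy fingerprints, and the choice of cosine similarity are all assumptions made for this sketch.

```python
from math import sqrt


def compute_ccv(fingerprints):
    """Return, for each moiety, the fraction of compounds that carry it.

    `fingerprints` is a list of equal-length lists of 0/1 values,
    one list per compound (hypothetical input format).
    """
    if not fingerprints:
        raise ValueError("at least one compound fingerprint is required")
    n_bits = len(fingerprints[0])
    if any(len(fp) != n_bits for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")
    n_compounds = len(fingerprints)
    return [sum(fp[j] for fp in fingerprints) / n_compounds
            for j in range(n_bits)]


def cosine_similarity(a, b):
    """Cosine similarity between two CCVs (one possible comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data: rows are compounds, columns are four hypothetical moieties.
sample_a = [[1, 0, 1, 0],
            [1, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 1, 0]]
sample_b = [[0, 1, 0, 1],
            [0, 1, 1, 1]]

ccv_a = compute_ccv(sample_a)  # [0.75, 0.25, 0.5, 0.0]
ccv_b = compute_ccv(sample_b)  # [0.0, 1.0, 0.5, 1.0]
similarity = cosine_similarity(ccv_a, ccv_b)
print(ccv_a, ccv_b, round(similarity, 3))
```

Because every sample is reduced to one vector of the same length, samples with different numbers of detected compounds can be compared directly. If predicted fingerprint probabilities are used instead of 0/1 values, the same averaging applies.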
CCV_article.R -> Code for gathering SIRIUS chemical characteristics and calculating averaged CCVs.
CCV_article_figures.R -> Code for creating the figures for the article and re-analyzing the data. Uses data from Zenodo.
Functions_SIRIUS_DataAnalysis.R -> Functions for collecting data from SIRIUS calculations and combining annotation tables and confidence scores into one table.
Function_IntensityWeightedCCV.R -> Code adapted from the first work to add intensity weighting. This approach takes into account the amount of each compound in the sample, which enables better comparison when the same compounds are present in different samples but at different abundances.
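To illustrate why intensity weighting matters, here is a minimal sketch, written in Python for illustration only and not taken from Function_IntensityWeightedCCV.R. It assumes the weighted CCV is the intensity-weighted mean of the compound fingerprints, i.e. sum(intensity * fingerprint) / sum(intensity). The function name `intensity_weighted_ccv` and the toy intensities are hypothetical, and the actual weighting scheme in the R script may differ.

```python
def intensity_weighted_ccv(fingerprints, intensities):
    """Intensity-weighted mean of compound fingerprints (assumed formula).

    `fingerprints`: list of equal-length 0/1 lists, one per compound.
    `intensities`: one non-negative peak intensity per compound.
    """
    if not fingerprints or len(fingerprints) != len(intensities):
        raise ValueError("need one intensity per compound fingerprint")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    n_bits = len(fingerprints[0])
    return [sum(w * fp[j] for fp, w in zip(fingerprints, intensities)) / total
            for j in range(n_bits)]


# The same two compounds occur in both samples, but in opposite abundance.
fingerprints = [[1, 0],   # compound 1 carries moiety A only
                [0, 1]]   # compound 2 carries moiety B only

weighted_a = intensity_weighted_ccv(fingerprints, [900.0, 100.0])
weighted_b = intensity_weighted_ccv(fingerprints, [100.0, 900.0])
print(weighted_a)  # approximately [0.9, 0.1]
print(weighted_b)  # approximately [0.1, 0.9]
```

An unweighted average would give [0.5, 0.5] for both samples, so they would look identical; the weighted version separates them by abundance.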
Most data is available in Zenodo (https://doi.org/10.5281/zenodo.14506250). For additional data, please contact Pilleriin Peets ([email protected], [email protected])
For SIRIUS 5, all molecular fingerprints and CANOPUS classes can be written out to folders and gathered using the "Functions_SIRIUS_DataAnalysis.R" code.
For SIRIUS 6, this is no longer possible. The molecular fingerprints and CANOPUS vectors must instead be extracted through the API.
For the article "Chemical characteristics vectors map the chemical space of natural biomes from untargeted mass spectrometry data", metabolomics data from the Earth Microbiome Project was used: (https://earthmicrobiome.org)
Shaffer, J.P., Nothias, LF., Thompson, L.R. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat Microbiol 7, 2128–2150 (2022). https://doi.org/10.1038/s41564-022-01266-x
