IPO Benchmarks
The following times were reported in the original IPO paper for these three [data sets](https://github.com/glibiseller/IPO/wiki/IPO-DataSets); the figures are taken from the paper's supplement.
| Details | Metabolite fingerprinting | Lipidomics | Central carbon metabolism |
|---|---|---|---|
| training set (n) | 12 | 4 | 6 |
| test set (n) | 11 | 4 | 6 |
| DoEs peakpicking (n) | 4 | 3 | 2 |
| DoEs retcor + grouping (n) | 5 | 5 | 4 |
| time for peakpicking opt | 3.8 h | 1.5 h | 0.9 h |
| time for retcor + grouping opt | 0.8 h | 0.7 h | 0.6 h |
| overall time | 4.6 h | 2.2 h | 1.5 h |
Original paper computer specs:
Intel(R) Core™ i5 CPU 760 @ 2.80GHz system with 4 GB RAM running Windows 7 32 Bit with R (v3.1.1)
Source (Supplement data sets 1,2,3): BMC Bioinformatics 2015, 16:118 doi:10.1186/s12859-015-0562-8 http://www.biomedcentral.com/1471-2105/16/118
Considerations for parallel processing
Under Windows, each newly spawned process or parallel thread launches an Rscript.exe worker process that consumes around 500 MB of memory, so running 32 threads requires about 16 GB of RAM in total (32 × 0.5 GB). The parallel efficiency for parts of IPO is quite high at 82%, assuming no read/write overhead. However, not all steps within IPO are fully parallelized: XCMS currently cannot run grouping and retention time correction in parallel (see the table above).
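The memory arithmetic above can be sketched as a small helper for choosing a value for `mySlaves`. This is a rough estimate only: the 0.5 GB per worker is the figure quoted above, while the function names and the `reserve_gb` headroom are hypothetical and not part of IPO or XCMS.

```r
# Rough RAM estimate for parallel IPO runs on Windows.
# Assumption: each worker process needs about 0.5 GB (figure quoted above).
mem_per_worker_gb <- 0.5

# Total RAM (in GB) needed for a given number of workers
estimate_ram_gb <- function(n_workers, per_worker_gb = mem_per_worker_gb) {
  n_workers * per_worker_gb
}

# Largest worker count that fits into the available RAM.
# reserve_gb is a hypothetical headroom for the OS and the main R session.
max_workers_for_ram <- function(available_gb,
                                per_worker_gb = mem_per_worker_gb,
                                reserve_gb = 2) {
  max(1, floor((available_gb - reserve_gb) / per_worker_gb))
}

estimate_ram_gb(32)      # RAM needed for 32 workers: 16 GB
max_workers_for_ram(16)  # workers that fit into 16 GB with 2 GB reserved: 28
```

The actual memory footprint depends on the size of the raw data files, so it is worth checking real usage in the task manager during a short trial run.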
Three example scripts for benchmarking IPO
Metabolite Fingerprinting Set
### ***********************************************************************
### IPO Metabolite Fingerprint Set 1
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
### ***********************************************************************
### ***********************************************************************
### The script assumes IPO and xcms are installed.
### Copy this file into the R command line or open it via "Open script"
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************
myIPOSet1 <- function () {
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### These are the 2 variables (myDir and mySlaves) you have to set for your own data files; everything else runs automatically
### Set your working directory under Windows, where your netCDF files are stored
### Important: use "/" not "\"
### dataset from https://health.joanneum.at/IPO/MetaboliteFingerprintingTrainingSet.zip
myDir = "Z:/MetaboliteFingerprint"
### use 32 CPUs (or threads or parallel instances)
mySlaves = 32
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### change working directory to your files, see +++ section
setwd(myDir)
### get working directory
(WD <- getwd())
### load the IPO package
library(IPO)
### run IPO
ppParams <- getDefaultXcmsSetStartingParams()
ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
ppResult$best_settings$parameters
rgResult$best_settings
### create a script which you can use to process your raw data (uncomment once needed)
### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=4)
### output: we're done!
print("Finished. Thank you for using IPO.")
}
### reports user (CPU), system, and elapsed (total) time in seconds
system.time(myIPOSet1())
### user system elapsed (CPU=12)
### 2789.74 1275.15 17058.63 (4h:44min)
### ***********************************************************************
### function finished
### ***********************************************************************
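The elapsed time reported by `system.time()` is in seconds, while the paper's table uses hours. A small helper along these lines converts between the two; `format_elapsed` is a hypothetical name, not an IPO function, and it rounds to the nearest minute.

```r
# Convert seconds (e.g. the "elapsed" value of system.time) to "Xh:YYmin",
# rounded to the nearest minute.
format_elapsed <- function(seconds) {
  total_min <- as.integer(round(seconds / 60))
  sprintf("%dh:%02dmin", total_min %/% 60L, total_min %% 60L)
}

format_elapsed(17058.63)  # "4h:44min"

# Usage with a benchmark run (requires IPO and the data set):
# timing <- system.time(myIPOSet1())
# format_elapsed(timing[["elapsed"]])
```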
Lipidomics Set
### ***********************************************************************
### IPO Lipidomics Set
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
###
### ***********************************************************************
### ***********************************************************************
### The script assumes IPO and xcms are installed.
### Copy this file into the R command line or open it via "Open script"
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************
myIPOSet2 <- function () {
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### These are the 2 variables (myDir and mySlaves) you have to set
### for your own data files; everything else runs automatically
### Set your working directory under Windows, where your mzXML files are stored
### Important: use "/" not "\"
### download dataset and unpack dataset into myDir
### from (https://health.joanneum.at/IPO/LipidomicsTrainingSet.zip)
myDir = "Z:/Lipidomics"
### use 32 CPUs (or threads or parallel instances)
mySlaves = 32
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### change working directory to your files, see +++ section
setwd(myDir)
### get working directory
(WD <- getwd())
### load the IPO package
library(IPO)
### run IPO
### getDefaultXcmsSetStartingParams("centWave") or "matchedFilter"
ppParams <- getDefaultXcmsSetStartingParams()
ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
ppResult$best_settings$parameters
rgResult$best_settings
### create a script which you can use to process your raw data
### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=1)
### output: we're done!
print("Finished. Thank you for using IPO.")
}
### reports user (CPU), system, and elapsed (total) time in seconds
system.time(myIPOSet2())
### [1] "Finished. Thank you for using IPO."
### user system elapsed
### 4577.88 2253.56 16310.44
### 1h:16 min 37 min 4h:32 min
### ***********************************************************************
### function finished
### ***********************************************************************
Central Carbon Metabolism Set
### ***********************************************************************
### IPO Central Carbon Metabolism Set
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
### ***********************************************************************
### ***********************************************************************
### The script assumes IPO and xcms are installed.
### Copy this file into the R command line or open it via "Open script"
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************
myIPOSet3 <- function () {
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### These are the 2 variables (myDir and mySlaves) you have to set for your own data files; everything else runs automatically
### Set your working directory under Windows, where your netCDF files are stored
### Important: use "/" not "\"
### dataset from https://health.joanneum.at/IPO/CentralCarbonMetabolismTrainingSet.zip
myDir = "Z:/CentralMetabolism"
### use 32 CPUs (or threads or parallel instances)
mySlaves = 32
### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
### change working directory to your files, see +++ section
setwd(myDir)
### get working directory
(WD <- getwd())
### load the IPO package
library(IPO)
### run IPO
ppParams <- getDefaultXcmsSetStartingParams()
ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
ppResult$best_settings$parameters
rgResult$best_settings
### create a script which you can use to process your raw data (uncomment once needed)
### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=4)
### output: we're done!
print("Finished. Thank you for using IPO.")
}
### reports user (CPU), system, and elapsed (total) time in seconds
system.time(myIPOSet3())
### user system elapsed (CPU=32)
### 1078.82 723.07 4344.89 (1h:12min)
### ***********************************************************************
### function finished
### ***********************************************************************
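To put the three runs side by side with the paper's overall times, the numbers on this page can be collected into a data frame. This is only a sketch: the values are copied from the table and the script output above, and the column names are made up for illustration. The runs used different hardware and worker counts than the paper (CPU=12 is noted for the first run and CPU=32 for the third), so the ratio is a rough orientation, not a clean speedup measurement.

```r
# Paper's overall times (hours) vs. elapsed times (seconds) measured
# with the three example scripts on this page.
benchmarks <- data.frame(
  dataset   = c("Metabolite fingerprinting", "Lipidomics",
                "Central carbon metabolism"),
  paper_h   = c(4.6, 2.2, 1.5),
  elapsed_s = c(17058.63, 16310.44, 4344.89),
  stringsAsFactors = FALSE
)

# Convert elapsed seconds to hours
benchmarks$elapsed_h <- round(benchmarks$elapsed_s / 3600, 2)

# Ratio > 1 means the run on this page was faster than the paper's run
benchmarks$ratio <- round(benchmarks$paper_h / benchmarks$elapsed_h, 2)

print(benchmarks)
```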