IPO Benchmarks

The following times were reported in the original IPO paper for these three [data sets](https://github.com/glibiseller/IPO/wiki/IPO-DataSets); the values are taken from the paper's supplement.

Details	Metabolite fingerprinting	Lipidomics	Central carbon metabolism
training set (n)	12	4	6
test set (n)	11	4	6
DoEs peakpicking (n)	4	3	2
DoEs retcor + grouping (n)	5	5	4
time for peakpicking opt	3.8 h	1.5 h	0.9 h
time for retcor + grouping opt	0.8 h	0.7 h	0.6 h
overall time	4.6 h	2.2 h	1.5 h

Original paper computer specs:
Intel(R) Core™ i5 CPU 760 @ 2.80GHz system with 4 GB RAM running Windows 7 32 Bit with R (v3.1.1)

Source (Supplement data sets 1,2,3): BMC Bioinformatics 2015, 16:118 doi:10.1186/s12859-015-0562-8 http://www.biomedcentral.com/1471-2105/16/118

Considerations for parallel processing

Under Windows, each newly spawned process or parallel thread invokes an Rscript.exe worker ("slave") that consumes around 500 MB of memory. Running 32 threads therefore requires about 32 × 0.5 GB = 16 GB of RAM in total. The parallel efficiency for parts of IPO is quite high at 82%, assuming no read/write overhead. However, not all steps within IPO are fully parallelized: grouping and retention time correction cannot currently be calculated in parallel by XCMS; see the retcor + grouping times in the table above.
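The memory arithmetic above can be sketched in a few lines of base R. The helper names `estimate_ram_gb` and `max_workers` and the `reserve_gb` safety margin are hypothetical, and the 0.5 GB per-worker figure is only the rough estimate quoted above; actual usage will depend on the size of the data files.

```r
# Rough RAM estimate for n parallel R workers (assumption: ~0.5 GB each,
# the approximate per-process figure quoted in the text above).
estimate_ram_gb <- function(n_workers, gb_per_worker = 0.5) {
  n_workers * gb_per_worker
}

# Largest worker count that fits into the available RAM, capped by the
# number of logical CPUs. `reserve_gb` is a hypothetical safety margin
# left for the operating system and the master R session.
max_workers <- function(ram_gb, n_cpus, gb_per_worker = 0.5, reserve_gb = 2) {
  by_ram <- floor((ram_gb - reserve_gb) / gb_per_worker)
  max(1, min(n_cpus, by_ram))
}

estimate_ram_gb(32)                    # 16 (GB), matching the estimate in the text
max_workers(ram_gb = 16, n_cpus = 32)  # 28 workers when 2 GB are held in reserve
```

The result of `max_workers()` could be used as the value of `mySlaves` in the scripts below, so that the worker count is limited by memory rather than by the CPU count alone.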

Three example scripts for benchmarking IPO

Metabolite Fingerprinting Set

### ***********************************************************************
### IPO Metabolite Fingerprint Set 1
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
### ***********************************************************************

### ***********************************************************************
### The script assumes IPO and xcms are installed. 
### Paste this file into the R command line or open it via "Open script".
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************

myIPOSet1 <- function () {
    
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    ### These are the 2 variables you have to set for your own data files;
    ### everything else runs automatically.
    ### Set your working directory (under Windows) to where your netCDF files are stored.
    ### Important: use "/" not "\"
    
    ### dataset from https://health.joanneum.at/IPO/MetaboliteFingerprintingTrainingSet.zip
    myDir = "Z:/MetaboliteFingerprint"
    ### use 32 CPUs (or threads or parallel instances)
    mySlaves = 32
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
    ### change working directory to your files, see +++ section
    setwd(myDir)
    
    ### get working directory
    (WD <- getwd())

    ### load the IPO package
    library(IPO)

    ### run IPO
    ppParams <- getDefaultXcmsSetStartingParams()
    ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
    rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
    ### print explicitly: bare expressions are not auto-printed inside a function
    print(ppResult$best_settings$parameters)
    print(rgResult$best_settings)
    
    ### create a script which you can use to process your raw data (uncomment once needed)
    ### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=4)
    
    ### signal that we are done
    print("Finished. Thank you for using IPO.")
}

### gives CPU, system, TOTAL time in seconds 
system.time(myIPOSet1())

### user     system  elapsed  (CPU=12)
### 2789.74  1275.15 17058.63 (4h:44min) 
### ***********************************************************************
### function finished
### ***********************************************************************
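The `elapsed` value returned by `system.time()` is in seconds, and the h:min figures on this page are derived from it. A small helper for that conversion might look like the sketch below; the function name `format_elapsed` is made up for illustration and is not part of IPO or xcms.

```r
# Convert an elapsed time in seconds into whole hours and minutes,
# rounding to the nearest minute.
format_elapsed <- function(seconds) {
  total_min <- as.integer(round(seconds / 60))
  sprintf("%dh:%02dmin", total_min %/% 60L, total_min %% 60L)
}

# Elapsed time recorded for the Metabolite Fingerprinting run above.
format_elapsed(17058.63)  # "4h:44min"

# Usage with system.time(): extract the "elapsed" entry of the result.
timing <- system.time(Sys.sleep(0))
format_elapsed(timing[["elapsed"]])
```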

Lipidomics Set

### ***********************************************************************
### IPO Lipidomics Set
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
### 
### ***********************************************************************

### ***********************************************************************
### The script assumes IPO and xcms are installed. 
### Paste this file into the R command line or open it via "Open script".
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************

myIPOSet2 <- function () {
    
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    ### These are the 2 variables (myDir and mySlaves) you have to set
    ### for your own data files; everything else runs automatically.
    ### Set your working directory (under Windows) to where your mzXML files are stored.
    ### Important: use "/" not "\"
    
    ### download dataset and unpack dataset into myDir
    ### from (https://health.joanneum.at/IPO/LipidomicsTrainingSet.zip)
    myDir = "Z:/Lipidomics"
    ### use 32 CPUs (or threads or parallel instances)
    mySlaves = 32
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
    ### change working directory to your files, see +++ section
    setwd(myDir)
    
    ### get working directory
    (WD <- getwd())

    ### load the IPO package
    library(IPO)

    ### run IPO
    ### optionally pass a method: getDefaultXcmsSetStartingParams("centWave") or ("matchedFilter")
    ppParams <- getDefaultXcmsSetStartingParams()
    ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
    rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
    ### print explicitly: bare expressions are not auto-printed inside a function
    print(ppResult$best_settings$parameters)
    print(rgResult$best_settings)
    
    ### create a script which you can use to process your raw data (uncomment once needed)
    ### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=1)
    
    ### signal that we are done
    print("Finished. Thank you for using IPO.")
}

### gives CPU, system, TOTAL time in seconds
system.time(myIPOSet2())

 
### [1] "Finished. Thank you for using IPO."
###    user   system  elapsed 
### 4577.88   2253.56 16310.44 
### 1h:16 min 37 min  4h:32 min

### ***********************************************************************
### function finished
### ***********************************************************************

Central Carbon Metabolism Set

### ***********************************************************************
### IPO Central Metabolism Set
### URL: https://github.com/glibiseller/IPO/
### Design of experiment for LC/MS and GC/MS Data Alignment using XCMS
### ***********************************************************************

### ***********************************************************************
### The script assumes IPO and xcms are installed. 
### Paste this file into the R command line or open it via "Open script".
### Set the 2 variables in the +++ section and run. Have fun!
### Example Script: Tobias Kind fiehnlab.ucdavis.edu 2015
### ***********************************************************************

myIPOSet3 <- function () {
    
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    ### These are the 2 variables you have to set for your own data files;
    ### everything else runs automatically.
    ### Set your working directory (under Windows) to where your netCDF files are stored.
    ### Important: use "/" not "\"
    
    ### dataset from https://health.joanneum.at/IPO/CentralCarbonMetabolismTrainingSet.zip
    myDir = "Z:/CentralMetabolism"
    ### use 32 CPUs (or threads or parallel instances)
    mySlaves = 32
    ### +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
    ### change working directory to your files, see +++ section
    setwd(myDir)
    
    ### get working directory
    (WD <- getwd())

    ### load the IPO package
    library(IPO)

    ### run IPO
    ppParams <- getDefaultXcmsSetStartingParams()
    ppResult <- optimizeXcmsSet(params=ppParams, nSlaves=mySlaves)
    rgResult <- optimizeRetGroup(xset=ppResult$best_settings$xset, nSlaves=mySlaves, subdir=myDir)
    ### print explicitly: bare expressions are not auto-printed inside a function
    print(ppResult$best_settings$parameters)
    print(rgResult$best_settings)
    
    ### create a script which you can use to process your raw data (uncomment once needed)
    ### writeRScript(ppResult$best_settings$parameters, rgResult$best_settings, nSlaves=4)
    
    ### signal that we are done
    print("Finished. Thank you for using IPO.")
}

### gives CPU, system, TOTAL time in seconds 
system.time(myIPOSet3())

### user     system  elapsed  (CPU=32)
### 1078.82  723.07  4344.89  (1h:12min)
 
### ***********************************************************************
### function finished
### ***********************************************************************
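For a quick side-by-side view, the elapsed times recorded in the three script footers can be set against the overall times from the paper's table. This is only a sketch: the data frame layout and column names are made up for illustration, and since the hardware and worker counts differ between the paper and these runs, the ratio is not a controlled speed-up measurement.

```r
# Overall times from the paper's table (hours) and the elapsed times
# recorded in the script footers on this page (seconds).
benchmarks <- data.frame(
  dataset   = c("Metabolite fingerprinting", "Lipidomics",
                "Central carbon metabolism"),
  paper_h   = c(4.6, 2.2, 1.5),
  elapsed_s = c(17058.63, 16310.44, 4344.89)
)

# Convert the measured times to hours and relate them to the paper's figures
# (ratio > 1 means the run recorded here took longer than the paper's run).
benchmarks$measured_h <- round(benchmarks$elapsed_s / 3600, 2)
benchmarks$ratio      <- round(benchmarks$elapsed_s / 3600 / benchmarks$paper_h, 2)

print(benchmarks[, c("dataset", "paper_h", "measured_h", "ratio")])
```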
