BiocStyle::markdown()
Package: `r Biocpkg("OptiLCMS")`
Authors: Zhiqiang Pang
Modified: `r file.info("OptiLCMS_MS1.Rmd")$mtime`
Compiled: `r date()`
## Silently loading all packages
library(BiocStyle)
library(OptiLCMS)
library(mtbls2)
Global metabolomics aims to comprehensively study the metabolic profiles of various biological systems. Although high-resolution mass spectrometry (MS) is now widely used, high-quality data processing remains challenging. OptiLCMS is an R package derived from `r Biocpkg("xcms")` and `r Biocpkg("CAMERA")`. It provides a series of functions for peak profiling (peak picking, alignment and gap filling) and annotation. This R package is also the core of the LC-MS Spectral Processing module in MetaboAnalyst.
This vignette is a step-by-step guide to running raw spectral analysis. We start with installation.
First, confirm that R (4.0 or later) and Bioconductor are installed, then run the following command:
BiocManager::install("OptiLCMS")
The development version of this package can be installed directly from GitHub (`r Githubpkg("xia-lab/OptiLCMS")`) with the following command:
# The latest features are only available through this approach;
# make sure 'devtools' is installed first
devtools::install_github("xia-lab/OptiLCMS",
                         build = TRUE,
                         build_vignettes = TRUE,
                         build_manual = TRUE)
The newest functions are available through this route.
Alternatively, download the source tarball from this link and install it with the following command:
# Remember to replace PATH_TO_TAR with the
# correct path to your downloaded package (OptiLCMS_0.99.x.tar.gz).
install.packages(PATH_TO_TAR, repos = NULL, type = "source")
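Before installing, it can help to confirm the prerequisites from within R. The following is a minimal sketch using base R only; the helper name `check_prerequisites` is hypothetical and is not part of OptiLCMS.

```r
# Hypothetical helper (not part of OptiLCMS): report whether the
# prerequisites mentioned above are met in the current R session.
check_prerequisites <- function(min_r = "4.0.0") {
  list(
    r_ok        = getRversion() >= min_r,                          # R 4.0 or later
    biocmanager = requireNamespace("BiocManager", quietly = TRUE), # needed for BiocManager::install()
    optilcms    = requireNamespace("OptiLCMS", quietly = TRUE)     # TRUE once OptiLCMS is installed
  )
}

status <- check_prerequisites()
str(status)
```

Each element is a single logical value, so a `FALSE` entry points at the component that still needs to be installed or upgraded.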
Parameter settings are critical for obtaining good results from the centWave algorithm, but optimizing them can be very difficult for new users. Here, we provide an automated pipeline to optimize the centWave parameters. The first step extracts regions of interest (ROIs) from the data along the m/z and RT dimensions. The extracted ROIs are treated as representative of the whole spectrum. The extraction details are described in the MetaboAnalystR 3.0 paper.
# ROI extraction for optimization
# 1. Define a vector of file paths for optimization
DataFiles <- dir(system.file("mzData", package = "mtbls2"),
                 full.names = TRUE, recursive = TRUE)
# 2. Extract Regions of Interest (ROI)
# rt.idx is the extraction percentage (3.5% here, default is 1/15) from the RT dimension
# rmConts defines whether to remove potential contaminants
mSet <- PerformROIExtraction(datapath = DataFiles[10:11],
                             rt.idx = 0.035,
                             rmConts = FALSE)
The parameters for the following steps are then optimized automatically with the function `PerformParamsOptimization`. Users only need to define the number of parallel cores (e.g. ncore = 2) and, optionally, the initial parameters.
## DoE optimization for the optimal parameter combination
# This step optimizes the parameters for the following steps
# with a DoE model; please choose ncore > 1 in actual practice
best_params <- PerformParamsOptimization(mSet = mSet,
                                         param = SetPeakParam(),
                                         ncore = 1)
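To give a feel for what a design-of-experiments (DoE) search does, here is a toy sketch in base R. It is not the algorithm used by `PerformParamsOptimization`: the scoring function `toy_score` and the parameter ranges are made up for illustration, and only the parameter names `ppm` and `max_peakwidth` are borrowed from `SetPeakParam`.

```r
# Toy illustration only: this made-up score stands in for a real
# peak-picking quality score and peaks at ppm = 10, max_peakwidth = 30.
toy_score <- function(ppm, max_peakwidth) {
  -(ppm - 10)^2 - 0.5 * (max_peakwidth - 30)^2
}

# Full factorial design over two hypothetical parameter ranges
design <- expand.grid(ppm = c(5, 10, 15, 20),
                      max_peakwidth = c(20, 30, 40))

# Evaluate every parameter combination in the design
design$score <- mapply(toy_score, design$ppm, design$max_peakwidth)

# Keep the best-scoring combination
best <- design[which.max(design$score), ]
print(best)
```

A real optimization would replace `toy_score` with a quality measure computed from peak picking on the extracted ROIs, which is why running it on a small representative subset of the data keeps the search affordable.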
Initialize an Object (Optional)
Before starting the pipeline, we can initialize an empty mSet object as the target for all following steps. This step is optional: the data import step below also returns an mSet object directly.
library(OptiLCMS)
mSet <- InitDataObjects("spec", "raw", FALSE)
Then, optionally, set the global number of parallel cores:
SetGlobalParallel(1); register(bpstop());
Import as onDisk Mode
Here we import the MS data in onDisk mode. This mode works on most computers because it uses less memory, but the subsequent peak profiling steps may be slower.
##' Get raw spectra files
DataFiles <- dir(system.file("mzData", package = "mtbls2"),
                 full.names = TRUE, recursive = TRUE)[c(10:12, 14:16)]
##' Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(DataFiles), pattern = ".mzData",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("col0", 3), rep("cyp79", 3)),
                 stringsAsFactors = FALSE)
##' Define plotting parameters
PlottingParam <- SetPlotParam(Plot = TRUE, labels = TRUE)
##' Import raw spectra
mSet <- ImportRawMSData(path = DataFiles,
                        mode = "onDisk",
                        plotSettings = PlottingParam,
                        metadata = pd)
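The phenodata table must line up with the imported files: one row per file, in the same order. The following base R sketch shows the same construction with a quick consistency check. The file names are hypothetical placeholders so that the example runs without the mtbls2 package.

```r
# Hypothetical file paths, used only so the sketch runs without mtbls2
files <- file.path("/data",
                   c("col0_1.mzData", "col0_2.mzData", "col0_3.mzData",
                     "cyp79_1.mzData", "cyp79_2.mzData", "cyp79_3.mzData"))

# Build the phenodata table the same way as above
pd <- data.frame(
  sample_name  = sub(".mzData", "", basename(files), fixed = TRUE),
  sample_group = c(rep("col0", 3), rep("cyp79", 3)),
  stringsAsFactors = FALSE
)

# Consistency checks: one row per file and no duplicated sample names
stopifnot(nrow(pd) == length(files))
stopifnot(anyDuplicated(pd$sample_name) == 0)

# Number of samples per group
table(pd$sample_group)
```

Running such checks before the import catches mismatched group labels early, before any time is spent on peak profiling.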
Import as inMemory Mode
MS data can be imported fully into memory with the 'inMemory' mode. Compared with the 'onDisk' mode, keeping the raw spectral signal in memory speeds up the whole workflow but may consume a large amount of RAM. Use this option only if your computer has plenty of RAM (> 16GB for 30 samples).
##' Get raw spectra files
DataFiles <- dir(system.file("mzData", package = "mtbls2"),
                 full.names = TRUE, recursive = TRUE)[10:12]
##' Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(DataFiles), pattern = ".mzData",
                                   replacement = "", fixed = TRUE),
                 sample_group = rep("col0", 3),
                 stringsAsFactors = FALSE)
##' Import raw spectra
mSet0 <- ImportRawMSData(path = DataFiles,
                         mode = "inMemory",
                         plotSettings = SetPlotParam(),
                         metadata = pd)
`PerformPeakProfiling` is a wrapper function that performs peak picking, peak alignment and gap filling in one step. Two workflows are provided:
4.1 Use `PerformPeakProfiling` directly;
4.2 Use `PerformPeakPicking`, `PerformPeakAlignment` and `PerformPeakFilling`;
Here, the peak profiling step is executed with the parameters defined by the function `SetPeakParam`:
Customized_params <- SetPeakParam(ppm = 5,
                                  bw = 10,
                                  mzdiff = 0.001,
                                  max_peakwidth = 15,
                                  min_peakwidth = 10)
##' Perform spectra profiling
# Users can pass either 'Customized_params' or the optimized 'best_params' from above
mSet <- PerformPeakProfiling(mSet,
                             Params = best_params,
                             ncore = 1,
                             plotSettings = SetPlotParam(Plot = TRUE))
### Everything has been done! All figures will be generated during the process.
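The `ppm` value passed to `SetPeakParam` is a relative mass tolerance, so the absolute m/z window it allows grows with m/z. The short base R sketch below makes this concrete; the helper name `ppm_window` is hypothetical and only illustrates the arithmetic, not the internals of centWave.

```r
# Hypothetical helper: absolute m/z window implied by a ppm tolerance
ppm_window <- function(mz, ppm) {
  delta <- mz * ppm / 1e6   # parts per million of the m/z value
  c(lower = mz - delta, upper = mz + delta)
}

# Window for m/z 500 at 5 ppm, and for m/z 1000 at the same tolerance
w500  <- ppm_window(500, 5)
w1000 <- ppm_window(1000, 5)
print(w500)
print(w1000)
```

At 5 ppm the window is 0.0025 on either side of m/z 500 and twice as wide at m/z 1000, which is one reason a tolerance suited to the instrument's mass accuracy matters for peak picking.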
Peak profiling can also be performed step by step with the `PerformPeakPicking`, `PerformPeakAlignment` and `PerformPeakFilling` functions.
##' Perform spectra peak picking
# 1. Load the internal example mSet object
data(mSet)
newPath <- dir(system.file("mzData", package = "mtbls2"),
               full.names = TRUE, recursive = TRUE)[c(10, 11, 12)]
# 2. Update the spectra data file paths
mSet <- updateRawSpectraPath(mSet, newPath)

# 3. Perform the peak picking step
mSet <- PerformPeakPicking(mSet)

# 4. Perform the peak alignment step
mSet <- PerformPeakAlignment(mSet)

# 5. Perform the gap filling step
mSet <- PerformPeakFilling(mSet)
# 6. Stop the parallel backend when all processing is done
register(bpstop())
Mass spectra generated by LC-MS are often complicated by various adducts, isotopologues, dimers and fragments, so the molecular ion is often not the most intense peak and is not easy to identify. As a result, thousands of features can be detected and aligned into a feature table. It is now well accepted that this inflated number of features (from adducts, isotopes etc.) can lead to an over-estimation of the real number of compounds. For now, we have integrated `r Biocpkg("CAMERA")` into the pipeline for annotation. More advanced and more efficient algorithms will be available soon.
# This step defines the parameters for annotation
annParams <- SetAnnotationParam(polarity = 'positive',
                                mz_abs_add = 0.035)
## Perform peak annotation with the newly defined annParams
mSet <- PerformPeakAnnotation(mSet = mSet,
                              annotaParam = annParams,
                              ncore = 1)
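To see why a single compound can produce several features, consider the m/z values of its common positive-mode adducts. The base R sketch below uses commonly tabulated mass shifts and the monoisotopic mass of glucose as an example; the helper `adduct_mz` is hypothetical and does not reflect how CAMERA or OptiLCMS annotate peaks internally.

```r
# Commonly tabulated mass shifts (Da) for singly charged positive-mode adducts
adduct_shift <- c("[M+H]+"   = 1.007276,
                  "[M+NH4]+" = 18.033823,
                  "[M+Na]+"  = 22.989218,
                  "[M+K]+"   = 38.963158)

# Hypothetical helper: expected m/z of each adduct for a neutral mass
adduct_mz <- function(neutral_mass, shifts = adduct_shift) {
  neutral_mass + shifts
}

glucose <- 180.06339            # monoisotopic mass of C6H12O6
mz <- adduct_mz(glucose)
print(round(mz, 4))

# The first 13C isotopologue of [M+H]+ sits about 1.003355 Da higher
isotope_mz <- mz[["[M+H]+"]] + 1.003355
print(round(isotope_mz, 4))
```

In this example one compound already accounts for five distinct m/z values, so grouping such features is what keeps the feature table from overstating the number of compounds.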
Feature annotation resolves the redundancy caused by adducts, isotopes and similar species. Matching features to their chemical identities, however, is more useful for subsequent targeted tandem MS acquisition. This section will provide a series of functions that let users narrow each feature at the MS level down to a few of the most plausible candidate compounds.
# [Development done, under comprehensive evaluation, will be Available soon]
The results generated by the steps above can be exported in two steps, result formatting and export, which are handled by `FormatPeakList` and a series of `Export` functions.
## Format the PeakList
mSet <- FormatPeakList(mSet = mSet,
                       annParams,
                       filtIso = FALSE,
                       filtAdducts = FALSE,
                       missPercent = 1)

## 1. Export the annotated peak table
# Please replace the path with your destination folder (absolute)
Export.Annotation(mSet, path = tempdir())

# 2. Export the peak table for analysis in other MetaboAnalyst modules
# Please replace the path with your destination folder (absolute)
Export.PeakTable(mSet, path = tempdir())

# 3. Export the peak summary results
# Please replace the path with your destination folder (absolute)
Export.PeakSummary(mSet, path = tempdir())
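Because the export functions expect an absolute destination folder, it can be convenient to create and normalize that folder first. This is a minimal base R sketch; the folder name `optilcms_results` is an arbitrary example, and a real run would pass `out_dir` as the `path` argument of the export calls above.

```r
# Build a destination folder (arbitrary example name) under the session temp dir
out_dir <- file.path(tempdir(), "optilcms_results")

# Create it if needed; no warning if it already exists
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Convert to an absolute, normalized path and fail early if it is missing
out_dir <- normalizePath(out_dir, mustWork = TRUE)
print(out_dir)

# After exporting, list whatever was written to the folder
print(list.files(out_dir))
```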
All the steps above (both the customized and the optimized options) can be run with the resumable mechanism of OptiLCMS. Here we walk through an example pipeline so that users can learn the mechanism quickly. Users can remove the optimization steps from the `running.plan` call to turn it into a customized resumable workflow.
##' Fetch the raw spectra data
DataFiles <- dir(system.file("mzData", package = "mtbls2"),
                 full.names = TRUE, recursive = TRUE)[c(10:12, 14:16)]
##' Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(DataFiles), pattern = ".mzData",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("col0", 3), rep("cyp79", 3)),
                 stringsAsFactors = FALSE)
##' Initialize your plan
plan <- InitializaPlan("raw_opt")
##' Define your plan
plan <- running.plan(plan,
                     mSet <- PerformROIExtraction(datapath = DataFiles[c(1:2)], rt.idx = 0.05,
                                                  plot = FALSE, rmConts = FALSE,
                                                  running.controller = rc),
                     param_initial <- SetPeakParam(),
                     best_parameters <- PerformParamsOptimization(mSet = mSet, param_initial,
                                                                  ncore = 1,
                                                                  running.controller = rc),
                     param <- best_parameters,
                     plotSettings1 <- SetPlotParam(Plot = FALSE),
                     plotSettings2 <- SetPlotParam(Plot = FALSE),
                     mSet <- ImportRawMSData(mSet = mSet, path = DataFiles, metadata = pd,
                                             plotSettings = plotSettings1,
                                             running.controller = rc),
                     mSet <- PerformPeakProfiling(mSet = mSet, Params = param,
                                                  plotSettings = plotSettings2, ncore = 1,
                                                  running.controller = rc),
                     annParams <- SetAnnotationParam(polarity = 'negative',
                                                     mz_abs_add = 0.025),
                     mSet <- PerformPeakAnnotation(mSet = mSet, annotaParam = annParams,
                                                   ncore = 1, running.controller = rc),
                     mSet <- FormatPeakList(mSet = mSet, annParams, filtIso = FALSE,
                                            filtAdducts = FALSE, missPercent = 1))
##' Run it!
# result <- ExecutePlan(plan)
Now, let's try to update the running plan and execute it!
##' Re-define your plan with a change on mz_abs_add from 0.025 to 0.035
plan <- running.plan(plan,
                     mSet <- PerformROIExtraction(datapath = DataFiles[c(1:2)], rt.idx = 0.05,
                                                  plot = FALSE, rmConts = FALSE,
                                                  running.controller = rc),
                     param_initial <- SetPeakParam(),
                     best_parameters <- PerformParamsOptimization(mSet = mSet, param_initial,
                                                                  ncore = 1,
                                                                  running.controller = rc),
                     param <- best_parameters,
                     plotSettings1 <- SetPlotParam(Plot = FALSE),
                     plotSettings2 <- SetPlotParam(Plot = FALSE),
                     mSet <- ImportRawMSData(mSet = mSet, path = DataFiles, metadata = pd,
                                             plotSettings = plotSettings1,
                                             running.controller = rc),
                     mSet <- PerformPeakProfiling(mSet = mSet, Params = param,
                                                  plotSettings = plotSettings2, ncore = 1,
                                                  running.controller = rc),
                     annParams <- SetAnnotationParam(polarity = 'negative',
                                                     mz_abs_add = 0.035),
                     mSet <- PerformPeakAnnotation(mSet = mSet, annotaParam = annParams,
                                                   ncore = 1, running.controller = rc),
                     mSet <- FormatPeakList(mSet = mSet, annParams, filtIso = FALSE,
                                            filtAdducts = FALSE, missPercent = 1))
##' Re-run it! Most steps will be resumed from cache and save your time!
# result <- ExecutePlan(plan)
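The general idea behind a resumable workflow can be sketched in a few lines of base R: a step is re-run only when its inputs change, and is otherwise served from a cache. This is an illustration of the concept under simplifying assumptions, not the way OptiLCMS implements its plan and cache. The names `run_step`, `cache` and `slow_square` are hypothetical.

```r
# In-memory cache keyed by step name plus arguments (hypothetical design)
cache <- new.env()

run_step <- function(name, fun, ...) {
  # Build a key from the step name and a text form of its arguments
  key <- paste(name, paste(deparse(list(...)), collapse = ""))
  if (exists(key, envir = cache, inherits = FALSE)) {
    message("resuming '", name, "' from cache")
    return(get(key, envir = cache))
  }
  message("running '", name, "'")
  result <- fun(...)
  assign(key, result, envir = cache)
  result
}

# A stand-in for an expensive step; n_runs counts real executions
n_runs <- 0
slow_square <- function(x) {
  n_runs <<- n_runs + 1
  x^2
}

a <- run_step("square", slow_square, x = 0.025)  # executed
b <- run_step("square", slow_square, x = 0.025)  # same input: served from cache
d <- run_step("square", slow_square, x = 0.035)  # changed input: executed again
print(n_runs)
```

Only two of the three calls execute the function, which mirrors the re-run above: changing `mz_abs_add` invalidates the annotation step while earlier steps with unchanged inputs can be reused.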
Tautenhahn, R.; Bottcher, C.; Neumann, S. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9, 504, doi:10.1186/1471-2105-9-504.
Libiseller, G.; Dvorzak, M.; Kleb, U.; Gander, E.; Eisenberg, T.; Madeo, F.; Neumann, S.; Trausinger, G.; Sinner, F.; Pieber, T.; et al. IPO: a tool for automated optimization of XCMS parameters. BMC Bioinformatics 2015, 16, 118, doi:10.1186/s12859-015-0562-8.
Kuhl, C.; Tautenhahn, R.; Bottcher, C.; Larson, T.R.; Neumann, S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal Chem 2012, 84, 283-289, doi:10.1021/ac202450g.
Smith, C.A.; Want, E.J.; O'Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem 2006, 78, 779-787, doi:10.1021/ac051437y.
sessionInfo()