Real Time Annotation
In peakPantheR: Peak Picking and Annotation of High Resolution Experiments

BiocStyle::markdown()

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Package: r Biocpkg("peakPantheR")
Authors: Arnaud Wolfer

## Silently loading all packages
library(BiocStyle)
library(peakPantheR)
library(faahKO)
library(pander)

Introduction

The peakPantheR package is designed for the detection, integration and reporting of pre-defined features in MS files (e.g. compounds, fragments, adducts, ...).

The Real Time Annotation is set to detect and integrate multiple compounds in one file at a time. It therefore can be deployed on a LC-MS instrument to integrate a set of pre-defined features (e.g. spiked standards) as soon as the acquisition of a sample is completed.

Using the r Biocpkg("faahKO") raw MS dataset as an example, this vignette will:

Detail the Real Time Annotation concept
Apply the Real Time Annotation to a subset of pre-defined features in the r Biocpkg("faahKO") dataset

Abbreviations

ROI: Regions Of Interest
- reference RT / m/z windows in which to search for a feature
uROI: updated Regions Of Interest
- modifed ROI adapted to the current dataset which override the reference ROI
FIR: Fallback Integration Regions
- RT / m/z window to integrate if no peak is found
TIC: Total Ion Chromatogram
- the intensities summed across all masses for each scan
EIC: Extracted Ion Chromatogram
- the intensities summed over a mass range, for each scan

Real Time Annotation Concept

Real time compound integration is set to process multiple compounds in one file at a time.

To achieve this, peakPantheR will:

load a list of expected RT / m/z regions of interest (ROI)
detect features in each ROI and keep the highest intensity one
determine peak statistics for each feature
return:
- TIC
- a table with all detected compounds for that file (row: compound, col: statistic)
- EIC for each ROI
- sample acquisition date-time from the mzML metadata (if available)
- save EIC plots to disk

Real Time Annotation Example

In the following example we will target two pre-defined features in a single raw MS spectra file from the r Biocpkg("faahKO") package. For more details on the installation and input data employed, please consult the Getting Started with peakPantheR vignette.

Input Data

The path to a MS file from the r Biocpkg("faahKO") is located and used as input spectra:

library(faahKO)
## file paths
input_spectraPath  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"))
input_spectraPath

Two targeted features (e.g. compounds, fragments, adducts, ...) are defined and stored in a table with as columns:

cpdID (numeric)
cpdName (character)
rtMin (sec)
rtMax (sec)
rt (sec, optional / NA)
mzMin (m/z)
mzMax (m/z)
mz (m/z, optional / NA)

# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                            as.numeric)

# use pandoc for improved readability
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                            as.numeric)
rownames(input_targetFeatTable) <- NULL
pander::pandoc.table(input_targetFeatTable, digits = 9)

Run Single File Annotation

peakPantheR_singleFileSearch() takes as input a singleSpectraDataPath pointing to the file to process and targetFeatTable defining the features to integrate. The resulting annotation contains all the fitting and integration properties:

library(peakPantheR)
annotation <- peakPantheR_singleFileSearch(
                                    singleSpectraDataPath = input_spectraPath,
                                    targetFeatTable = input_targetFeatTable,
                                    peakStatistic = TRUE,
                                    curveModel = 'skewedGaussian',
                                    verbose = TRUE)

annotation$TIC

## acquisition time cannot be extracted from NetCDF files
annotation$acquTime

annotation$peakTable

# use pandoc for improved readability
pander::pandoc.table(annotation$peakTable, digits = 7)

annotation$curveFit

annotation$ROIsDataPoint

peakPantheR_singleFileSearch() takes multiple parameters that can alter the file annotation:

peakStatistic if TRUE calculates additional peak statistics: 'ppm_error', 'rt_dev_sec', 'tailing factor' and 'asymmetry factor'
plotEICsPath if not NA will save a .png of all ROI EICs at the path provided (expects 'filepath/filename.png' for example). If NA no plot is saved
getAcquTime if TRUE the sample acquisition date-time is extracted from the mzML metadata. Acquisition time cannot be extracted from other file formats. The additional file access will impact run time
FIR if not NULL, defines the Fallback Integration Regions (FIR) to integrate when a feature is not found.
curveModel, defines the peak-shape model to fit to each EIC. By default, a 'skewedGaussian' model is used. The other alternative is the exponentially modified gaussian 'emgGaussian' model.
verbose if TRUE messages calculation progress, time taken and number of features found (total and matched to targets)
... passes arguments to findTargetFeatures to alter peak-picking parameters (e.g. the curveModel, the sampling or fitting parameters)

The summary plot generated by plotEICsPath, corresponding to the EICs of each integrated regions of interest is as follow:

knitr::include_graphics("../man/figures/singleFileSearch_EICsPlot.png")

EICs plot: Each panel correspond to a targeted feature, with the EIC extracted on the mzMin, mzMax range found. The red dot marks the RT peak apex, and the red line highlights the RT peakwidth range found (rtMin, rtMax)

peakPantheR
Peak Picking and Annotation of High Resolution Experiments

Real Time Annotation
In peakPantheR: Peak Picking and Annotation of High Resolution Experiments

Introduction

Abbreviations

Real Time Annotation Concept

Real Time Annotation Example

Input Data

Run Single File Annotation

See Also

Try the peakPantheR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

peakPantheR Peak Picking and Annotation of High Resolution Experiments

Real Time Annotation In peakPantheR: Peak Picking and Annotation of High Resolution Experiments

Introduction

Abbreviations

Real Time Annotation Concept

Real Time Annotation Example

Input Data

Run Single File Annotation

See Also

Try the peakPantheR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

peakPantheR
Peak Picking and Annotation of High Resolution Experiments

Real Time Annotation
In peakPantheR: Peak Picking and Annotation of High Resolution Experiments