knitr::opts_chunk$set(eval=FALSE)

library(binneR)
library(ggplot2)
htmltools::img(src = knitr::image_uri(system.file('sticker/binneRsticker.png',package = 'binneR')),
                             style = 'display:block;margin-left:auto;margin-right:auto; height:200px;') 

Introduction

The binneR package provides a spectral binning approach for routine processing of flow infusion electrospray - high resolution mass spectrometry (FIE-HRMS) metabolomic fingerprinting experiments, the results of which can then be used for subsequent statistical analyses.

Spectral binning rounds high resolution fingerprinting data to a specified amu bin width. FIE-HRMS data consist of a 'plug flow' region, across which MS signal intensities can be averaged to provide a metabolome fingerprint. The animation below shows how the spectrum changes across the 'plug flow' region of an example FIE-HRMS injection acquired in negative ionisation mode.

file <- system.file('example-data/1.mzML.gz',package = 'binneR')
ms <- mzR::openMSfile(file)
chrom <- mzR::header(ms)

is <- detectInfusionScans(file)
buffer <- 10
is <- c(1,is[2] + buffer)

spectrum <- mzR::peaks(ms) %>%
    purrr::map(~{
        tibble::as_tibble(.x,.name_repair = 'minimal') %>%
            magrittr::set_colnames(c('m/z','Abundance'))
    }) %>%
    dplyr::bind_rows(.id = 'seqNum') %>%
    dplyr::mutate(seqNum = as.numeric(seqNum)) %>%
    dplyr::left_join(chrom %>%
                            dplyr::select(seqNum,polarity),by = 'seqNum') %>%
    dplyr::filter(polarity == 0) %>%
    dplyr::left_join(tibble::tibble(seqNum = unique(.$seqNum),Scan = 1:length(unique(.$seqNum))),by = 'seqNum') %>%
    dplyr::filter(Scan >= is[1] & Scan <= is[2]) %>%
    dplyr::mutate(Bin = `m/z` %>% round(2)) %>%
    dplyr::group_by(Scan,Bin) %>%
    dplyr::summarise(Abundance = sum(Abundance),.groups = 'drop')

lim <- max(spectrum$Abundance) + 1000

chrom <- chrom %>%
    dplyr::filter(polarity == 0)
chrom$acquisitionNum <- 1:nrow(chrom)

for (i in is[1]:is[2]) {
    p <- list()

    p$chromatogram <- ggplot(chrom,aes(x = acquisitionNum,y = totIonCurrent)) + 
        geom_line() +
        geom_vline(xintercept = i, linetype = "dashed",colour = 'red') +
        theme_bw(base_size = 10) +
        xlab('Scan Number') +
        ylab('Total Ion Count') +
        ggtitle('Chromatogram')

    p$spectrum <- spectrum %>%
        dplyr::filter(Scan == i) %>%
        {
            ggplot(.,aes(x = Bin, y = 0, xend = Bin, yend = Abundance)) +
                geom_segment() +
                theme_bw() +
                labs(title = 'Spectrum',
                         x = 'm/z',
                         y = 'Abundance') +
                ylim(0,lim) +
                xlim(0,1000)
        }

    gridExtra::grid.arrange(gridExtra::arrangeGrob(p$chromatogram,p$spectrum))
}

Spectral binning is applied on a scan-by-scan basis. The m/z values are rounded to the specified bin width, the signals within each bin are sum aggregated, and the resulting intensities are then averaged across the specified scans.
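The steps described above can be sketched in base R on a toy set of centroided peaks. This is a minimal illustration rather than the package's implementation: the peak values are made up, and treating scans with no signal in a bin as zero when averaging is an assumption.

```r
# Toy centroided peaks from three scans (values are made up for illustration).
peaks <- data.frame(
    scan      = c(1, 1, 1, 2, 2, 3),
    mz        = c(133.0142, 133.0139, 405.1105, 133.0141, 405.1098, 133.0140),
    intensity = c(100, 50, 20, 120, 30, 90)
)

dp <- 2  # number of decimal places, i.e. a 0.01 amu bin width

# 1. Round each m/z to the bin width.
peaks$bin <- round(peaks$mz, dp)

# 2. Sum the intensities that fall within the same bin of the same scan.
per_scan <- aggregate(intensity ~ scan + bin, data = peaks, FUN = sum)

# 3. Average each bin across all scans.
#    Assumption: a scan with no signal in a bin contributes zero.
n_scans <- length(unique(peaks$scan))
totals <- tapply(per_scan$intensity, per_scan$bin, sum)
binned <- data.frame(
    bin       = as.numeric(names(totals)),
    intensity = as.numeric(totals) / n_scans
)

binned
```

Each row of `binned` corresponds to one spectral bin of the averaged fingerprint for this toy injection.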

Prior to the use of binneR, vendor specific raw data files need to be converted to one of the open source file formats such as .mzXML or .mzML so that they can be parsed into R. Data should also be centroided to reduce bin splitting artefacts that profile data can introduce during spectral binning. The msconvert tool can be used for both data conversion and centroiding, allowing the use of vendor specific algorithms.

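There are two main functionalities provided by this package: simple intensity matrix production for quick FIE-HRMS matrix investigations, and binneRlyse for processing metabolomics fingerprinting experiments.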

The subsequent sections will outline the use of these two main functionalities.

Before we begin, the necessary packages need to be loaded.

library(binneR)
library(metaboData)

Parallel Processing

The package supports parallel processing using the future package.

By default processing by binneR will be done sequentially. However, parallel processing can be activated prior to processing by specifying a parallel back-end using plan(). The following example specifies using the multisession back-end (multiple background R sessions) with two worker processes.

plan(future::multisession,workers = 2)

See the future package documentation for more information on the types of parallel back-ends that are available.

Infusion Scan Detection

In order to apply the spectral binning approach for FIE-HRMS data, the infusion scans need to be detected. For a set of specified file paths, the range of infusion scans can be detected using the following:

infusionScans <- detectInfusionScans(
    metaboData::filePaths('FIE-HRMS','BdistachyonTechnical')[1],
    thresh = 0.5
)
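One plausible reading of the thresh argument is that it sets a fraction of the maximum total ion count above which a scan is considered part of the infusion. The base R sketch below illustrates that idea on made-up values. It is an assumption for illustration only and may differ from how detectInfusionScans() is actually implemented.

```r
# Made-up total ion counts for 12 scans of a single injection.
tic <- c(1, 1, 2, 6, 9, 10, 10, 9, 7, 3, 1, 1) * 1e6
thresh <- 0.5

# Scans whose total ion count exceeds the given fraction of the maximum.
above <- which(tic > thresh * max(tic))

# Report the first and last such scan as the infusion scan range.
infusion_range <- range(above)
infusion_range
```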

The detected scans can then be checked by plotting an averaged chromatogram for these files. The infusion scans can also be plotted by supplying the range to the scans argument.

plotChromFromFile(
    metaboData::filePaths('FIE-HRMS','BdistachyonTechnical')[1],
    scans = infusionScans
)

Simple Intensity Matrix Production - quick FIE-HRMS matrix investigations

The simplest functionality of binneR is to read raw data from a vector of specified file paths, bin these to a given amu and aggregate across a given scan window. This can be useful for a quick assessment of FIE-HRMS data structures. Spectral binning can be performed using the readFiles() function as shown below. An example file from the metaboData package can be specified using the following.

file <- metaboData::filePaths('FIE-HRMS','BdistachyonTechnical')[1]

Then the data can be spectrally binned using:

res <- readFiles(file,dp = 2,scans = infusionScans)

This will return a list containing the intensity matrices for each ionisation mode, with the rows being the individual samples and columns the spectral bins.

binneRlyse - metabolomics fingerprinting experiments

Routine FIE-HRMS metabolomic fingerprinting experiments can require the rapid processing of hundreds of MS files, along with sample information such as biological classes for subsequent statistical analyses. The package allows for a Binalysis that formalises the spectral binning approach using an S4 class that not only bins the data to 0.01 amu but also extracts the accurate m/z for each of these bins based on 0.00001 amu binned data. The accurate m/z data can be aggregated based on a specified class structure, from which the modal accurate m/z is extracted. Some bin metrics are also computed that allow the assessment of the quality of the 0.01 amu bins.
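The idea of extracting an accurate m/z for each 0.01 amu bin from 0.00001 amu binned data can be sketched as follows. This is a hedged illustration on made-up peaks: it interprets the "modal" accurate m/z as the 0.00001 amu value carrying the greatest summed intensity within each bin, which is an assumption, and it ignores the class structure entirely.

```r
# Made-up centroided peaks (illustrative values only).
peaks <- data.frame(
    mz        = c(133.01421, 133.01421, 133.01398, 405.11052),
    intensity = c(100, 120, 15, 40)
)

# Assign each peak to a 0.01 amu bin and to a 0.00001 amu accurate m/z.
peaks$bin      <- round(peaks$mz, 2)
peaks$accurate <- round(peaks$mz, 5)

# Total intensity observed at each accurate m/z within each bin.
acc <- aggregate(intensity ~ bin + accurate, data = peaks, FUN = sum)

# Assumption: the representative accurate m/z of a bin is the one with the
# greatest total intensity.
modal <- do.call(
    rbind,
    lapply(split(acc, acc$bin), function(x) x[which.max(x$intensity), ])
)

modal
```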

The example data used here is from the metaboData package and consists of 10 replicate injections of a leaf tissue extract from the model grass species Brachypodium distachyon.

Basic Usage

Firstly the file paths and sample information can be loaded for the example data set using the following:

info <- metaboData::runinfo('FIE-HRMS','BdistachyonTechnical')

files <-  metaboData::filePaths('FIE-HRMS','BdistachyonTechnical')

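There are two main functions for processing experimental data: detectParameters(), which detects the binning parameters from the data files, and binneRlyse(), which performs the spectral binning.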

Sample information

binneRlyse() requires the provision of sample information (info) for the experimental run to be processed. This should be in csv format and the recommended column headers include:

The row order of the info file should match the order in which the file paths are submitted to the binneRlyse() processing function.

Parameters

Prior to spectral binning, the processing parameters first need to be selected. The binning parameters can be detected using the detectParameters() function as shown below.

parameters <- detectParameters(files)

These parameters specify the following:

Alternatively, parameters can be initialised using the binParameters function as shown below.

parameters <- binParameters(scans = 6:14)

For an already initialised BinParameters object, parameters can be changed using the methods named after the parameter of interest. For example, to change the scans of a given object:

alternative_parameters <- parameters
scans(alternative_parameters) <- 6:14

Processing

Processing is simple and requires only the use of the binneRlyse() function. The input of this function is a vector of the paths of the data files to process, a tibble::tibble containing the sample info and a BinParameters object.

It is crucial that the positions of the sample information in the info file match the sample positions within the files vector. The example below shows how this can be checked by comparing the file names present in the info with those in the vector. A result of FALSE indicates that every file name matches.

TRUE %in% (info$fileName != basename(files))
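If the check reports a mismatch, the sample information can be reordered to follow the file paths. The sketch below uses base R on hypothetical data; the objects example_files and example_info and the class values are made up for illustration and are not part of the example data set.

```r
# Hypothetical file paths and sample information, deliberately out of order.
example_files <- c('data/1.mzML', 'data/2.mzML', 'data/3.mzML')
example_info <- data.frame(
    fileName = c('3.mzML', '1.mzML', '2.mzML'),
    class    = c('B', 'A', 'A'),
    stringsAsFactors = FALSE
)

# Fail early if any file lacks a row of sample information.
stopifnot(all(basename(example_files) %in% example_info$fileName))

# Reorder the rows of the sample information to follow the file order.
example_info <- example_info[
    match(basename(example_files), example_info$fileName),
]

# The same check as above now reports no mismatches.
mismatch <- TRUE %in% (example_info$fileName != basename(example_files))
mismatch
```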

Spectral binning can then be performed with the following.

analysis <- binneRlyse(files,info,parameters)

For data quality inspection, the infusion profiles for this data can be plotted using:

plotChromatogram(analysis)

The spectrum fingerprints using:

plotFingerprint(analysis)

And the total ion counts using:

plotTIC(analysis)

Density profiles for individual bins can be plotted by:

plotBin(analysis,'n133.01',type = 'cls')

Data Extraction

There are a number of functions that can be used to return processing data from a Binalysis object:

Bin Metrics

There are a number of metrics that can be computed that allow the assessment of the quality of a given 0.01 amu bin in terms of the accurate m/z peaks present within its boundaries. These include both purity and centrality.

dat <- readFiles(file,scans = detectInfusionScans(file),dp = 5) %>%
    dplyr::bind_cols() %>%
    tidyr::gather('mz','Intensity') %>%
    dplyr::mutate(mode = stringr::str_sub(mz,1,1),
                 mz = as.numeric(stringr::str_replace(mz,'[:alpha:]','')),
                 bin = round(mz,2))
measures <- dat %>%
    dplyr::group_by(bin,mode) %>%
    dplyr::summarise(purity = binneR:::binPurity(mz,Intensity),
                        centrality = binneR:::binCentrality(mz,Intensity),
                        Intensity = mean(Intensity),.groups = 'drop')

Purity

Bin purity gives a metric of the spread of accurate m/z peaks found within a given bin and can signal the presence of multiple real spectral peaks within a bin.

The purity metric is a value between 0 and 1, with a purity closer to 1 indicating that the accurate m/z present within a bin are found over a narrow region and are therefore likely to be the result of only one real mass spectral peak. A reduction in purity could indicate the presence of multiple peaks within a bin.
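To make the idea concrete, the sketch below defines a toy purity score as one minus the intensity-weighted mean absolute deviation of the accurate m/z, scaled by half the bin width. The function toy_purity() is hypothetical and is only a stand-in with the same qualitative behaviour; it is not the formula used by binneR.

```r
# Hypothetical stand-in for a purity score (NOT the binneR implementation).
# A narrow spread of accurate m/z gives a value near 1; a wide spread
# lowers it.
toy_purity <- function(mz, intensity, width = 0.01) {
    centre <- weighted.mean(mz, intensity)
    spread <- weighted.mean(abs(mz - centre), intensity)
    1 - spread / (width / 2)
}

# Accurate m/z tightly clustered within the bin: a single likely peak.
narrow <- toy_purity(c(133.0140, 133.0141), c(100, 100))

# Accurate m/z spread widely across the bin: possibly two peaks.
wide <- toy_purity(c(405.1070, 405.1130), c(100, 100))

c(narrow = narrow, wide = wide)
```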

The example density plots below show two negative ionisation mode 0.01 amu bins with high (n133.01) and low (n405.11) purity respectively.

Pure <- measures %>%
    dplyr::filter(bin == 133.01,mode == 'n') %>%
    dplyr::mutate(purity = paste('Purity = ',signif(purity,3), sep = ''))
pure <- dat %>%
    dplyr::filter(bin == Pure$bin, mode == Pure$mode[1]) %>%
    split(1:nrow(.)) %>%
    purrr::map(~{
        tibble::tibble(mz = rep(.x$mz,.x$Intensity))
    }) %>%
    dplyr::bind_rows()

Impure <- measures %>%
    dplyr::filter(bin == 405.11,mode == 'n') %>%
    dplyr::mutate(purity = paste('Purity = ',signif(purity,3), sep = ''))
impure <- dat %>%
    dplyr::filter(bin == Impure$bin, mode == Impure$mode) %>%
    split(1:nrow(.)) %>%
    purrr::map(~{
        tibble::tibble(mz = rep(.x$mz,.x$Intensity))
    }) %>%
    dplyr::bind_rows()

p <- list()

p$pure <- ggplot(pure,aes(x = mz)) +
    geom_density() +
    theme_bw() +
    xlim(Pure$bin - 0.005,Pure$bin + 0.005) +
    ggtitle(paste(Pure$mode,Pure$bin,'   ',Pure$purity,sep = '')) +
    xlab('m/z') +
    ylab('Density')

p$impure <- ggplot(impure,aes(x = mz)) +
    geom_density() +
    theme_bw() +
    xlim(Impure$bin - 0.005,Impure$bin + 0.005) +
    ggtitle(paste(Impure$mode,Impure$bin,'   ',Impure$purity,sep = '')) +
    xlab('m/z') +
    ylab('Density')

gridExtra::grid.arrange(gridExtra::arrangeGrob(p$pure,p$impure))

Bin n133.01, which has a purity very close to 1, has only one peak present. Bin n405.11, which has a reduced purity, clearly has two peaks present.

Centrality

Bin centrality gives a metric of how close the mean of the accurate m/z is to the centre of a given bin and can give an indication of whether a peak could have been split across the boundary between two adjacent bins.

The centrality metric is a value between 0 and 1, with a centrality close to 1 indicating that the accurate m/z present within the boundaries of the bin are located close to the centre of the bin. Low centrality would indicate that the accurate m/z present within the bin are found close to the bin boundary and could therefore indicate bin splitting, where a mass spectral peak is split between two adjacent bins.
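As an illustration of this behaviour, the sketch below defines a toy centrality score as one minus the distance between the intensity-weighted mean m/z and the bin centre, scaled by half the bin width. The function toy_centrality() is hypothetical and is not the formula used by binneR; the m/z values are made up.

```r
# Hypothetical stand-in for a centrality score (NOT the binneR
# implementation). A mean m/z at the bin centre gives 1; a mean m/z at the
# bin boundary gives 0.
toy_centrality <- function(mz, intensity, bin, width = 0.01) {
    offset <- abs(weighted.mean(mz, intensity) - bin)
    1 - offset / (width / 2)
}

# A peak close to the centre of bin 88.04.
central <- toy_centrality(88.0401, 100, bin = 88.04)

# A peak close to the upper boundary of bin 104.03.
boundary <- toy_centrality(104.0348, 100, bin = 104.03)

c(central = central, boundary = boundary)
```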

Below shows example density plots of two negative ionisation mode 0.01 amu bins showing high (n88.04) and low (n104.03) centrality respectively.

Pure <- measures %>%
    dplyr::filter(bin == 88.04,mode == 'n') %>%
    dplyr::mutate(centrality = paste('Centrality = ',signif(centrality,3), sep = ''))
pure <- dat %>%
    dplyr::filter(bin == Pure$bin, mode == Pure$mode[1]) %>%
    split(1:nrow(.)) %>%
    purrr::map(~{
        tibble::tibble(mz = rep(.x$mz,.x$Intensity))
    }) %>%
    dplyr::bind_rows()

Impure <- measures %>%
    dplyr::filter(bin == 104.03,mode == 'n') %>%
    dplyr::mutate(centrality = paste('Centrality = ',signif(centrality,3), sep = ''))
impure <- dat %>%
    dplyr::filter(bin == Impure$bin, mode == Impure$mode) %>%
    split(1:nrow(.)) %>%
    purrr::map(~{
        tibble::tibble(mz = rep(.x$mz,.x$Intensity))
    }) %>%
    dplyr::bind_rows()

p <- list()

p$pure <- ggplot(pure,aes(x = mz)) +
    geom_density() +
    theme_bw() +
    xlim(Pure$bin - 0.005,Pure$bin + 0.005) +
    ggtitle(paste('n',Pure$bin,'   ',Pure$centrality,sep = '')) +
    xlab('m/z') +
    ylab('Density')

p$impure <- ggplot(impure,aes(x = mz)) +
    geom_density() +
    theme_bw() +
    xlim(Impure$bin - 0.005,Impure$bin + 0.005) +
    ggtitle(paste('n',Impure$bin,'   ',Impure$centrality,sep = '')) +
    xlab('m/z') +
    ylab('Density')

gridExtra::grid.arrange(gridExtra::arrangeGrob(p$pure,p$impure))

Bin n88.04 has a high centrality, with a single peak that is located very close to the centre of the bin. Bin n104.03, in contrast, has low centrality, with a single peak that is located very close to the upper boundary of the bin, which could indicate that it has been split between this bin and bin n104.04.



aberHRML/binneR documentation built on Aug. 25, 2023, 1:33 p.m.