BiocStyle::markdown() knitr::opts_chunk$set(fig.wide = TRUE, fig.retina = 3)
The figure below depicts the idea of the r BiocStyle::Biocpkg("Spectra")
framework.
For a detailed description, read [@Spectra].
knitr::include_graphics("arch.jpg")
suppressMessages( stopifnot(require(Spectra), require(MsBackendRawFileReader), require(tartare), require(BiocParallel)) )
assemblies aka Common Intermediate Language bytecode The download and install
can be done on all platforms using the command:
rawrr::installRawFileReaderDLLs()
if (isFALSE(rawrr::.checkDllInMonoPath())){ rawrr::installRawFileReaderDLLs() } if (isFALSE(file.exists(rawrr:::.rawrrAssembly()))){ rawrr::installRawrrExe() }
# fetch via ExperimentHub library(ExperimentHub) eh <- ExperimentHub::ExperimentHub()
query(eh, c('tartare'))
The RawFileReader libraries require a file extension ending with .raw
.
EH3220 <- normalizePath(eh[["EH3220"]]) (rawfileEH3220 <- paste0(EH3220, ".raw")) if (!file.exists(rawfileEH3220)){ file.link(EH3220, rawfileEH3220) } EH3222 <- normalizePath(eh[["EH3222"]]) (rawfileEH3222 <- paste0(EH3222, ".raw")) if (!file.exists(rawfileEH3222)){ file.link(EH3222, rawfileEH3222) } EH4547 <- normalizePath(eh[["EH4547"]]) (rawfileEH4547 <- paste0(EH4547 , ".raw")) if (!file.exists(rawfileEH4547 )){ file.link(EH4547 , rawfileEH4547 ) }
Call the constructor using Spectra::backendInitialize
, see also [@Spectra].
beRaw <- Spectra::backendInitialize( MsBackendRawFileReader::MsBackendRawFileReader(), files = c(rawfileEH3220, rawfileEH3222, rawfileEH4547))
Call the print method
beRaw
beRaw |> Spectra::spectraVariables()
Here we reproduce the Figure 2 of @rawrr r BiocStyle::Biocpkg("rawrr")
.
The r BiocStyle::Githubpkg("fgcz/MsBackendRawFileReader")
ships with a
filterScan
method using functionality provided by the C# libraries by
Thermo Fisher Scientific @rawfilereader.
(S <- (beRaw |> filterScan("FTMS + c NSI Full ms2 487.2567@hcd27.00 [100.0000-1015.0000]") )[437]) |> plotSpectra() # supposed to be scanIndex 9594 S # add yIonSeries to the plot (yIonSeries <- protViz::fragmentIon("LGGNEQVTR")[[1]]$y[1:8]) names(yIonSeries) <- paste0("y", seq(1, length(yIonSeries))) abline(v = yIonSeries, col='#DDDDDD88', lwd=5) axis(3, yIonSeries, names(yIonSeries))
For demonstration reasons, we extent the MsBackend
class by a filter method.
The filterIons
function returns spectra if and only if all fragment ions,
given as argument, match. We use r BiocStyle::CRANpkg("protViz")``::findNN
binary search
method for determining the nearest mZ peak for each ion.
If the mass error between an ion and an mz value is less than the given mass
tolerance, an ion is considered a hit.
setGeneric("filterIons", function(object, ...) standardGeneric("filterIons")) setMethod("filterIons", "MsBackend", function(object, mZ=numeric(), tol=numeric(), ...) { keep <- lapply(peaksData(object, BPPARAM = bpparam()), FUN=function(x){ NN <- protViz::findNN(mZ, x[, 1]) hit <- (error <- mZ - x[NN, 1]) < tol & x[NN, 2] >= quantile(x[, 2], .9) if (sum(hit) == length(mZ)) TRUE else FALSE }) object[unlist(keep)] })
The lines below implement a simple targeted peptide search engine. The R code
snippet takes as input a MsBackendRawFileReader
object containing
r length(beRaw)
spectra and y-fragment-ion mZ values determined for
LGGNEQVTR++
.
start_time <- Sys.time() X <- beRaw |> MsBackendRawFileReader::filterScan("FTMS + c NSI Full ms2 487.2567@hcd27.00 [100.0000-1015.0000]") |> filterIons(yIonSeries, tol = 0.005) |> Spectra::Spectra() |> Spectra::peaksData() end_time <- Sys.time()
The defined filterIons
method runs on
r length(beRaw |> MsBackendRawFileReader::filterScan("FTMS + c NSI Full ms2 487.2567@hcd27.00 [100.0000-1015.0000]"))
input spectra and returns r length(X)
spectra.
The runtime is shown below.
end_time - start_time
Next, we define and apply a method for graphing LGGNEQVTR
peptide spectrum
matches. Also, the function returns some statistics of the match.
## A helper plot function to visualize a peptide spectrum match for ## the LGGNEQVTR peptide. .plot.LGGNEQVTR <- function(x){ yIonSeries <- protViz::fragmentIon("LGGNEQVTR")[[1]]$y[1:8] names(yIonSeries) <- paste0("y", seq(1, length(yIonSeries))) plot(x, type = 'h', xlim = range(yIonSeries)) abline(v = yIonSeries, col = '#DDDDDD88', lwd=5) axis(3, yIonSeries, names(yIonSeries)) # find nearest mZ value idx <- protViz::findNN(yIonSeries, x[,1]) data.frame( ion = names(yIonSeries), mZ.yIon = yIonSeries, mZ = x[idx, 1], intensity = x[idx, 2] ) }
op <- par(mfrow=c(4, 1), mar=c(4, 4, 4, 1)) XC <- X |> lapply(FUN = .plot.LGGNEQVTR) |> Reduce(f = rbind)
stats::aggregate(mZ ~ ion, data = XC, FUN = base::mean) stats::aggregate(intensity ~ ion, data = XC, FUN = base::max)
We demonstrate the Spectra::combinePeaks
method and aggregate the four spectra
into a single peak matrix. The statistics returned by .plot.LGGNEQVTR()
should
be identical to the aggregation code snippet output above.
X |> Spectra::combinePeaks(ppm=10, intensityFun=base::max) |> .plot.LGGNEQVTR()
Below we demonstrate the interaction with the r Biocpkg('MsBackendMgf')
package while composing a Mascot Generic Format
mgf file which
is compatible with conducting an MS/MS Ions Search using Mascot Server (>=2.7)
@Perkins1999.
if (require(MsBackendMgf)){ ## Map Spectra variables to Mascot Server compatible vocabulary. map <- c(custom = "TITLE", msLevel = "CHARGE", scanIndex = "SCANS", precursorMz = "PEPMASS", rtime = "RTINSECONDS") ## Compose custom TITLE beRaw$custom <- paste0("File: ", beRaw$dataOrigin, " ; SpectrumID: ", S$scanIndex) (mgf <- tempfile(fileext = '.mgf')) (beRaw |> filterScan("FTMS + c NSI Full ms2 487.2567@hcd27.00 [100.0000-1015.0000]") )[437] |> Spectra::Spectra() |> Spectra::selectSpectraVariables(c("rtime", "precursorMz", "precursorCharge", "msLevel", "scanIndex", "custom")) |> MsBackendMgf::export(backend = MsBackendMgf::MsBackendMgf(), file = mgf, map = map) readLines(mgf) |> head(12) readLines(mgf) |> tail() }
To extract all tandem spectra, you can use the code snippets below
S <- Spectra::backendInitialize( MsBackendRawFileReader::MsBackendRawFileReader(), files = c(rawfileEH4547)) |> Spectra() S
S |> MsBackendMgf::export(backend = MsBackendMgf::MsBackendMgf(), file = mgf, map = map)
Next, we generate an mgf file for each scan type. This is helpful, e.g., for optimizing search settings tandem mass spectrometry sequence database search tool as comet @comet2012 or mascot server @Perkins1999.
## Define scanType patterns scanTypePattern <- list( EThcD.lowres = "ITMS.+sa Full ms2.+@etd.+@hcd.+", ETciD.lowres = "ITMS.+sa Full ms2.+@etd.+@cid.+", CID.lowres = "ITMS[^@]+@cid[^@]+$", HCD.lowres = "ITMS[^@]+@hcd[^@]+$", EThcD.highres = "FTMS.+sa Full ms2.+@etd.+@hcd.+", HCD.highres = "FTMS[^@]+@hcd[^@]+$" )
beRaw <- Spectra::backendInitialize( MsBackendRawFileReader::MsBackendRawFileReader(), files = c(rawrr::sampleFilePath()))
beRaw <- Spectra::backendInitialize( MsBackendRawFileReader::MsBackendRawFileReader(), files = rawrr::sampleFilePath()) beRaw$custom <- paste0("File: ", gsub("/srv/www/htdocs/Data2San/", "", beRaw$dataOrigin), " ; SpectrumID: ", beRaw$scanIndex)
.generate_mgf <- function(ext, pattern, dir=tempdir(), ...){ mgf <- file.path(dir, paste0(sub("\\.raw", "", unique(basename(beRaw$dataOrigin))), ".", ext, ".mgf")) idx <- beRaw$scanType |> grepl(patter=pattern) if (sum(idx) == 0) return (NULL) message(paste0("Extracting ", sum(idx), " ", pattern, " scans\n\t to file ", mgf, " ...")) beRaw[which(idx)] |> Spectra::Spectra() |> Spectra::selectSpectraVariables(c("rtime", "precursorMz", "precursorCharge", "msLevel", "scanIndex", "custom")) |> MsBackendMgf::export(backend = MsBackendMgf::MsBackendMgf(), file = mgf, map = map) mgf } #mapply(ext = names(scanTypePattern), # scanTypePattern, # FUN = .generate_mgf) |> # lapply(FUN = function(f){if (file.exists(f)) {readLines(f) |> head()}})
Given the task, we want to filter an MS2 of peak list recorded on an Orbitrap device to be interested only in the top peak within 100 Da mass windows. The following code snippet will demonstrate a solution.
## Define a function that takes a matrix as input and derives ## the top n most intense peaks within a mass window. ## Of note, here, we require centroided data. (no profile mode!) MsBackendRawFileReader:::.top_n
We add our custom code to the processing queue of the Spectra object.
Of note, we use n = 1
in praxis n = 10
for a 100 Da mass window, which seems
to be a practical choice.
S_2 <- Spectra::addProcessing(S, MsBackendRawFileReader:::.top_n, n = 1)
The plot below displays a visual control of the custom filter function top_n.
On the top is the original spectrum, and the filtered one is on the bottom.
A point indicates peaks that match.
Spectra::plotSpectraMirror(S[9594], S_2[9594], ppm = 50)
The following snippet prints the values of the filtered peaklist and the mZ values of the y-ions.
S_2[9594] |> mz() |> unlist() yIonSeries
When reading spectra the
MsBackendRawFileReader:::.RawFileReader_read_peaks
method is calling the
rawrr::readSpectrum
method.
The figure below displays the time performance for reading a single spectrum in dependency from the chunk size (how many spectra are read in one function call) for reading different numbers of overall spectra.
ioBm <- file.path(system.file(package = 'MsBackendRawFileReader'), 'extdata', 'specs.csv') |> read.csv2(header=TRUE) # perform and include a local IO benchmark ioBmLocal <- ioBenchmark(1000, c(32, 64, 128, 256), rawfile = rawfileEH4547) lattice::xyplot((1 / as.numeric(time)) * workers ~ size | factor(n) , group = host, data = rbind(ioBm, ioBmLocal), horizontal = FALSE, scales=list(y = list(log = 10)), auto.key = TRUE, layout = c(3, 1), ylab = 'spectra read in one second', xlab = 'number of spectra / file')
We compare the output of the Thermo Fischer Scientific raw files versus
their corresponding mzXML files using Spectra::MsBackendMzR
relying on the
r BiocStyle::Biocpkg("mzR")
package.
mzXMLEH3219 <- normalizePath(eh[["EH3219"]]) mzXMLEH3221 <- normalizePath(eh[["EH3221"]])
if (require(mzR)){ beMzXML <- Spectra::backendInitialize( Spectra::MsBackendMzR(), files = c(mzXMLEH3219)) beRaw <- Spectra::backendInitialize( MsBackendRawFileReader::MsBackendRawFileReader(), files = c(rawfileEH3220)) intensity.xml <- sapply(intensity(beMzXML[1:100]), sum) intensity.raw <- sapply(intensity(beRaw[1:100]), sum) plot(intensity.xml ~ intensity.raw, log = 'xy', asp = 1, pch = 16, col = rgb(0.5, 0.5, 0.5, alpha=0.5), cex=2) abline(lm(intensity.xml ~ intensity.raw), col='red') }
Are all scans of the raw file in the mzXML file?
if (require(mzR)){ table(scanIndex(beRaw) %in% scanIndex(beMzXML)) }
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.