BiocParallel::register(BiocParallel::SerialParam()) knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE) knitr::opts_chunk$set(collapse = TRUE) library(plotly)
RHermes is an untargeted metabolomics data processing package. Its objective is to identify a customizable set (usually >10K) of metabolites in a biological or environmental sample.
This document will guide you through the package functions and show you what a typical RHermes workflow looks like.
The recommended specs for RHermes are:
You can download the development version from GitHub with:
if(!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") devtools::install_github("RogerGinBer/RHermes")
All the information generated by RHermes is stored in a single RHermesExp object. To generate it, start with:
library(RHermes) myHermes <- RHermesExp()
This object is the input of most of the package functions, which return an updated version of the object.
After creating the object, set your MS1 experimental parameters. Use the function setExpParam(), which accepts either a custom ExpParam object or a template name:
myHermes <- setExpParam(myHermes, params = ExpParam(ppm = 4, res = 120000, ion = "+")) myHermes <- setExpParam(myHermes, template = "orbi-pos")
You can use the following template names : "orbi-pos", "orbi-neg", "qtof-pos" and "qtof-neg". Check ?ExpParam to see all customizable fields.
After setting up the ExpParam, decide which set of formulas and adducts to detect in our data with setDB():
myHermes <- setDB(myHermes, db = "hmdb") #Loads a small HMDB subset
However this "hmdb" database is just a small subset of the the real HMDB. To load a real (usually >10^3 formulas) database, use db = "custom":
#Use either a csv or a xlsx (Excel) file myHermes <- setDB(myHermes, db = "custom", DBfile = "myDatabase.csv") myHermes <- setDB(myHermes, db = "custom", DBfile = "myDatabase.xlsx")
This database needs at least two columns named exactly like this:
Name
and MolecularFormula
. You may have other columns with metadata related
to the compounds, RHermes just needs those two.
A general list of adducts will be set according to the experimental
parameters (positive or negative ionizzation). You can choose multicharged
and multimer adducts by using the adcharge
and admult
parameters in setDB().
For instance, you could do:
myHermes <- setDB(myHermes, db = "hmdb", admult = 2, adcharge = 2)
And this would set your adduct list to all adducts with charge <=2 and multiplicity <= 2 (for instance [2M+H]+, [M+2H]2+ and so on).
You may also just select a subset of adducts using with adlist
param:
myHermes <- setDB(myHermes, db = "hmdb", admult = 2, adcharge = 2, adlist = c("M+H", "M+Na", "M+K", "M+2H", "2M+H"))
In general, the less formula/adduct combinations, the faster the program runs. It is advisable to remove very uncommon adducts from the adduct list (such as those coming from solvents not used in your particular experiment, eg. M+ACN+H). To check which adducts are selected, use adlist(). You can manually add new adduct profiles with addAd() and remove them with remAd():
#We will manually add the "M+2H" adduct myHermes <- addAd(myHermes, name = "M+2H", deltam = 2*1.00727600, ch = 2, mult = 1, toadd = "H2") #For instance, remove adducts of unused solvents myHermes <- remAd(myHermes, c("M+DMSO+H", "M+IsoProp+H"))
adlist(myHermes)
knitr::kable(adlist(myHermes))
Wrapping up: here's what the object looks like up to this point:
myHermes #Shows a summary of all set parameters and object info
As you can see above, the RHermesExp object keeps track of all processing steps it's gone through, as well as some other troubleshooting info like the OS and the RHermes version it was originally generated with.
The main interface to process files is the function processMS1(), which accepts an RHermesExp object and a list of paths to the MS1 files to process. All the files must share the used experimental parameters (meaning data acquired in different polarities or in different instruments can't be processed at the same time).
For this guide, we will use a small test mzML provided with the package:
myHermes <- processMS1(myHermes, system.file("extdata", "MS1TestData.mzML", package = "RHermes"))
Once done, you can check some stats about the files you've just processed:
#You could do either of these, but PL it's more direct if you just want to check #one file: # myHermes PL(myHermes, 1)
Before diving into the process itself, we should talk first about the detection parameters; There are two ways to generate them:
Specify an already-made template with getSOIpar()
Create your own custom set of parameters with the constructor SOIParam()
It works either way, but I'd advise to just use getSOIpar() because it's easier and the templates have already been tested. There are six templates available, which only differ by the number and width of the applied bins: simple, double, triple, simple-x, double-x, and triple-x. -x meaning extended bins (15s) as opposed to the default 5s bins. It's useful to use -x if you are using a regular HPLC setup to avoid SOI splitting and to improve running time. On a UPLC setup, you can skip the -x.
So when should one use SOIParam directly? Well, it's convenient if you want to change the bin sizes used for the scan density calculation and also if you want to be stricter about the minimum required density or the minimum data point intensity (!).
Once you have your SOI parameters, pass them to findSOI and specify which files you want to use:
s <- getSOIpar("double") myHermes <- findSOI(myHermes, s, 1)
And what about blank subtraction? It's actually really simple: just add an extra blank argument to findSOI(). You can also generate as many SOIs as you want in just one function call by passing a vector of sample IDs and blank-to-be-used IDs (keep in mind that "No blank substraction" must be represented as 0).
Since our test dataset doesn't include blank data (it's just one file), this next chunk will not be actually executed, but you get the gist of it:
#Generate a SOI list of file 2 without blank subtraction and another one #using file 1 as blank myHermes <- findSOI(myHermes, s, c(2,2), c(0,1))
Now that you generated a SOI list, you can now filter it to remove low intensity SOIs, bad isotopic fidelity SOIs and likely in-source fragments (ISF).
For intensity and isotopic fidelity filtering we use filterSOI():
myHermes <- filterSOI(myHermes, id = 1, minint = 20000, isofidelity = TRUE)
Once you have a SOI list you're satisfied with, you can generate an Inclusion List to perform the MS2 acquisition with:
To do it, write:
#Only uses SOIs of adducts in "ad". In this case, only M+H adducts myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "only", ad = "M+H")) #If there's an M+H adduct annotation, don't include other adducts. #Only works if you've performed filterSOI first, since it requires adduct #annotation grouping myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "yes", ad = "M+H")) #Use all SOIs in the SOI list myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "all"))
Once generated, you can plot it, filter it or export it.
If you think the IL contains too many entries in a certain RT interval, you may want to filter them like this:
#This removes all IL entries below 50000 intensity in the 0s-150s RT interval myHermes <- filterIL(myHermes, 1, rts = c(0,150), minint = 5e4)
To export the IL entries into individual csv files (one per injection), type:
exportIL(myHermes, id = 1, file = "InclusionList", maxOver = 5, sepFiles = TRUE)
Here we plan the IL allowing up to 5 entries to be monitored by the instrument at the same time. Depending on your instrument MS2 injection time (IT), you may want to increase maxOver to between 10-15 to reduce the number of injections while keeping a good scan rate.
We are currently developing new acquisition strategies to improve the IL coverage while retaining high spectral quality and a low number of injections. Expect these previous functions to be expanded and change over time.
Once you have acquired your MS2 data, use processMS2 just like this:
myHermes <- processMS2(myHermes, id = 1, MS2files = c("./file1.mzML", "./file2.mzML", "./file3.mzML", "./file4.mzML"), sstype = "regular", useDB = FALSE)
Since the package doesn't come with the MS2 mzML files corresponding to the MS1 test dataset (as it would be very heavy) we have already pre-processed the IL:
myHermes <- readRDS(system.file("extdata", "exampleObject.rds", package = "RHermes"))
You can extract the processed MS2 spectra with Ident():
Ident(myHermes, 1)
At any point of the processing (for instance, after generating an Inclusion List), you may save your object for later use. To do so, type:
saveRDS(myHermes, "testRHermes.rds")
To load it, type:
myHermes <- readRDS("testRHermes.rds")
This RDS file you've just generated can be loaded into the GUI to continue the processing there or to easily view the plots. It also serves a a "save point" in your processing, in case you want to start again from some step
In case you prefer to generate the plots "manually" (as opposed to using the GUI), here are some instructions for the different plotting functions:
plotPL(myHermes, 1, "C5H11NO2", c("M+H", "M+Na", "M+K"), c(80, 120))
plotSOI(myHermes, 1, "C5H11NO2", c("M+H", "M+Na", "M+K"), c(80, 120))
Bear in mind that this IL is very limited because of the small number of formulas and the filtered mzML we used.
plotIL(myHermes, id = 1)
p <- plotRawMS2(myHermes, ms2id = 1, entryid = 2) ggplotly(p$p_bymz) ggplotly(p$p_bygroup)
To start the app, simply type:
RHermesGUI()
And the app should start in your default browser.
Another parameter you want to set up (especially if you're dealing with many formulas and adducts - say >10K formulas) is a custom parallel backend.
RHermes functions use your default backend (bpparam()), but you can set any backend you like with register(). You can use setCluster to automatically select an appropiate backend based on your OS and RAM.
BiocParallel::register(setCluster())
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.