knitr::opts_chunk$set(fig.align = 'center') library(magrittr)
The metabolyseR package provides a suite of methods that encompass three elements of metabolomics data analysis:
The package also distinguishes between the flexibility and simplicity required for exploratory analyses compared to the convenience needed for more complex routine analyses. This is reflected in the underlying S4 object-oriented implementations and associated methods defined within the package. It should be noted that it is useful to understand the principles involved in using metabolyseR for exploratory analyses to aid in extracting and wrangling the results generated from routine analyses.
The following document will provide an introduction to the basic usage of the package and includes how to create and use the base classes that are the foundation of metabolyseR. This will be focused around the applications for both exploratory and routine analyses. For more detailed information on the individual analysis elements see their associated vignette using:
browseVignettes('metabolyseR')
There is also an example quick start analysis vignette provided.
vignette('quick_start','metabolyseR')
Any issues, bugs or errors encountered while using the package should be reported here.
The examples shown here will use the abr1
data set from the metaboData package (?metaboData::abr1
).
This is a nominal mass flow-injection mass spectrometry (FI-MS) fingerprinting data set from a plant-pathogen infection time course experiment.
The examples will also include use of the pipe %>%
from the magrittr package.
Firstly load the necessary packages:
library(metabolyseR) library(metaboData)
The package supports parallel processing using the future package.
By default, processing by metabolyseR
will be done sequentially.
However, parallel processing can be activated, prior to analysis, by specifying a parallel back-end using plan()
.
The following example specifies using the multisession
implementation (multiple background R sessions) with two worker processes.
plan(future::multisession,workers = 2)
See the future package documentation for more information on the types of parallel implementations that are available.
For exploratory analyses, simple questions of the data need to be answered quickly, requiring few steps. Key requirements for any tool used by investigators are that it should be both simple and flexible.
In metabolyseR, the AnalysisData
class is the base S4 class that provides these requirements.
The following sections will give an overview of the basics in constructing and using these objects as the base for analysis.
We can firstly construct an AnalysisData
object which requires two data tables.
The first is the metabolomic data where the columns are the metabolome features, the rows the sample observations and contains the abundance values.
The second is the sample meta-information where the row order should match to that of the metabolome data table.
Using the example data, his can be constructed and assigned to the variable d
by:
d <- analysisData(data = abr1$neg, info = abr1$fact)
Where abr1$neg
is the negative ionisation mode data and abr1$fact
is the corresponding sample information.
By printing d
we can view some basic information about our data.
print(d)
We can also return the numbers of samples and numbers of features respectively using the following:
nSamples(d) nFeatures(d)
The data table can be extracted using the dat
method:
dat(d)
Or alternatively, can be used to assign a new data table:
dat(d) <- abr1$pos d
The sample information table can be extracted using the sinfo
method:
sinfo(d)
And similarly used to assign a new sample information table:
sinfo(d) <- abr1$fact[,1:2] d
d <- analysisData(abr1$neg,abr1$fact)
There are a number of methods that provide utility for querying and altering the sample information within an AnalysisData
object.
These methods are all named with the prefix cls
and include:
getNamespaceExports('metabolyseR') %>% {.[stringr::str_detect(.,'cls')]} %>% {.[!stringr::str_detect(.,':')]} %>% sort() %>% stringr::str_c('* `',.,'`') %>% stringr::str_c(collapse = '\n') %>% cat()
The names of the available sample information columns can be shown using clsAvailable()
.
clsAvailable(d)
A given column can be extracted using clsExtract()
.
Here, the day
column is extracted.
clsExtract(d,cls = 'day')
Sample class frequencies could then be computed.
clsExtract(d,cls = 'day') %>% table()
It can be seen that there are 20 samples available in each class.
Another example is the addition of a new sample information column.
In the following, a column called new_class
will be added with all samples labelled 1
.
d <- clsAdd(d,cls = 'new_class',value = rep(1,nSamples(d))) clsAvailable(d)
Samples or features can easily be kept or removed from an AnalysisData
object as is most convenient.
Below can be seen the first 6 sample indexes in the injorder
column of the sample information.
samples <- d %>% clsExtract(cls = 'injorder') %>% head() print(samples)
Only these samples could be kept using:
d %>% keepSamples(idx = 'injorder',samples = samples)
Or removed using:
d %>% removeSamples(idx = 'injorder',samples = samples)
The process is very similar for keeping or removing specific metabolome features from the data table. Below can be seen the first 6 feature names in the data table.
feat <- d %>% features() %>% head() print(feat)
Only these features can be kept using:
d %>% keepFeatures(features = feat)
Or to remove these features:
d %>% removeFeatures(features = feat)
Routine analyses are those that are often made up of numerous steps where parameters have likely already been previously established. The emphasis here is on convenience with as little code as possible required. In these analyses, the necessary analysis elements, order and parameters are first prepared and then the analysis routine subsequently performed in a single step. This section will introduce how this type of analysis can be performed using metabolyseR and will include four main topics:
Parameter selection is the fundamental aspect for performing routine analyses using metabolyseR and will be the step requiring the most input from the user.
The parameters for an analysis are stored in an S4 object of class AnalysisParameters
containing the relevant parameters of the selected analysis elements.
The parameters have been named so that they denote the same functionality commonly across all analysis element methods. Discussion of the specific parameters can be found withing the vignettes of the relevant analysis elements. These can be accessed using:
browseVignettes('metabolyseR')
There are several ways to specify the parameters to use for analysis. The first is programatically and the second is through the use of the YAML format.
The available analysis elements can be shown using:
analysisElements()
The analysisParameters()
function can be used to create an AnalysisParameters
object containing the default parameters.
For example, the code below will return default parameters for all the metabolyseR analysis elements.
p <- analysisParameters() p
To retrieve parameters for a subset of analysis elements the following can be run, returning parameters for only the pre-treatment and modelling elements.
p <- analysisParameters(c('pre-treatment','modelling')) p
The changeParameter()
function can be used to uniformly change these parameters across all of the selected methods.
The example below changes the defaults of all the parameters named cls
from the default class
to day
.
p <- analysisParameters() changeParameter(p,'cls') <- 'day' p
Alternatively the parameters of a specific analysis elements can be targeted using the elements
argument.
The following will only alter the cls
parameter back to class
for the pre-treatment element parameters:
changeParameter(p,'cls',elements = 'pre-treatment') <- 'class'
Parameters can be extracted from the AnalysisParameters
class using the parameters()
function for a specified element.
parameters(p,'correlations')
Each analysis element has a function for returning default parameters for specific methods.
These include preTreatmentParameters()
, modellingParameters()
and correlationParameters()
.
Each returns a list of the default parameters for a specified methods as shown in the example for modellingParameters()
below.
modellingParameters('anova')
Refer to the documentation (?
) of each function for sepecific usage details.
The parameters returned by these functions can be assigned to an AnalysisParameters
object, again using parameters()
'
parameters(p,'pre-treatment') <- preTreatmentParameters( list( occupancyFilter = 'maximum', transform = 'TICnorm' ) )
Due to the relatively complex structure of the parameters needed for analyses containing many components, it is also possible to specify analysis parameters using the YAML file format.
YAML parameter files (.yaml) can be parsed using the parseParameters()
function.
The example below shows the YAML specification for the defaults returned by analysisParameters()
.
paramFile <- system.file('defaultParameters.yaml',package = 'metabolyseR') stringr::str_c(" ```yaml ", yaml::read_yaml(paramFile) %>% yaml::as.yaml(), "```") %>% cat()
This can be passed directly into an AnalysisParameters
object using the following:
paramFile <- system.file('defaultParameters.yaml',package = 'metabolyseR') p <- parseParameters(paramFile)
For more complex pre-treatment situations such as the following:
exampleParamFile <- system.file('exampleParameters.yaml',package = 'metabolyseR') stringr::str_c(" ```yaml ", yaml::read_yaml(exampleParamFile) %>% yaml::as.yaml(), "```") %>% cat()
Where multiple steps of the same method needed (here is remove
), these are numbered sequentially.
Where multiple values also need to be provided to a particular argument (e.g. classes = c('H','1')
), these should be supplied as a hyphenated list.
Existing AnalysisParameters
objects can also be exported to YAML format as shown below:
p <- analysisParameters() exportParameters(p,file = 'analysis_parameters.yaml')
The analysis is performed in a single step using the metabolyse()
function.
This accepts the metabolomic data, the sample information and the analysis parameters.
The metabolomic data table of abundance values where the columns are the metabolome features and the rows are each sample observation. Similarly, the sample meta-information table should consist of the observations as rows and the meta information as columns. The order of the observation rows of the sample information table should be concordant with the rows in the metabolomics data table.
We can run an example analysis using the abr1
data set by first generating the default parameters for pre-treatment and modelling (random forest) analysis elements.
p <- analysisParameters(c('pre-treatment','modelling'))
Custom pre-treatment parameters can then be specified to only inlude occupancy filtering and total ion count normalisation.
parameters(p,'pre-treatment') <- preTreatmentParameters( list( occupancyFilter = 'maximum', transform = 'TICnorm') )
Next the cls
parameters can be changed to use the day
sample information column throughout the analysis.
changeParameter(p,'cls') <- 'day'
Finally, the analysis can be run in a single step. Here only the fist 200 features of the negative ionisation mode data are specified to reduce the analysis time needed for this example.
analysis <- metabolyse(abr1$neg[,1:200],abr1$fact,p)
Note: If a data pre-treatment step is not performed prior to modelling or correlation analysis, the raw data will automatically be used.
The analysis
object containing the analysis results can be printed to provide some basic information about the results of the analysis.
print(analysis)
There are likely to be occasions where an analysis will need to be re-analysed using a new set of parameters.
This can be achieved using the reAnalyse()
function.
In the example below we will run a correlation analysis in addition to the pre-treatment and modelling elements already performed.
Firstly, we can specify the correlation parameters:
parameters <- analysisParameters('correlations')
Then perform the re-analysis on our previously analysed Analysis
object, specifying the additional parameters.
analysis <- reAnalyse(analysis,parameters)
An overview of the results of the analysis (now including correlations) can then be printed.
print(analysis)
An analysis performed by metabolyse()
returns an S4 object of class Analysis
.
There are a number of ways of extracting analysis results from this object.
Similarly to the AnalysisData
class, the dat()
and sinfo()
functions can be used to extract the metabolomics data or sample information tables directly for either the raw
or pre-treated
data.
For example, to extract the pre-treated metabolomics data from our object analysis
:
dat(analysis,type = 'pre-treated')
Or to extract the raw sample information:
sinfo(analysis,type = 'raw')
Alternatively the raw
or preTreated
functions can be used to extract the AnalysisData
class objects containing both the metabolomics data and sample information for the raw and pre-treated data respectively.
raw(analysis)
preTreated(analysis)
Lastly the analysisResults
function can be used to extract the results of any of the analysis elements.
The following will extract the modelling results:
analysisResults(analysis,element = 'modelling')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.