suppressPackageStartupMessages(library(structToolbox))
suppressPackageStartupMessages(library(httptest))
suppressPackageStartupMessages(library(metabolomicsWorkbenchR))
# httptest records/replays API responses so the vignette can be built without network access
httptest::start_vignette('structToolbox_example')
Metabolomics Workbench (https://www.metabolomicsworkbench.org/) hosts a metabolomics data repository. It contains over 1000 publicly available studies, including raw data, processed data and metabolite/compound information.
The repository is searchable using a REST service API. The metabolomicsWorkbenchR package makes the endpoints of this service available in R and provides functionality to search the database and import datasets and metabolite information into commonly used formats such as data frames and SummarizedExperiment objects.
In this vignette we will use metabolomicsWorkbenchR to retrieve the uploaded peak matrix for a study. We will then use structToolbox to apply a basic workflow to analyse the data.
To install this package enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("metabolomicsWorkbenchR")
For older versions, please refer to the appropriate Bioconductor release.
The API endpoints for Metabolomics Workbench are accessible using the do_query function in metabolomicsWorkbenchR.
The do_query function takes four inputs:
- context: a valid context name (character)
- input_item: a valid input item name for that context (character)
- input_value: the value to search for (character)
- output_item: a valid output item name for that context (character)
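The four inputs correspond to the four path segments of a Metabolomics Workbench REST request. The base-R sketch below shows how such a URL could be assembled from them. `build_mw_url()` is a hypothetical helper, not part of metabolomicsWorkbenchR, and do_query may build and validate its requests differently internally. The sketch makes no network call.

```r
# Hypothetical helper (NOT part of metabolomicsWorkbenchR): assemble a
# Metabolomics Workbench REST URL from the four do_query-style inputs.
build_mw_url <- function(context, input_item, input_value, output_item) {
  # Base URL of the public REST service (assumed endpoint)
  base <- "https://www.metabolomicsworkbench.org/rest"
  parts <- c(context, input_item, input_value, output_item)
  # Each input is expected to be a single character string
  stopifnot(is.character(parts), length(parts) == 4)
  # Percent-encode each segment so values containing spaces etc. stay valid
  parts <- vapply(parts, utils::URLencode, character(1), reserved = TRUE)
  paste(c(base, parts), collapse = "/")
}

build_mw_url("study", "study_id", "ST000010", "summary")
# "https://www.metabolomicsworkbench.org/rest/study/study_id/ST000010/summary"
```

Encoding each segment separately, rather than the whole URL, keeps the `/` separators intact while still escaping characters inside an individual value.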
Contexts refer to the different database searches available in the API. The reader is referred to the Metabolomics Workbench REST API manual for details of each context.
In metabolomicsWorkbenchR, contexts are stored as a list, and the names of the valid contexts can be obtained using the names function:
names(metabolomicsWorkbenchR::context)
Valid input_item values are specific to each context. They can be listed using the context_inputs function, and the valid output items using context_outputs:
cat('Valid inputs:\n')
context_inputs('study')
cat('\nValid outputs:\n')
context_outputs('study')
First we query the database to return a list of untargeted studies. We use the "study" context in combination with a special case input item called "ignored" that is required for the "untarg_studies" output item.
US = do_query(
    context = 'study',
    input_item = 'ignored',
    input_value = 'ignored',
    output_item = 'untarg_studies'
)
head(US[,1:3])
We will pull data for study "ST000010". We can obtain summary information using the "summary" output item.
S = do_query('study','study_id','ST000010','summary')
t(S)
As there are multiple datasets per study, untargeted data needs to be requested by Analysis ID. We will request DatasetExperiment format so that we can use the data directly with structToolbox.
DE = do_query(
    context = 'study',
    input_item = 'analysis_id',
    input_value = 'AN000025',
    output_item = 'untarg_DatasetExperiment'
)
DE
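# load the copy of AN000025 stored inside the package (used when building the vignette offline)
DE = metabolomicsWorkbenchR:::AN000025
# convert it to a DatasetExperiment for use with structToolbox
DE = as.DatasetExperiment(DE)
DE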
Now we construct a minimal metabolomics workflow consisting of quality filtering, normalisation, imputation and scaling before applying PCA.
# model sequence
M = mv_feature_filter(threshold = 40, method = 'across', factor_name = 'FCS') +
    mv_sample_filter(mv_threshold = 40) +
    vec_norm() +
    knn_impute() +
    log_transform() +
    mean_centre() +
    PCA()

# apply model
M = model_apply(M, DE)

# pca scores plot
C = pca_scores_plot(factor_name = c('FCS'))
chart_plot(C, M[length(M)])
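To make the effect of each step more concrete, the base-R sketch below applies a comparable sequence to a simulated matrix. It is only an approximation under stated assumptions: both missing-value filters use a "more than 40% missing" rule, vec_norm is assumed to scale each sample to unit Euclidean length, base-10 logs are assumed, and kNN imputation is swapped for a simple feature-median imputation. It does not reproduce structToolbox internals or the AN000025 results.

```r
# Toy illustration only: a base-R approximation of the structToolbox
# sequence above, run on simulated data (not on AN000025).
set.seed(1)
n_samples <- 20
n_features <- 50
X <- matrix(rlnorm(n_samples * n_features, meanlog = 5), nrow = n_samples)
# Knock out ~10% of values at random to mimic missing peaks
X[sample(length(X), round(0.1 * length(X)))] <- NA
# Make feature 1 mostly missing so the feature filter removes it
X[1:15, 1] <- NA

# 1. Feature filter: drop features with more than 40% missing values
#    across all samples (assumed analogue of mv_feature_filter(method = 'across'))
keep_f <- colMeans(is.na(X)) * 100 <= 40
X <- X[, keep_f, drop = FALSE]

# 2. Sample filter: drop samples with more than 40% missing values
keep_s <- rowMeans(is.na(X)) * 100 <= 40
X <- X[keep_s, , drop = FALSE]

# 3. Vector normalisation (assumed): divide each sample (row) by its
#    Euclidean length, ignoring missing values
X <- X / sqrt(rowSums(X^2, na.rm = TRUE))

# 4. Imputation: replace NAs with the feature median
#    (a simple stand-in, NOT what knn_impute does)
for (j in seq_len(ncol(X))) {
  X[is.na(X[, j]), j] <- median(X[, j], na.rm = TRUE)
}

# 5. Log transform (base 10 assumed)
X <- log10(X)

# 6. Mean centring: subtract each feature's mean
X <- scale(X, center = TRUE, scale = FALSE)

# 7. PCA on the already centred matrix
pca <- prcomp(X, center = FALSE)
head(pca$x[, 1:2])
```

Passing center = FALSE to prcomp keeps centring as its own explicit step, in the same way that mean_centre() precedes PCA() in the model sequence.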
sessionInfo()
end_vignette()