knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 8) library(knitr) library(rmarkdown) knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 8, fig.path = "figure/") knitr::opts_chunk$set(widgetframe_isolate_widgets = TRUE) knit_print.data.frame = function(x, ...) { rmarkdown:::print.paged_df(rmarkdown::paged_table(x)) } knit_print.DataFrame = function(x, ...) { knit_print(as.data.frame(x)) } registerS3method("knit_print", "data.frame", knit_print.data.frame) registerS3method("knit_print", "DataFrame", knit_print.DataFrame)
Through integration with Metabolomics Workbench API, lipidr
allows users,
to quickly explore public lipidomics experiments. lipidr
provides an easy
way to re-analyze and visualize these datasets.
library(lipidr) use_interactive_graphics()
First, we load lipidr
using library
command. For this vignette we will enable interactive graphics plotting by calling use_interactive_graphics()
.
Users can search all Metabolomics Workbench studies using any
relevant keyword. A list of all studies matching the keyword will be
returned as a table (data.frame
).
list_mw_studies(keyword = "lipid")
Datasets can be easily downloaded and parsed into LipidomicsExperiment
object
using lipidr
function fetch_mw_study()
by supplying a study_id
. The
example shown in this vignette is from ST001111
link. The dataset contains positive and negative MS data for untargeted
lipidomics from different breast cancer tissues.
d = fetch_mw_study("ST001111") d
Note the warning that some molecules were not parsed because their names did not follow the supported patterns. We can examine these molecules, remove them from the dataset or change their names, if desired.
# list non_parsed molecules non_parsed_molecules(d) # All of them are Ceramides, written with full chemical name # We can replace the first part with "Cer" using RegEx non_parsed <- non_parsed_molecules(d) new_names <- sub("^.* \\(", "Cer (", non_parsed) d <- update_molecule_names(d, old = non_parsed, new = new_names) # We can check once again to make sure all molecules were parsed correctly non_parsed_molecules(d)
We can have a look at the clinical data, which was conveniently
extracted from Metabolomics Workbench by lipidr
.
colData(d)
Next, we tell lipidr
that our dataset is normalized and logged.
d <- set_logged(d, "Area", TRUE) d <- set_normalized(d, "Area", TRUE)
We look at total ion concentration (TIC) and distribution (boxplot) for each sample.
plot_samples(d, "tic") plot_samples(d, "boxplot")
Although the TIC plot looks similar for all samples, we can spot two outlier samples with significantly large dispersion (samples 42
and 18
). We will keep the samples for now, but we may want to consider checking closely.
Because there is no QC samples or technical replicates in this dataset, we cannot
assess the %CV of molecules. Also, there is no need to summarize_transitions
in untargeted datasets.
mvaresults = mva(d, measure="Area", method="PCA") plot_mva(mvaresults, color_by="SampleType", components = c(1,2))
We can see mild separation between benign and cancer samples, but not between cancer and metastasis. We can also spot the sample outlier samples that should consider removing. The low variance explained by PC1
and PC2
(cumulative displayed in the plot as R2X
) indicate highly variable lipid profiles in these clinical samples.
keep_samples <- !colnames(d) %in% c("18", "42") d <- d[, keep_samples]
For simple analysis, we can compare cancer vs benign and cancer vs metastasis. From PCA plot, we can expect very little difference between cancer and metastasis.
two_group <- de_analysis(d, Cancer-Benign, Cancer-Metastasis) plot_results_volcano(two_group)
By quickly looking at the volcano plots, we can confirm the minute difference between cancer and metastatic samples. A fairly large difference is observed between cancer and benign samples, with PCs and PGs up-regulated and CLs and TGs down-regulated in cancer tissues.
Instead of two-group comparison, we might be interested in lipids are differentially expressed in any group.
We can perform ANOVA-style multi-group comparison using de_design
function, which allows users to provide
a custom design matrix. de_design
is extremely helpful in complex experimental designs, where several
factors should be accounted for in the analysis.
In this example, we will use cancer stage as our grouping variable. In our dataset, we can see samples with Stages I-IV. Using ANOVA-style analysis, we will identify all lipid molecules likely to be different among cancer stages.
multi_group <- de_design(d, ~ Stage)
Here we used the formula tilde to define our predictor variables. ~ Stage
formula indicates that we are interested in features (lipid molecules) that are associated with Stage
.
significant_molecules(multi_group)
Surprisingly, Cancer Stage does not appear to affect lipid molecules profiled in this experiment.
In complex experimental designs and clinical samples, we may need to correct for confounding variables. This is done simply by adding the variable to the formula in de_design
. For example, below, we are interested in Cancer effect while correcting for Race effect.
factorial_de <- de_design(d, ~ Race + SampleType, coef = "SampleTypeCancer") significant_molecules(factorial_de) plot_results_volcano(factorial_de)
In this case, we are seeing similar pattern as the two-group comparison, which indicates a small Race effect.
Users interested in creating more complex design matrices are referred to Limma User Guide and edgeR tutorial.
Supervised multivariate analyses, such as OPLS and OPLS-DA can be performed to determine which lipids are associated with a group (y-variable) of interest. In this example we use "Diet" as grouping, and display the results in a scores plot.
mvaresults = mva(d, method = "OPLS-DA", group_col = "SampleType", groups=c("Benign", "Cancer")) plot_mva(mvaresults, color_by="SampleType")
We can also plot the loadings and display important lipids contributing to the separation between different (Diet) groups.
plot_mva_loadings(mvaresults, color_by="Class", top.n=10)
Alternatively, we can extract top N lipids along with their annotations.
top_lipids(mvaresults, top.n=10)
OPLS-DA can only be applied to in two-group comparison settings. In some cases, we might be interested in a lipid molecules ....
In this example, we will format Cancer Stage as a numeric vector.
stage <- d$Stage stage[stage == "I"] <- 1 stage[stage == "II"] <- 2 stage[stage == "III"] <- 3 stage[stage == "IV"] <- 4 stage <- as.numeric(stage) stage
We can see stage
contains missing values. We should filter them out first.
d_filtered <- d[, !is.na(stage)] stage <- stage[!is.na(stage)] mvaresults = mva(d_filtered, method = "OPLS", group_col = stage ) plot_mva(mvaresults)
use_interactive_graphics(FALSE) plot_mva_loadings(mvaresults, color_by="Class", top.n=10)
enrich_results = lsea(two_group, rank.by = "logFC") significant_lipidsets(enrich_results)
Visualization of enrichment analysis results. The enriched lipid classes are highlighted.
plot_enrichment(two_group, significant_lipidsets(enrich_results), annotation="class")
Alternatively, we can highlight chain lengths that were significantly enriched.
plot_enrichment(two_group, significant_lipidsets(enrich_results), annotation="length")
plot_trend(two_group)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.