One of the ultimate goals in systems biology is the elucidation of the
genotype-phenotype relationship in cellular systems [1]. Integrated ‘
omics’ approaches have received particular attention. Most omics
analyses monitor the level of target molecules like transcripts and
metabolites. In this context, enrichment-analysis approaches such as
Gene Set Enrichment Analysis (GSEA) [2-3] can be combined with pathway
analysis to evaluate whether a particular molecular group is
significantly over-represented. In metabolomics, one of promising
approaches is Metabolite Set Enrichment Analysis (MSEA) [4]. MSEAp is an
R package for performing MSEA using both plant and animal metabolic
pathways and metabolite sets. The package can import pre-defined
pathways like SMPDB and KEGG and user-defined metabolite sets in
gmt format. Their metabolite sets are integrated and used
for MSEA calculation. Given a metabolite-set or a pathway annotation of
metabolites, the significance of the over-representation of metabolites
in the pathways is determined with the Fisher exact test. MSEAp package
provides various visualization functions of MSEA results like barplot
from MetaboAnalyst [5], dotplot
, and
netplot
. The package also contains pre-defined and extracted PlantCyc data that are publicly
available and our curated metabolite sets from
MeKO [6], Arabidopsis
flavonoid and
glucosinolate
(Tokimatsu, Nishida et al., in preparation), and the
plant metabolomics literature (AtMetExpress, in preparation).
# If you are using Debian or Ubuntu, please uncomment the next two lines #system("sudo apt-get update") #system("sudo apt-get install -y zlib1g-dev libxml2-dev libpng-dev") install.packages(c("devtools", "webshot", "knitr", "rmarkdown")) if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(c("BiocStyle", "paxtoolsr", "RCy3")) devtools::install_github("afukushima/MSEAp", build_vignettes = TRUE) devtools::install_github("afukushima/MSEApdata", build_vignettes = TRUE)
Below is an example of MSEA. mset_SMPDB_format_KEGG
is a metabolite
set that contains the metabolites for each pathway from the
SMPDB [7] database with the metabolite ID from the
KEGG [8] database.
kusano
is a metabolite list with KEGG metabolite IDs for MESA. This
list corresponds with significantly accumulated metabolites between UV-B
(one-day treatment)- and control conditions [9]. Typically, a metabolite
list is prepared by testing two hypotheses, e.g., the t-test and
modified t-statistic (e.g.,
limma
).
msea
performs MSEA and identifies biologically meaningful
accumulations in metabolomic data.
res
is the MSEA result and the plot function visualizes the top 20
over-represented pathways.
library(MSEApdata) library(MSEAp) data(mset_SMPDB_format_KEGG) data(kusano) res <- msea(mset_SMPDB_format_KEGG, kusano) plot(res) ## head(kusano) mset_SMPDB_format_KEGG[[1]]
MSEAp can import user-defined metabolite-sets in
gmt format using read.gmt()
function:
file <- system.file("extdata", "sample_SMPDB.gmt", package = "MSEAp") smp <- read.gmt(file) length(smp) smp[[1]] smp[[5]]
The function read.gpml()
can extract compound sets from
Wikipathways
GPML file to the
metabolite-sets:
file <- system.file("extdata", "At_AtMetExpress_overview_WP3622_89229.gpml", package = "MSEAp") wikip <- read.gpml(file) print(wikip)
You can see the currently available datasets by using
MSEApdata::supported.msets()
:
MSEApdata::supported.msets()
Below is a table of the number of our metabolite-sets.
| Data source | KEGG | HMDB | KNApSAcK | | :--- | ---: | ---: | ---: | ------------- | | SMPDB | 717 | 717 | NA | | AraCyc | 268 | 350 | NA | | PlantCyc | 606 | 714 | NA | | EcoCyc | 222 | 254 | NA | | MetaCyc | 1861 | 2059 | NA | | HumanCyc | 89 | NA | NA | | MouseCyc | 71 | NA | NA | | FlyCyc | 90 | NA | NA | | Reactome (Arabidopsis thaliana) | 211 | 137 | NA | | Reactome (Bos taurus) | 426 | 310 | NA | | Reactome (Caenorhabditis elegans) | 276 | 205 | NA | | Reactome (Canis familiaris) | 415 | 293 | NA | | Reactome (Dictyostelium discoideum) | 209 | 135 | NA | | Reactome (Drosophila melanogaster) | 318 | 224 | NA | | Reactome (Danio rerio) | 393 | 287 | NA | | Reactome (Gallus gallus) | 405 | 295 | NA | | Reactome (Homo sapiens) | 542 | 388 | NA | | Reactome (Mus musculus) | 432 | 308 | NA | | Reactome (Mycobacterium tuberculosis) | 9 | 7 | NA | | Reactome (Oryza sativa) | 203 | 137 | NA | | Reactome (Plasmodium falciparum) | 86 | 54 | NA | | Reactome (Rattus norvegicus) | 424 | 308 | NA | | Reactome (Saccharomyces cerevisiae) | 196 | 130 | NA | | Reactome (Schizosaccharomyces pombe) | 149 | 94| NA | | Reactome (Sus scrofa) | 424 | 301 | NA | | Reactome (Taeniopygia guttata) | 364 | 266 | NA | | Reactome (Xenopus tropicalis) | 406 | 297 | NA | | AtMetExpress (Stress)| 9 | NA | NA | | AtMetExpress (2nd Metabolism) | NA | NA | 4 |
You can perform MSEA using AraCyc pathways and our curated metabolite sets (flavonoid pathways and some literatures). MSEAp is designed to easily extend metabolite sets, and its extension can be realized only by adding a metabolite set to the R list. The process is:
data(mset_AraCyc_format_KEGG) data(mset_AtMetExpress_Stress_format_KEGG) data(mset_AtMetExpress_Flavonoids_format_KNApSAcK) mset <- c( mset_AraCyc_format_KEGG, mset_AtMetExpress_Stress_format_KEGG, mset_AtMetExpress_Flavonoids_format_KNApSAcK ) length(mset) ## 281 metabolite-sets
Here, fukushima17_INC
is a metabolite list that contains 37
significantly increased metabolites including primary and secondary
metabolites, mainly flavonols and anthocyanins that are obtained under
mild oxidative stress- and control conditions [10]. Both
KEGG [8] and
KNApSAcK [11] IDs can be used for
MSEA.
data(fukushima17_INC) head(fukushima17_INC) res <- msea(mset, fukushima17_INC$kegg_knap) head(res) plot(res)
To help interpreting the MSEA results we also implemented barplot
, dotplot
,
and netplot
for visualization.
barplot
plots the p-value as a color and the number of metabolite queries
contained in the metabolite group as the height of the bar.
dotplot
plots dots on the expected value of MSEA in each group, with the
p-value as a color (like barplot) and the number of metabolite queries
contained in the metabolite group as the size of the dot.
netplot
receives the MSEA result, the metabolite set and the
shared.metabolite
argument and plots the p-value in the MSEA result as
the node color on the network of the metabolite group. The size of the
node indicates the number of metabolite queries contained in the
metabolite group. The shared.metabolite
argument decides how to
connect the metabolite group with the edge. In the example below, if
there are more than 20 metabolites common among groups, we connect these
groups with the edge.
By default, netplot
performs visualization using the visNetwork
package, but you can add the sendto
argument and make the output
destination Cytoscape [12]. In the default
plot, it is possible to use interactive functions by, for example,
mousing over the popup of the MSEA information.
res <- msea(mset_SMPDB_format_KEGG, kusano) ## You may have to zoom this plot in RStudio and other R working environments dotplot(res) barplot(res) ## You can see the same plot by plot() netplot(res, mset_SMPDB_format_KEGG, shared.metabolite = 20)
We thank Dr. Toshiaki Tokimatsu in NIG for manual curation of the flavonoid biosynthetic pathway in Arabidopsis thaliana and Ms. Ursula Petralia for editorial assistance.
Here is the output of sessionInfo()
on the system on which this
document was compiled:
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.