In b2slab/FELLA: Interpretation and enrichment for metabolomics data

Introduction

This vignette shows the utility of the FELLA package, which is based on a statistically normalised diffusion process [@picart2017null], on non-human organisms. In particular, we will work on a multi-omic Mus musculus study. The original study [@gogiashvili2017metabolic] presents a mouse model of non-alcoholic fatty liver disease (NAFLD). Metabolites in liver tissue from leptin-deficient ob/ob mice and wild type mice were compared using Nuclear Magnetic Resonance (NMR). Afterwards, quantitative real-time polymerase chain reaction (qRT-PCR) helped identify changes at the gene expression level. Finally, biological mechanisms behind NAFLD were elucidated by leveraging the data from both omics.

Building the database

The first step is to build the FELLA.DATA object for the mmu organism from the KEGG database [@kanehisa2016kegg].

library(FELLA)
library(org.Mm.eg.db)
library(KEGGREST)

library(igraph)
library(magrittr)

set.seed(1)
# Filter overview pathways
graph <- buildGraphFromKEGGREST(
    organism = "mmu", 
    filter.path = c("01100", "01200", "01210", "01212", "01230"))

tmpdir <- paste0(tempdir(), "/my_database")
# Make sure the database does not exist from a former vignette build
# Otherwise the vignette will raise an error 
# because FELLA will not overwrite an existing database
unlink(tmpdir, recursive = TRUE)  
buildDataFromGraph(
    keggdata.graph = graph, 
    databaseDir = tmpdir, 
    internalDir = FALSE, 
    matrices = "none", 
    normality = "diffusion", 
    niter = 100)

We load the FELLA.DATA object and three mappings (from gene symbols to Entrez identifiers, from Entrez genes to their annotated enzyme EC numbers, and from Entrez genes to their KEGG pathways).

alias2entrez <- as.list(org.Mm.eg.db::org.Mm.egSYMBOL2EG)
entrez2ec <- KEGGREST::keggLink("enzyme", "mmu")
entrez2path <- KEGGREST::keggLink("pathway", "mmu")

fella.data <- loadKEGGdata(
    databaseDir = tmpdir, 
    internalDir = FALSE, 
    loadMatrix = "none"
)

Summary of the database:

fella.data

In addition, we will store the ids of all the metabolites, reactions and enzymes in the database:

id.cpd <- getCom(fella.data, level = 5, format = "id") %>% names
id.rx <- getCom(fella.data, level = 4, format = "id") %>% names
id.ec <- getCom(fella.data, level = 3, format = "id") %>% names

Note on reproducibility

We want to emphasise that FELLA builds its FELLA.DATA object using the most recent version of the KEGG database. KEGG is frequently updated, so the knowledge graph can change slightly between releases. The discussion of our findings was written on the date specified in the vignette header, using the KEGG release listed in the Reproducibility section.

Enrichment analysis

Defining the input and running the enrichment

Table 2 from the main body in [@gogiashvili2017metabolic] contains six metabolites that show significant changes between the experimental classes by a univariate test followed by multiple test correction. These are the starting point of our enrichment analysis:

cpd.nafld <- c(
    "C00020", # AMP
    "C00719", # Betaine
    "C00114", # Choline
    "C00037", # Glycine
    "C00160", # Glycolate
    "C01104"  # Trimethylamine-N-oxide
)

analysis.nafld <- enrich(
    compounds = cpd.nafld, 
    data = fella.data, 
    method = "diffusion", 
    approx = "normality")

Five compounds are successfully mapped to the graph object:

analysis.nafld %>% 
    getInput %>% 
    getName(data = fella.data)

Likewise, one compound does not map:

getExcluded(analysis.nafld)
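
Conceptually, the split between mapped and excluded compounds amounts to a set intersection and a set difference against the compound identifiers known to the database. The following is a minimal base R sketch of that idea, not FELLA's internal implementation. All identifiers are made up, and `toy.background` plays the role that `id.cpd` plays in this vignette:

```r
# Made-up identifiers for illustration only (not actual KEGG ids)
toy.background <- c("cpdA", "cpdB", "cpdC", "cpdD")
toy.input <- c("cpdA", "cpdC", "cpdZ")

# Compounds found in the background are kept; the others are excluded
toy.mapped <- intersect(toy.input, toy.background)
toy.excluded <- setdiff(toy.input, toy.background)

toy.mapped
toy.excluded
```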

The highlighted subgraph with the default parameters has the following appearance, with large connected components that involve the metabolites in the input:

plot(
    analysis.nafld, 
    method = "diffusion", 
    data = fella.data, 
    nlimit = 250,  
    plotLegend = FALSE)

We will also extract all the p-scores and the suggested sub-network for further analysis:

g.nafld <-  generateResultsGraph(
    object = analysis.nafld, 
    data = fella.data, 
    method = "diffusion")

pscores.nafld <- getPscores(
    object = analysis.nafld, 
    method = "diffusion")
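
The p-scores are indexed by entity identifier later in this vignette, so they can be treated as a named numeric vector. Under that assumption, the best-ranked entities are found by sorting in increasing order. A minimal sketch with made-up names and values:

```r
# Made-up p-scores; lower values indicate higher priority
toy.pscores <- c(n1 = 0.30, n2 = 0.01, n3 = 0.80, n4 = 0.05)

# Two entities with the lowest p-scores
toy.top <- head(sort(toy.pscores), n = 2)
toy.top
```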

Examining the metabolites

From Table 2

The authors find six extra metabolites in Table 2 that are significant at $p < 0.05$ but do not appear after thresholding the false discovery rate at 5%. Such metabolites, highlighted in italics but without an asterisk, are also relevant and play a role in their discussion. We will examine how FELLA prioritises such metabolites:

cpd.nafld.suggestive <- c(
    "C00008", # ADP
    "C00791", # Creatinine
    "C00025", # Glutamate
    "C01026", # N,N-dimethylglycine
    "C00079", # Phenylalanine
    "C00299"  # Uridine
)
getName(cpd.nafld.suggestive, data = fella.data)

When checking if any of these metabolites are found in the reported sub-network, we find that C01026 is already reported:

V(g.nafld)$name %>% 
    intersect(cpd.nafld.suggestive) %>% 
    getName(data = fella.data)

Abbreviated as DMG in their study, N,N-Dimethylglycine is a cornerstone of their findings. It is reported in Figure 6a as part of the folate-independent remethylation to explain the metabolic changes observed in the ob/ob mice. DMG is also mentioned in the conclusions as part of one of the most prominent alterations found in the study: a reduced conversion of betaine to DMG.

From Figure 6a

Figure 6a contains the metabolic context of the observed alterations, with processes such as transsulfuration and folate-dependent remethylation. These were identified with the help of gene expression analysis. We will now check whether the metabolites in Figure 6a overlap with the reported sub-network, excluding choline and betaine because they are in the input, and DMG because it was already discussed.

cpd.new.fig6 <- c(
    "C00101", # THF
    "C00440", # 5-CH3-THF
    "C00143", # 5,10-CH3-THF
    "C00073", # Methionine
    "C00019", # SAM
    "C00021", # SAH
    "C00155", # Homocysteine
    "C02291", # Cystathionine
    "C00097"  # Cysteine
)
getName(cpd.new.fig6, data = fella.data)

This time, there is no overlap with the reported sub-network:

cpd.new.fig6 %in% V(g.nafld)$name

However, we can further inquire whether the p-scores of such metabolites tend to be low among all the metabolites in the whole network from the fella.data object.

wilcox.test(
    x = pscores.nafld[cpd.new.fig6], # metabolites from fig6
    y = pscores.nafld[setdiff(id.cpd, cpd.new.fig6)], # rest of metabolites
    alternative = "less")

The test is indeed significant. Although FELLA does not directly report these metabolites, its metabolite ranking supports the claims by the authors.
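
With `alternative = "less"`, the Wilcoxon rank-sum test asks whether the values in `x` tend to be smaller than those in `y`. The sketch below illustrates this on made-up p-scores, where a small group of interest sits entirely below a larger background. The values are illustrative and unrelated to the actual analysis:

```r
# Made-up p-scores: a group of interest with low values...
toy.group <- c(0.01, 0.03, 0.05, 0.08, 0.10)
# ...and a background with higher values
toy.rest <- seq(0.15, 0.95, by = 0.05)

# One-sided test: is toy.group shifted towards lower values than toy.rest?
toy.test <- wilcox.test(x = toy.group, y = toy.rest, alternative = "less")
toy.test$p.value
```

A small p-value here only says that the group ranks low as a whole; it does not imply that every member would be reported individually.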

Examining the genes

Cbs

The authors complement the metabolomic profiling with a differential gene expression study. One of the main findings is a change in Cbs expression levels. To link Cbs to the enrichment from FELLA, we will first map it to its EC number, 4.2.1.22 at the time of writing:

ec.cbs <- entrez2ec[[paste0("mmu:", alias2entrez[["Cbs"]])]] %>% 
    gsub(pattern = "ec:", replacement = "")

getName(fella.data, ec.cbs)

In Figure 6a, the reaction linked to Cbs and catalysed by the enzyme 4.2.1.22 has the KEGG identifier R01290.

rx.cbs <- "R01290"

getName(fella.data, rx.cbs)

As shown in Figure 6a, Cbs is not directly linked to the metabolites found through NMR, and neither the reaction nor the enzyme is suggested by FELLA:

c(rx.cbs, ec.cbs) %in% V(g.nafld)$name

However, both of them have a relatively low p-score in their respective categories. This can be seen through the proportion of enzymes (resp. reactions) that show a p-score as low as or lower than that of 4.2.1.22 (resp. R01290):

# enzyme
pscores.nafld[ec.cbs]
mean(pscores.nafld[id.ec] <= pscores.nafld[ec.cbs])

# reaction
pscores.nafld[rx.cbs]
mean(pscores.nafld[id.rx] <= pscores.nafld[rx.cbs])

It is not surprising that neither of them is directly reported, because none of the metabolites participating in the reaction is found in the input. The main evidence for finding Cbs is gene expression, and our approach gives indirect hints of this connection.
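
The proportion used above can be wrapped in a small helper. The function below is a hypothetical convenience (`fraction_at_or_below` is not part of FELLA), shown on made-up p-scores. It returns the fraction of background entities whose p-score is as low as or lower than that of a given entity, so values close to 0 indicate a well-ranked entity:

```r
# Hypothetical helper, not part of FELLA
# pscores: named numeric vector; background: names to compare against;
# id: name of the entity of interest
fraction_at_or_below <- function(pscores, background, id) {
    mean(pscores[background] <= pscores[[id]])
}

# Made-up p-scores for five toy enzymes
toy.pscores <- c(e1 = 0.02, e2 = 0.10, e3 = 0.40, e4 = 0.75, e5 = 0.90)
toy.fraction <- fraction_at_or_below(toy.pscores, names(toy.pscores), "e2")
toy.fraction
```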

Bhmt

The alteration of Bhmt activity is related to the downregulation of Cbs. Despite not finding evidence of change in Bhmt expression, the authors argue that its inhibition would explain the increased betaine-to-DMG ratio in ob/ob mice. This claim is also backed up by prior studies. To find out the role of Bhmt in our analysis, we will again map it to its EC number, 2.1.1.5:

ec.bhmt <- entrez2ec[[paste0("mmu:", alias2entrez[["Bhmt"]])]] %>% 
    gsub(pattern = "ec:", replacement = "")

getName(fella.data, ec.bhmt)

This time, FELLA reports not only the enzyme, but also its associated reaction R02821 (represented by an arrow in Figure 6a) and both of its metabolites. While betaine was already an input metabolite, DMG was a novel finding, as discussed earlier:

ec.bhmt %in% V(g.nafld)$name
"R02821" %in% V(g.nafld)$name

This illustrates how FELLA can translate knowledge from dysregulated metabolites to other molecular levels, such as reactions and enzymes.

Slc22a5

The decrease of Bhmt activity is later connected to the upregulation of Slc22a5, which is also demonstrated within the original study. However, Slc22a5 does not map to any EC number and therefore it cannot be found through FELLA:

entrez.slc22a5 <- alias2entrez[["Slc22a5"]]
entrez.slc22a5 %in% names(entrez2ec)

As a matter of fact, the only connection that can be found from KEGG is the role of Slc22a5 in the Choline metabolism in cancer pathway.

path.slc22a5 <- entrez2path[paste0("mmu:", entrez.slc22a5)] %>% 
    gsub(pattern = "path:", replacement = "")

getName(fella.data, path.slc22a5)

Coincidentally, this pathway is reported in the sub-graph:

path.slc22a5 %in% V(g.nafld)$name

Genes from Figure 3

We also examined whether the genes from Figure 3 were reachable in our analysis. These five literature-derived genes were experimentally confirmed to show expression changes, which demonstrated that RNA extracted after the metabolomic profiling was still reliable for further transcriptomic analyses. However, only Scd2 maps to an enzymatic family:

symbol.fig3 <- c(
    "Cd36",
    "Scd2", 
    "Apoa4", 
    "Lcn2", 
    "Apom")

entrez.fig3 <- alias2entrez[symbol.fig3] %>% unlist %>% unique
ec.fig3 <- entrez2ec[paste0("mmu:", entrez.fig3)] %T>% 
    print %>%
    unlist %>% 
    unique %>% 
    na.omit %>% 
    gsub(pattern = "ec:", replacement = "")

getName(fella.data, ec.fig3)

This family is not reported in our sub-graph:

ec.fig3 %in% V(g.nafld)$name

In addition, its p-score is high compared to other enzymes:

pscores.nafld[ec.fig3]
mean(pscores.nafld[id.ec] <= pscores.nafld[ec.fig3])

The fact that only one gene mapped to an EC number hinders the potential findings using FELLA, and is probably the main reason why FELLA missed Scd2. In addition, FELLA defines a knowledge model that offers simplicity and interpretability, at the cost of introducing limitations on how sophisticated its findings can be.
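
The mapping chain used in this section (gene symbol, then Entrez identifier, then EC number) can lose genes at the last step. The sketch below reproduces that behaviour with made-up mappings. `toy.symbol2entrez` and `toy.entrez2ec` are illustrative stand-ins for `alias2entrez` and `entrez2ec`, and the identifiers are not real annotations:

```r
# Made-up mappings; only GeneB has an enzyme annotation
toy.symbol2entrez <- list(GeneA = "101", GeneB = "102", GeneC = "103")
toy.entrez2ec <- c("mmu:102" = "ec:1.1.1.1")

# Symbol -> Entrez; unlist() keeps the gene symbols as names
toy.entrez <- unlist(toy.symbol2entrez[c("GeneA", "GeneB", "GeneC")])

# Entrez -> EC; genes without an annotation yield NA
toy.ec <- toy.entrez2ec[paste0("mmu:", toy.entrez)]
names(toy.ec) <- names(toy.entrez)

# Keep the mapped genes only and strip the "ec:" prefix
toy.ec.mapped <- sub("^ec:", "", toy.ec[!is.na(toy.ec)])
toy.ec.mapped
```

Labelling the result with gene symbols makes it easy to see which genes dropped out before looking them up in the sub-network.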

Genes from Table S2

In parallel with the original study, and cited within its main body, gene array expression data was collected [@godoy2016gene] and its hits are included in the supplementary Table S2 from [@gogiashvili2017metabolic]. These genes include the already discussed Cbs. We will attempt to link the genes marked as significantly changed to our reported sub-network. In contrast with Figure 3, all the genes map to an EC number:

symbol.tableS2 <- c(
    "Mat1a",
    "Ahcyl2", 
    "Cbs",
    "Mat2b",  
    "Mtr")
entrez.tableS2 <- alias2entrez[symbol.tableS2] %>% unlist %>% unique
ec.tableS2 <- entrez2ec[paste0("mmu:", entrez.tableS2)] %T>%
    print %>%
    unlist %>% 
    unique %>% 
    na.omit %>% 
    gsub(pattern = "ec:", replacement = "")

None of these EC families are reported in the sub-graph:

ec.tableS2 %in% V(g.nafld)$name

But in this case, their p-scores tend to be lower than those of the remaining enzymes:

wilcox.test(
    x = pscores.nafld[ec.tableS2], # enzymes from table S2
    y = pscores.nafld[setdiff(id.ec, ec.tableS2)], # rest of enzymes
    alternative = "less")

These findings suggest that if the annotation database is complete enough, FELLA can provide a meaningful prioritisation of the enzymes surrounding the affected metabolites.

Conclusions

FELLA has been used to give a biological meaning to a list of 6 metabolites extracted from a multi-omic study of a mouse model of NAFLD. It reproduced some findings at the metabolite and gene expression levels, and in most cases the entities it missed still had low p-scores compared to their background in the database.
The bottom line from our analysis in the present vignette is that FELLA not only works on human studies, but also generalises to animal models.

Reproducibility

This is the result of running sessionInfo()

sessionInfo()

KEGG version:

cat(getInfo(fella.data))

Date of generation:

date()

Image of the workspace (for submission):

tempfile(pattern = "vignette_mmu_", fileext = ".RData") %T>% 
    message("Saving workspace to ", .) %>% 
    save.image(compress = "xz")

References

b2slab/FELLA documentation built on March 3, 2021, 2:22 p.m.
