biodbKegg is a biodb extension package that implements a connector to the KEGG Compound database [@kanehisa2000_KEGG].
Install using Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbKegg')
The first step in using biodbKegg is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:
mybiodb <- biodb::newInst()
During this step the configuration is set up, the cache system is initialized and extension packages are loaded.
We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.
In biodb the connection to a database is handled by a connector instance that you can get from the factory. biodbKegg implements a connector to a remote database. Here is the code to instantiate a connector:
kegg.comp.conn <- mybiodb$getFactory()$createConn('kegg.compound')
To retrieve entries, use:
entries <- kegg.comp.conn$getEntry(c('C00133', 'C00751'))
entries
To convert a list of entries into a dataframe, run:
x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
To search for entries by monoisotopic mass and then retrieve the matching entries, run:
ids <- kegg.comp.conn$searchForEntries(list(monoisotopic.mass=list(value=64, delta=2.0)), max.results=10)
entries <- mybiodb$getFactory()$getEntry('kegg.compound', ids)
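To make the `monoisotopic.mass` criterion in the search call concrete, here is a minimal base R sketch of what a `value`/`delta` mass window means. It assumes, as the parameter names suggest, that `delta` is an absolute tolerance around `value`. The IDs and masses are hypothetical toy values, not KEGG data, and the sketch does not call biodbKegg.

```r
# Toy table of compounds; IDs and masses are hypothetical, for illustration only.
toy <- data.frame(
    id = c("X1", "X2", "X3", "X4"),
    monoisotopic.mass = c(61.5, 62.0, 64.9, 66.1),
    stringsAsFactors = FALSE
)

# Assumed interpretation: keep masses inside [value - delta, value + delta].
value <- 64
delta <- 2.0
in.window <- abs(toy$monoisotopic.mass - value) <= delta

# IDs whose mass falls inside the window
matched.ids <- toy$id[in.window]
matched.ids
```

With these toy values, only the masses between 62 and 66 are kept.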
If you have a data frame containing a column with KEGG Compound IDs, you can add information such as associated KEGG Enzymes, associated KEGG Pathways and KEGG Modules to your data frame, for a specific organism.
For this example, we build a data frame from a list of compound IDs:
kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
mydf <- data.frame(kegg.ids=kegg.comp.ids)
Using the addInfo() method of the KeggCompoundConn class, we add information about pathways, enzymes and modules for these compounds:
kegg.comp.conn$addInfo(mydf, id.col='kegg.ids', org='mmu')
Note that, by default, the number of values for each field is limited to 3.
Please see the help page of KeggCompoundConn for more information about addInfo(), and a description of all parameters.
The list of organisms is available at https://www.genome.jp/kegg/catalog/org_list.html.
In this example we will look for pathways related to specified compounds, count how many pathways each compound is related to, build a pathway graph, and create a decorated picture of the pathway graph.
For that we will start from a given list of KEGG compound IDs, explore KEGG to find which organisms they relate to, and try to discover links between them through KEGG pathways.
As an example, we will start from a predefined list of KEGG compound IDs, and focus on one organism, the mouse.
Given a list of compounds and an organism, we can look for related pathways in a single command:
kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
pathways <- kegg.comp.conn$getPathwayIds(kegg.comp.ids, 'mmu')
pathways
With another function we can get the pathways found for each compound:
path.per.comp <- kegg.comp.conn$getPathwayIdsPerCompound(kegg.comp.ids, 'mmu')
fct <- function(i) {
    if (i %in% names(path.per.comp)) length(path.per.comp[[i]]) else 0
}
nb_mmu_gene_pathways <- vapply(kegg.comp.ids, fct, FUN.VALUE=0)
names(nb_mmu_gene_pathways) <- kegg.comp.ids
Here, in the final table, we list the number of pathways for each KEGG compound:
nb_mmu_gene_pathways
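The counting step can also be written without a helper function. The sketch below uses a hypothetical named list standing in for `path.per.comp` (the pathway IDs are placeholders, not real query results) and relies only on base R's `lengths()`.

```r
# Hypothetical stand-in for the result of getPathwayIdsPerCompound():
# a named list mapping compound IDs to pathway IDs (placeholder values).
kegg.comp.ids <- c("C06144", "C06178", "C02659")
path.per.comp <- list(
    C06144 = c("pathA", "pathB"),
    C06178 = "pathA"
)
# C02659 is deliberately absent: no pathway was found for it.

# Start with zero for every compound, then fill in the counts we have.
nb.pathways <- setNames(integer(length(kegg.comp.ids)), kegg.comp.ids)
found <- intersect(kegg.comp.ids, names(path.per.comp))
nb.pathways[found] <- lengths(path.per.comp[found])
nb.pathways
```

Compounds missing from the list keep a count of zero, which matches the `else 0` branch of the helper function.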
To build a pathway graph, we need a connector to the KEGG Pathway database:
kegg.path.conn <- mybiodb$getFactory()$getConn('kegg.pathway')
Building the lists of edges and vertices for a pathway is done by calling buildPathwayGraph():
kegg.path.conn$buildPathwayGraph(pathways[[1]])
The object returned is a list whose names are the pathway IDs submitted, and the values are lists containing two data frames (edges and vertices).
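As a hedged illustration of that return structure, the sketch below builds a mock list by hand and shows how its parts would be accessed. The pathway ID `"pathX"`, the element names `edges` and `vertices`, and all column names are assumptions made for the example; inspect the object actually returned by `buildPathwayGraph()` for the real names.

```r
# Mock object mimicking the described structure (all names are assumptions).
graph.list <- list(
    pathX = list(
        edges = data.frame(from = c("c1", "r1"), to = c("r1", "c2"),
                           stringsAsFactors = FALSE),
        vertices = data.frame(name = c("c1", "r1", "c2"),
                              type = c("compound", "reaction", "compound"),
                              stringsAsFactors = FALSE)
    )
)

# One entry per submitted pathway ID
names(graph.list)

# Access the two data frames of a given pathway
edges <- graph.list[["pathX"]][["edges"]]
vertices <- graph.list[["pathX"]][["vertices"]]
nrow(edges)
table(vertices$type)
```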
We can also get an igraph object for a pathway (or a list of pathways):
ig <- kegg.path.conn$getPathwayIgraph(pathways[[1]])
And we plot it:
vert <- igraph::as_data_frame(ig, 'vertices')
shapes <- vapply(vert[['type']], function(x) if (x == 'reaction') 'rectangle' else 'circle', FUN.VALUE='', USE.NAMES=FALSE)
colors <- vapply(vert[['type']], function(x) if (x == 'reaction') 'yellow' else 'red', FUN.VALUE='', USE.NAMES=FALSE)
plot(ig, vertex.shape=shapes, vertex.color=colors, vertex.label.dist=1, vertex.size=4, vertex.size2=4)
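The two `vapply()` calls compute one shape and one color per vertex. An equivalent vectorized form uses `ifelse()`. The sketch below runs on a small mock `vert` data frame (hypothetical values), so it needs neither igraph nor a KEGG connection.

```r
# Mock vertex table; only the 'type' column matters here (values are made up).
vert <- data.frame(name = c("c1", "r1", "c2"),
                   type = c("compound", "reaction", "compound"),
                   stringsAsFactors = FALSE)

# Reactions are drawn as yellow rectangles, everything else as red circles.
is.reaction <- vert[["type"]] == "reaction"
shapes <- ifelse(is.reaction, "rectangle", "circle")
colors <- ifelse(is.reaction, "yellow", "red")

data.frame(type = vert[["type"]], shape = shapes, color = colors)
```

The resulting `shapes` and `colors` vectors can be passed to `plot()` in the same way as the ones computed with `vapply()`.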
We will now use a KEGG pathway picture and highlight some of the enzymes and compounds on it.
For this, we first get the enzymes related to the compounds:
kegg.enz.ids <- mybiodb$entryIdsToSingleFieldValues(kegg.comp.ids, db='kegg.compound', field='kegg.enzyme.id')
kegg.enz.ids
Next, we define the colors we want to apply:
color2ids <- list(yellow=kegg.enz.ids, red=kegg.comp.ids)
Then we call the method that builds the highlighted image and print it:
kegg.path.conn$getDecoratedGraphPicture(pathways[[1]], color2ids=color2ids)
When you are done with your biodb instance, you have to terminate it to ensure that resources (file handles, database connections, etc.) are released:
mybiodb$terminate()
sessionInfo()