biodbHmdb is a biodb extension package that implements a connector to HMDB Metabolites.
We present here the different ways to search for HMDB [@wishart2013_HMDB] entries with this package.
Note that the whole HMDB is downloaded locally by biodb and stored on disk, since this is the only way to access HMDB programmatically. Any search on HMDB is hence currently run on the local machine.
Install using Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbHmdb')
The first step in using biodbHmdb is to create an instance of the biodb class BiodbMain, from the main biodb package. This is done by calling the newInst() function:
mybiodb <- biodb::newInst()
During this step the configuration is set up, the cache system is initialized and extension packages are loaded.
We will see at the end of this vignette that the biodb instance must be terminated with a call to the terminate() method.
In biodb, the connection to a database is handled by a connector instance, which you obtain from the factory. biodbHmdb implements a connector to a remote database. Here is the code to instantiate a connector:
conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
For this vignette, we avoid downloading the full HMDB Metabolites database and instead use an extract containing only a few entries:
dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip", package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
To get the number of entries stored inside the database, run:
conn$getNbEntries()
To get a few entry IDs (accession numbers) from the database, run:
ids <- conn$getEntryIds(2)
ids
To retrieve entries, use:
entries <- conn$getEntry(ids)
entries
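You can also read individual field values directly from the entry objects. The following sketch is self-contained, so it repeats the setup steps above. It assumes that the BiodbEntry method getFieldValue() returns the accession field as a single character string. See ?biodb::BiodbEntry for the exact behaviour of this method.

```r
# Set up biodb and the HMDB connector, using the small extract shipped with biodbHmdb
mybiodb <- biodb::newInst()
conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
dbExtract <- system.file("extdata", "generated", "hmdb_extract.zip", package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)

# Retrieve two entries
ids <- conn$getEntryIds(2)
entries <- conn$getEntry(ids)

# Read the accession field of each entry (assumed to be a single string per entry)
accessions <- vapply(entries, function(e) e$getFieldValue('accession'), character(1))
print(accessions)

# Release resources
mybiodb$terminate()
```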
To convert a list of entries into a dataframe, run:
x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
Here we use the generic biodb method searchForEntries() to search for entries by name:
id <- conn$searchForEntries(list(name='1-Methylhistidine'), max.results=1)
id
We limit the search result to one entry with the max.results parameter.
The first parameter is the filtering criterion, expressed as a named list whose single element (in our case) is named after the biodb field to filter on. The value of that element is the text to search for.
See the documentation of searchForEntries() in ?biodb::BiodbConn.
You can also search for several strings at once. In that case, an entry matches only if its field value contains all the specified strings:
conn$searchForEntries(list(name=c('propanoic', 'acid')), max.results=1)
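Here is a small base R illustration of the "all strings must be present" rule. It is not biodb's implementation. The helper matchesAll() and the candidate names are hypothetical. The sketch uses case-sensitive, fixed-string matching, which may differ from what biodb actually does.

```r
# Hypothetical helper: TRUE only if every pattern occurs in value (logical AND)
matchesAll <- function(value, patterns) {
    all(vapply(patterns, function(p) grepl(p, value, fixed=TRUE), logical(1)))
}

# Hypothetical candidate names, tested against the two strings used above
candidates <- c("2-hydroxypropanoic acid", "propanal", "acetic acid")
res <- vapply(candidates, matchesAll, logical(1), patterns=c('propanoic', 'acid'))
print(res)
```

Only the first candidate contains both "propanoic" and "acid", so it is the only one that matches.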
To look at the values of the entry, you may convert it to a data frame:
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name'))
See table \@ref(tab:entryByNameTable) for the content of this data frame.
knitr::kable(entryDf, "pipe", caption="The entry returned when searching by name.")
Searching inside the description field works the same way as searching the name field.
Here is a search with multiple strings to match:
id <- conn$searchForEntries(list(description=c('Parkinson', 'sclerosis')), max.results=1)
id
Again, you can look at the values of the entry through a data frame:
entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name', 'description'))
See table \@ref(tab:entryByDescTable) for the content of this data frame.
knitr::kable(entryDf, "pipe", caption="The entry returned when searching by description.")
When you are done with your biodb instance, terminate it to release its resources (file handles, database connections, etc.):
mybiodb$terminate()
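In scripts, you can make sure terminate() is always called, even when an error interrupts the work, by registering it with base R's on.exit(). The function searchHmdbByName() below is a hypothetical helper written for illustration. It reuses only the calls shown in this vignette, and it points the connector at the bundled extract rather than the full HMDB.

```r
# Hypothetical helper: runs one name search inside its own biodb session
searchHmdbByName <- function(name, max.results=1) {
    mybiodb <- biodb::newInst()
    # terminate() is scheduled to run when the function exits,
    # whether it returns normally or stops with an error
    on.exit(mybiodb$terminate(), add=TRUE)
    conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')
    # Use the small extract shipped with biodbHmdb instead of the full HMDB
    dbExtract <- system.file("extdata", "generated", "hmdb_extract.zip", package="biodbHmdb")
    conn$setPropValSlot('urls', 'db.zip.url', dbExtract)
    conn$searchForEntries(list(name=name), max.results=max.results)
}

found <- searchHmdbByName('1-Methylhistidine')
print(found)
```

Creating a new instance for each call keeps the example self-contained. For many searches, it is cheaper to keep a single instance open and terminate it once at the end.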
sessionInfo()