Introduction

biodbHmdb is a biodb extension package that implements a connector to HMDB Metabolites.

We present here the different ways to search for HMDB [@wishart2013_HMDB] entries with this package.

Note that the whole HMDB is downloaded locally by biodb and stored on disk, since this is the only way to access HMDB programmatically. Any search on HMDB is hence currently run on the local machine.

Installation

Install using Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbHmdb')

Initialization

The first step in using biodbHmdb, is to create an instance of the biodb class BiodbMain from the main biodb package. This is done by calling the constructor of the class:

mybiodb <- biodb::newInst()

During this step the configuration is set up, the cache system is initialized and extension packages are loaded.

We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.

Creating a connector to HMDB Metabolites

In biodb the connection to a database is handled by a connector instance that you can get from the factory. biodbHmdb implements a connector to a remote database. Here is the code to instantiate a connector:

conn <- mybiodb$getFactory()$createConn('hmdb.metabolites')

For this vignette, we will avoid the downloading of the full HMDB Metabolites database, and use instead an extract containing a few entries:

dbExtract <- system.file("extdata", 'generated', "hmdb_extract.zip",
    package="biodbHmdb")
conn$setPropValSlot('urls', 'db.zip.url', dbExtract)

Accessing entries

To get the number of entries stored inside the database, run:

conn$getNbEntries()

To get some of the first entry IDs (accession numbers) from the database, run:

ids <- conn$getEntryIds(2)
ids

To retrieve entries, use:

entries <- conn$getEntry(ids)
entries

To convert a list of entries into a dataframe, run:

x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x

Searching by name

We use here the generic biodb method searchForEntries() to search for entries by name:

id <- conn$searchForEntries(list(name='1-Methylhistidine'), max.results=1)
id

We limit the search result to one entry with the max.results field.

The first parameter is the filtering criterion, expressed as a list whose single key (in our case) is the biodb field on which we want to filter. The value is the text we want to search for. See the documentation of searchForEntries() inside ?biodb::BiodbConn.

We could also use several strings to search for, in which case an entry will be matched if its field value contains all the specified strings:

conn$searchForEntries(list(name=c('propanoic', 'acid')), max.results=1)

To look at the values of the entry, you may convert it to a data frame:

entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name'))

See table \@ref(tab:entryByNameTable) for the content of this data frame.

knitr::kable(entryDf, "pipe", caption="The entry returned when searching by name.")

Searching inside the "description" field

Searching inside the description field can be done in the same way as for the name field. Here is a search with multiple strings to match:

id <- conn$searchForEntries(list(description=c('Parkinson', 'sclerosis')), max.results=1)
id

Again, you can look at the values of the entry through a data frame:

entryDf <- conn$getEntry(id)$getFieldsAsDataframe(fields=c('accession', 'name', 'description'))

See table \@ref(tab:entryByDescTable) for the content of this data frame.

knitr::kable(entryDf, "pipe", caption="The entry returned when searching by description.")

Closing biodb instance

When done with your biodb instance you have to terminate it, in order to ensure release of resources (file handles, database connection, etc):

mybiodb$terminate()

Session information

sessionInfo()

References



pkrog/biodbHmdb documentation built on Jan. 11, 2023, 4:09 a.m.