mirtarbase
in the AnnotationDbi
frameworktitle: "mirtarbase: a database of validated miRNA target gene interactions" graphics: yes author: - name: Johannes Rainer output: BiocStyle::html_document: toc_depth: 2 vignette: > %\VignetteIndexEntry{mirtarbase: a database of validated miRNA target gene interactions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteDepends{mirtarbase} %\VignettePackage{mirtarbase} %\VignetteKeywords{annotation,database,miRNA} bibliography: references.bib csl: biomed-central.csl references: - id: dummy title: no title author: - family: noname given: noname
BiocStyle::markdown()
The mirtarbase
package provides the experimentally validated miRNA-target gene
interactions (MTIs) defined in the miRTarbase database [@Hsu:2014co].
The database, which was build based on the Excel spread sheet that can be
downloaded from the miRTarbase main site http://mirtarbase.mbc.nctu.edu.tw/, is
automatically loaded upon library startup and bound to the environment variable
MirtarbaseDb.v<release>
, where <release>
stands for the release of the
miRTarBase. In addition, a shortcut mirtarbase
is automatically generated and
linked to the (most recent) release available in the package.
library(mirtarbase)
After the package is loaded the database can be accessed through the
MirtarbaseDb
object bound to the mirtarbase
variable.
## print some information for the package
mirtarbase
The miRTarbase defines a MTI for each mature miRNA - target gene pair and provides at least one publication in which this interaction was experimentally verified along with the support type (i.e. one of four evidence grades that was defined by the developers), the experiments supporting the interaction and the Pubmed ID. This information is stored internally in a SQLite database (see Section 3 for the layout of the database and the attributes/columns).
Before we start with some usage scenarios for the package it is important to
understand the mechanism to fetch specific values from the database, i.e. how
the results can be filtered. The package uses the same filtering system than the
ensembldb
Bioconductor package and extends it by some additional filters
(filters are partially imported from the mirhostgenes
package).
The following filter classes are used by the package (in alphabetical order):
EntrezFilter
: filter the results based on the target genes' NCBI
Entrezgene identifiers.ExperimentFilter
: restrict the results to specific experiments (use
listExperiments(mirtarbase)
to get a list of all experiments).GenenameFilter
: filter the results based on the target genes' names
(symbols).MatMirnaFilter
: search for MTIs for specific mature miRNA names
(e.g. hsa-miR-15b-5p).MarMirnaIdFilter
: search for MTIs for specific mature miRNAs identified by
their mirbase accession number.MirfamFilter
: filter the results based on miRNAs of specific miRNA families.MirfamIdFilter
: filter the results based on miRNAs of specific miRNA
families identified by their mirbase accession number.MirtarbaseIdFilter
; filter the results based on the ID of the miRTarBase
database.PreMirnaFilter
: filter the results for MTIs of mature miRNAs derived from
the specified pre-miRNA(s).PreMirnaidFilter
: filter the results for MTIs of mature miRNAs derived from
the specified pre-miRNA(s) identified by their mirbase accession number.PublicationFilter
: filter the results based on the publication (identified
by their Pubmed IDs) in which the MTI was described.SpeciesFilter
: filter the results for target genes or miRNAs of the
specified species (use listSpecies(mirtarbase, "gene")
or
listSpecies(mirtarbase, "mirna")
).SupportTypeFilter
: filter the results based on the evidence/support type of
the MTI (use listSupportTypes(mirtarbase)
to get a list of allowed values
for support type).These filters can be used individually or can be combined to generate more
specific queries. Furthermore, the parameter condition
allows some more
flexibility to choose which entries should be fetched. condition
can take the
following values: =, !
, contains, startsWith, endsWith with the latter 3
allowing partial matching. In addition, each filter can take a single or
multiple values (see examples in the next section).
As an example we want to retrieve all MTIs for the gene BCL2, i.e. we want to
get all (mature) miRNAs that have been shown to target this gene. To this end we
define a GenenameFilter
with the value BCL2
and submit this to the mtis
call. As a result we get a MTIList
, which is essentially a list
of MTI
objects that describe the interaction. Each interaction is defined by the mature
miRNA name and the name of the target gene (accessible through the matmirna
and gene
methods, respectively) as well as the collection of evidences for the
interaction. These evidences, or rather publications, are accessible through the
reports
method and specify by which experimental methods the interaction was
validated and in which publication this interaction has been described. The
curators of the miRTarbase manually assigned each evidence one of four support
types which is accessible through the supportedBy
method.
The code below simply fetches all MTIs for the gene BCL2 from the database.
## Query the database to fetch all MTIs for the target gene BCL2 BCL2 <- mtis(mirtarbase, filter=list(GenenameFilter("BCL2"))) BCL2 ## To print some more information on a single MTI BCL2[[1]] ## How many interactions did we get? length(BCL2) ## These are however of all species as we did not specify a species filter ## and miRTarBase lists interactions for all species. sort(table(mirnaSpecies(BCL2)), decreasing=TRUE)
In order to restrict the MTIs to human genes and human miRNAs it is advisable to
add one or more SpeciesFilter
to the query.
## We can use the listSpecies method to get the names of all supported species ## from the database: sort(listSpecies(mirtarbase)) ## We want to get all human mature miRNAs that target human gene BCL2 BCL2 <- mtis(mirtarbase, filter=list(GenenameFilter("BCL2"), SpeciesFilter("Homo sapiens", feature="gene"), SpeciesFilter("Homo sapiens", feature="mirna"))) ## Now we have only human miRNAs. We can now make a table of the miRNA, ## the support type and the number of publications for each MTI BCL2.df <- data.frame(miRNA=matmirna(BCL2), reports=reportCount(BCL2), support_type=unlist(lapply(supportedBy(BCL2), function(z){ return(paste(unique(z), collapse=";")) }))) ## Display the MTIs described by the most publications head(BCL2.df[order(BCL2.df$reports, decreasing=TRUE), ])
So, there is evidence that e.g. miR-16-5p is targeting the gene BCL2, along with miR-15a-5p. We can also enrich this table with the information of the pre-miRNA(s) in which the mature miRNA is encoded. In addition, we can group the miRNAs also by the miRNA family. Note that each mature miRNA can be eventually encoded in more than one pre-miRNA, each mature miRNA (and each pre-miRNA) is supposed to be part of one miRNA family.
BCL2.df <- cbind(BCL2.df, premirna=unlist(lapply(BCL2, function(z){ return(paste(premirna(z), collapse=";")) })), mirfam=mirfam(BCL2)) ## Note: there are some mature miRNAs that can not be mapped to pre-miRNA ## or mirfam names. sum(is.na(as.character(BCL2.df$mirfam))) ## the miRNA with most evidences (miR-16-5p) is actually encoded in two ## precursors: premirna(BCL2$MIRT001800) ## The miRNA families from which most miRNAs target BCL2 are listed below: sort(table(as.character(BCL2.df$mirfam)), decreasing=TRUE) ## The miRNAs from the mir-15 family targeting BCL2 are MTI.mir15 <- BCL2[ which(unlist(lapply(BCL2, mirfam))=="mir-15") ] ## the mature miRNAs from this family: MTI.mir15 ## Extract the mature miRNA IDs matmirna(MTI.mir15) ## And the pre-miRNAs: premirna(MTI.mir15)
The missing mapping of mature miRNAs to pre-miRNA names or mirfam identifiers
observed above is in many instances caused by different mirbase versions on
which the mirbase.db
package and the miRTarbase bases. In addition, not all
mature miRNAs are annotated to miRNA families.
As we have seen above, we can use the methods matmirna
, premirna
and
mirfam
on MTI
or MTIList
objects to retrieve the mature miRNA involved in
the miRNA-target gene interaction, the pre-miRNA in which the mature miRNA is
encoded and the miRNA family to which the pre-miRNA(s) belong.
Next we retrieve MTIs between miRNAs of the mir-15 family and genes which names
start with BCL2. For this we define a GenenameFilter
with "like"
as
condition and a pattern for the gene name.
## Get all miRNA-target gene interactions betwee mature miRNAs from the ## mir-15 family and genes starting with BCL2 BCLs <- mtis(mirtarbase, filter=list(MirfamFilter("mir-15"), GenenameFilter("BCL2", condition="startsWith"), SpeciesFilter("Homo sapiens")) ) BCLs
According to this information the miRNA miR-195-5p targets both, a pro- and an anti-apoptotic member of the BCL2 gene family (BCL2L11 and BCL2, respectively).
By default, the results are returned by the mtis
method as MTIList
object,
but we could also specify "data.frame"
as the return.type
to retrieve the
data as data.frame
. This allows to retrieve only specific information from the
database by specifying the columns that should be returned.
onlyGeneNames <- mtis(mirtarbase, filter=list(MirfamFilter("mir-15"), GenenameFilter("BCL2", condition="startsWith"), SpeciesFilter("Homo sapiens")), columns=c("mirna", "target_gene"), return.type="data.frame") head(onlyGeneNames)
Also members of the mir-17 family have been reported to target genes from the BCL2 gene family [@Ventura:2008gk], thus we retrieve next all MTIs between miRNAs of the miRNA families mir-15 or mir-17 and some of the genes from the BCL2 gene family, a gene family involved in, and regulating, the intrinsic apoptotic pathway.
To retrieve values for more than one gene, respectively miRNA family, we can submit a character vector of the respective ids to the filters.
## retrieving all MTIs between miRNAs from the mir-15 and mir-17 families ## and some genes from the BCL2 gene family BCLs <- mtis(mirtarbase, filter=list(MirfamFilter(c("mir-15", "mir-17")), GenenameFilter(c("BCL2", "BCL2L11", "PMAIP1", "MCL1")), SpeciesFilter("Homo sapiens")) ) BCLs ## the miRNA - gene pairs: data.frame(miRNA=matmirna(BCLs), gene=gene(BCLs), report_count=reportCount(BCLs))
Apparently, miRNAs from both the miR-15 and the miR-17 family target genes of the BCL2 gene family and are thus also involved in the regulation of the apoptotic pathway.
Next we evaluate the evidence grades of the interaction and remove all MTIs that are not of the Functional MTI support type (the type with the highest evidence grade).
funcMti <- unlist(lapply(BCLs, function(z){ return(any(supportedBy(z)=="Functional MTI")) })) sum(funcMti) length(funcMti) ## We could now use this logical vector to sub-set the list. ## Alternatively, we can also re-perform the query and fetch only interactions of that ## support type, which has the advantage that also only the publications of the ## corresponding support type are loaded. BCLs <- mtis(mirtarbase, filter=list(MirfamFilter(c("mir-15", "mir-17")), GenenameFilter(c("BCL2", "BCL2L11", "PMAIP1", "MCL1")), SpeciesFilter("Homo sapiens"), SupportTypeFilter("Functional MTI")) ) ## the miRNA - gene pairs: data.frame(miRNA=matmirna(BCLs), gene=gene(BCLs), report_count=reportCount(BCLs) )
This considerably reduced the list of interactions and also decreased the number of reports per MTI.
Sometimes it might be useful to group the miRNA-target gene interactions by some
factor, e.g. by genes or miRNAs. The method mtisBy
allows to fetch MTIs
grouped by any column from the database. It is possible to group the results
by gene, (mature miRNA), entrezid, support type, Pubmed ID, pre-miRNA name,
miRFam name or by species. The result will be a list
with the names being the
factor by which the interactions are grouped and each element being a MTIList
of the MTIs.
In the example below we fetch all MTIs for the genes BCL2, BCL2L11, MCL1 and group them by miRNA family.
Filters <- list(SpeciesFilter(c("Homo sapiens")), GenenameFilter(c("BCL2", "BCL2L11", "MCL1"))) BCL2by <- mtisBy(mirtarbase, filter=Filters, by="mirfam") head(BCL2by)
In a similar way we can also fetch the data grouped by gene.
BCL2by <- mtisBy(mirtarbase, filter=Filters, by="gene") BCL2by
By default, the mtis
method returns a list of MTI
objects (MTIList
) which
is sufficient for most use cases. Alternatively, however, the mtis
method can
also return the results as a data.frame
. In addition to a significant
performance improvement this also enables to select only specific columns
from the database. Note however that by default the method returns all
columns from the database which results in a data.frame
with one
MTI-publication per row, i.e. the same MTI represented by the miRNA-gene pair
can be present in many rows of this data.frame
depending in how many
publications this interaction was identified.
## We perform the same call as above, but restrict the information to some selected ## columns and specify to return the results as a data.frame rather than a list ## of MTI objects. BCLs.df <- mtis(mirtarbase, filter=list(MirfamFilter(c("mir-15", "mir-17")), GenenameFilter(c("BCL2", "BCL2L11", "PMAIP1", "MCL1")), SpeciesFilter("Homo sapiens"), SupportTypeFilter("Functional MTI")), columns=c("mirna", "target_gene"), return.type="data.frame") BCLs.df
The mirtarbase
package provides also methods and functions that allow to map
mature miRNAs to their precursors or to miRNA families. These functions are
essentially wrapper functions that use the information of the mirbase.db
Bioconductor package for the conversion. However, since the mirtarbase
and
mirbase.db
functions might provide information from different releases, some
of the mappings might not be available. For a complete list of conversion
function refer to the help page of the e.g. premirna2matmirna
function.
## map from pre-miRNA name to mature miRNA name. The function returns by default ## a data.frame premirna2matmirna(c("hsa-mir-16-1", "hsa-mir-16-2")) ## the same information but as a list: premirna2matmirna(c("hsa-mir-16-1", "hsa-mir-16-2"), return.type="list")
mirtarbase
in the AnnotationDbi
frameworkThe mirtarbase
package implements also methods keys
, keytypes
, columns
and select
from the AnnotationDbi
package that allow to query data from a
MirtarbaseDb
analogously to other AnnotationDbi
objects. The supported
columns by these methods are:
## List all supported columns that can be queried. columns(mirtarbase) ## Note that these column names are different to those supported ## by the mtis method: listColumns(mirtarbase)
We can use the keys
and keytypes
methods to retrieve the supported keytypes
for the select
method.
## List all supported keytypes keytypes(mirtarbase) ## List all keys for "Support Type" keys(mirtarbase, keytype="SUPPORTTYPE") ## Use select to retrieve all MTIs for support type "Functional MTI" mtis <- select(mirtarbase, keys="Functional MTI", keytype="SUPPORTTYPE") head(mtis) nrow(mtis)
The select
method for MirtarbaseDb
allows in addition also to submit one or
more filter objects with argument keys
. This enables more flexible queries
than possible with the standard usage. Below we retrieve all MTIs of support
type Functional MTI for genes BCL2 and BCL2L11.
mtis <- select(mirtarbase, keys=list(SupportTypeFilter("Functional MTI"), GenenameFilter(c("BCL2", "BCL2L11")))) head(mtis) nrow(mtis)
The database consists of 3 tables, mirtarbase
which contains all information
stored in the xls file from the miRTarBase web site, pubmed_corpus
, that
contains the content of the MTI-PubMed_corpus.txt file from the miRTarBase web
site and metadata
with some internal informations. The column names and their
properties are listed below. Each line in the table represents the MTI for a
miRNA and one of its target genes as reported in a publication. Thus, an
interaction between a miRNA and its target gene can be listed in more than one
row, depending on the number of publications it was validated.
mirtarbase:
mirtarbase_id
: identifier for the miRNA target gene interaction (MTI). Note
that this ID is not unique, i.e. MTIs reported in several publications have
the same ID but are listed in several rows of the table.mirna
: mature miRNA name (a.k.a miRNA ID, e.g. hsa-miR-20a-5p).species_mirna
: the species of the miRNA (e.g. Homo sapiens).target_gene
: the official gene name (symbol) for the gene (e.g. DUSP6, or
ush).target_gene_entrez_gene_id
: the NCBI Entrezgene ID for the target gene;
either NA
or the (numerical) Entrezgene ID. Contains only unique values, no
multiple IDs collapsed by any separator.species_target_gene
: the species of the target gene.experiments
: the experiments providing the evidence for the interaction as
reported in one publication.support_type
: the different types of support (from weak to strong).references_pmid
: the Pubmed ID of the publication reporting the MTI. Each
line with a single Pubmed ID, no empty (NA
) values.pubmed_corpus
pmid
: the PubMed ID of the paper describing the MTI.key
: the key of the entry (can be Abstract, Title,
Experiment_method, miRNA or Target_gene).value
: the value for the key.metadata
name
: the names of the keys.value
: the value for the key.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.