View source: R/createOrganismPackage.R
makeOrganismDbFromBiomart | R Documentation |
The makeOrganismDbFromBiomart function allows the user to make an
OrganismDb object from transcript annotations available in a BioMart
database. This object has all the benefits of a TxDb, plus associated
OrgDb and GODb objects.
makeOrganismDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
dataset="hsapiens_gene_ensembl",
transcript_ids=NULL,
circ_seqs=NULL,
filter="",
id_prefix="ensembl_",
host="https://www.ensembl.org",
port,
miRBaseBuild=NA,
keytype = "ENSEMBL",
orgdb = NA)
biomart |
which BioMart database to use.
Get the list of all available BioMart databases with the listMarts function from the biomaRt package. |
dataset |
which dataset from BioMart. For example: "hsapiens_gene_ensembl" (the default). |
transcript_ids |
optionally, only retrieve transcript annotation data for the specified set of transcript ids. If this is used, then the meta information displayed for the resulting TxDb object will say 'Full dataset: no'. Otherwise it will say 'Full dataset: yes'. This TxDb object will be embedded in the resulting OrganismDb object. |
circ_seqs |
a character vector listing the chromosomes that should be marked as circular. |
filter |
Additional filters to use in the BioMart query. Must be
a named list in which each name is a BioMart filter and each element holds the value(s) to filter on. |
host |
The host URL of the BioMart. Defaults to "https://www.ensembl.org". |
port |
Deprecated: The port to use in the HTTP communication with the host. |
id_prefix |
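Specifies the prefix used in BioMart attributes. For
example, some BioMarts may have an attribute specified as
"ensembl_transcript_id" whereas others specify the same attribute as "transcript_id". Defaults to "ensembl_". |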
miRBaseBuild |
a string specifying which mirbase.db build to use for microRNA
information. The supported values can be listed by calling
supportedMiRBaseBuildValues(). Defaults to NA. |
keytype |
This indicates the kind of key that this database will use as a foreign key between its TxDb object and its OrgDb object, that is, the name of the OrgDb column onto which the TxDb's GENEID values will be mapped. By default it is "ENSEMBL", since the GENEIDs of most biomaRt-based TxDbs are Ensembl gene IDs and therefore need to map to the ENSEMBL gene mappings in the associated OrgDb object. |
orgdb |
By default, |
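The keytype argument works like a foreign key between the two databases. As a rough illustration only, the base R sketch below uses two small hypothetical data frames standing in for a TxDb gene table and an OrgDb mapping table (the IDs, symbols, and table layout are made up, and this is not how OrganismDbi stores or joins its data internally). It joins the TxDb's GENEID column onto the OrgDb column named by keytype.

```r
## Hypothetical stand-in for the transcript-to-gene table of a biomaRt-based TxDb
txdb_genes <- data.frame(
  TXNAME = c("tx1", "tx2", "tx3"),
  GENEID = c("ENSG_TOY_1", "ENSG_TOY_1", "ENSG_TOY_2"),
  stringsAsFactors = FALSE
)

## Hypothetical stand-in for an OrgDb table keyed by Ensembl gene ID
orgdb_map <- data.frame(
  ENSEMBL = c("ENSG_TOY_1", "ENSG_TOY_2"),
  SYMBOL  = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

## keytype names the OrgDb column that the TxDb GENEID values are matched against
keytype <- "ENSEMBL"
joined <- merge(txdb_genes, orgdb_map, by.x = "GENEID", by.y = keytype)
print(joined)
```

If the OrgDb column named by keytype held a different kind of identifier (for example, Entrez gene IDs), merge() would find no matching rows, which is why keytype has to match the kind of ID stored in GENEID.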
makeOrganismDbFromBiomart is a convenience function that feeds data
from a BioMart database to the lower-level OrganismDb constructor.
See ?makeOrganismDbFromUCSC for a similar function that feeds data
from the UCSC source.
The listMarts function from the biomaRt package can be used to list
all public BioMart databases. Not all databases returned by this
function contain datasets that are compatible with (i.e. understood by)
makeOrganismDbFromBiomart.
Here is a list of datasets known to be compatible (updated on Sep 24, 2014):
All the datasets in the main Ensembl database: use biomart="ensembl".
All the datasets in the Ensembl Fungi database: use biomart="fungi_mart_XX", where XX is the release version of the database, e.g. "fungi_mart_22".
All the datasets in the Ensembl Metazoa database: use biomart="metazoa_mart_XX", where XX is the release version of the database, e.g. "metazoa_mart_22".
All the datasets in the Ensembl Plants database: use biomart="plants_mart_XX", where XX is the release version of the database, e.g. "plants_mart_22".
All the datasets in the Ensembl Protists database: use biomart="protists_mart_XX", where XX is the release version of the database, e.g. "protists_mart_22".
All the datasets in the Gramene Mart: use biomart="ENSEMBL_MART_PLANT".
Not all these datasets have CDS information.
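The Ensembl Genomes mart names listed above follow a "<division>_mart_<release>" pattern. A minimal sketch, assuming that pattern still holds, is the hypothetical helper ensembl_division_mart() below (it is not part of OrganismDbi or biomaRt). It only builds the string and does not check that the release exists, and mart names may have changed since this list was compiled in 2014, so confirm the result against the output of listMarts() before passing it as the biomart argument. The main Ensembl and Gramene marts use different names and are not covered.

```r
## Hypothetical helper: builds a mart name following the
## "<division>_mart_<release>" pattern from the compatibility list above.
## It does not query BioMart, so the release number is not validated.
ensembl_division_mart <- function(division, release) {
  divisions <- c("fungi", "metazoa", "plants", "protists")
  division <- match.arg(division, divisions)
  stopifnot(is.numeric(release), length(release) == 1, release >= 1)
  sprintf("%s_mart_%d", division, as.integer(release))
}

ensembl_division_mart("fungi", 22)   # returns "fungi_mart_22"
ensembl_division_mart("plants", 22)  # returns "plants_mart_22"
```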
An OrganismDb object.
M. Carlson
makeOrganismDbFromUCSC for convenient ways to make an
OrganismDb object from UCSC online resources.
The listMarts, useMart, and listDatasets functions in the
biomaRt package.
The supportedMiRBaseBuildValues
function for
listing all the possible values for the miRBaseBuild
argument.
The OrganismDb class.
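## Discover which datasets are available in the "ensembl" BioMart
## database:
library(OrganismDbi)  # provides makeOrganismDbFromBiomart()
library(biomaRt)      # provides useEnsembl() and listDatasets()
mart <- useEnsembl("ensembl")
datasets <- listDatasets(mart)
head(datasets)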
## Retrieving an incomplete transcript dataset for Human from the
## "ensembl" BioMart database:
transcript_ids <- c(
"ENST00000013894",
"ENST00000268655",
"ENST00000313243",
"ENST00000435657",
"ENST00000384428",
"ENST00000478783"
)
odb <- makeOrganismDbFromBiomart(transcript_ids=transcript_ids)
odb # note that these annotations match the GRCh38 genome assembly
if (interactive()) {
## Now what if we want to use another mirror? We might make use of the
## host argument. But wait! If we use biomaRt, we can see that a mirror
## may name its marts differently from the main site:
listMarts(host="https://useast.ensembl.org")
## Therefore we must make sure that the name passed to the "biomart"
## argument matches one of the marts listed for that host:
makeOrganismDbFromBiomart(
biomart="ENSEMBL_MART_ENSEMBL",
transcript_ids=transcript_ids,
host="https://useast.ensembl.org"
)
}