MultiDb-class (R Documentation)
The OrganismDb class is a container for storing knowledge about existing annotation packages and the relationships between these resources. The purpose of this object and its associated methods is to provide a means by which users can conveniently query data from several different annotation resources at the same time using a familiar interface.
The supporting methods select, columns and keys are used together to extract data from an OrganismDb object in a manner that should be consistent with how they are used on the supporting annotation resources.
The family of seqinfo-style getters (seqinfo, seqlevels, seqlengths, isCircular, genome, and seqnameStyle) is also supported for OrganismDb objects, provided that the object in question has an embedded TxDb object.
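A minimal sketch of the seqinfo-style getters, assuming the Homo.sapiens package (which embeds a TxDb) is installed; the package choice is borrowed from the examples on this page. The generic getters simply return slices of the Seqinfo of the embedded TxDb.

```r
library(Homo.sapiens)

## Seqinfo of the embedded TxDb: sequence names, lengths, circularity, genome
si <- seqinfo(Homo.sapiens)

## The individual getters return the corresponding parts of that Seqinfo
head(seqlevels(Homo.sapiens))
head(seqlengths(Homo.sapiens))
head(isCircular(Homo.sapiens))
unique(genome(Homo.sapiens))
```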
In the code snippets below, x is an OrganismDb object.
keytypes(x): Allows the user to discover which keytypes can be passed to the keytype argument of select or keys.
keys(x, keytype, pattern, column, fuzzy): Return keys for the database contained in the OrganismDb object. The keytype argument specifies the kind of keys that will be returned and is always required. If keys is used with pattern, it will pattern match on the keytype. But if the column argument is also provided along with the pattern argument, then pattern will be matched against the values in column instead. If keys is called with column and no pattern argument, then it will return all keys that have corresponding values in the column argument. Thus, the behavior of keys depends on which arguments are specified. The fuzzy argument toggles fuzzy matching (TRUE or FALSE); if pattern is not used, fuzzy is ignored.
columns(x): Shows which kinds of data can be returned for the OrganismDb object.
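A short sketch contrasting keytypes and columns, assuming Homo.sapiens is installed: keytypes lists what may be passed as keytype, columns lists what may be requested as output. Checking a keytype up front is a defensive habit suggested here, not something the class requires.

```r
library(Homo.sapiens)

## Values accepted by the keytype argument of keys() and select()
kt <- keytypes(Homo.sapiens)

## Values accepted by the columns argument of select()
cols <- columns(Homo.sapiens)

## Check that a keytype is available before using it
stopifnot("SYMBOL" %in% kt)
```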
select(x, keys, columns, keytype): When all the appropriate arguments are specified, select will retrieve the matching data as a data.frame based on the values supplied for the keys, columns and keytype arguments.
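A sketch of a single select call whose columns come from different underlying resources (ENTREZID from the org package, TXNAME from the TxDb), assuming Homo.sapiens is installed. The two gene symbols are arbitrary illustrative keys.

```r
library(Homo.sapiens)

## Map two gene symbols to Entrez IDs and transcript names in one call
res <- select(Homo.sapiens,
              keys = c("BRCA1", "TP53"),
              columns = c("ENTREZID", "TXNAME"),
              keytype = "SYMBOL")
head(res)
```

Because one gene usually has several transcripts, the result typically has more rows than there are keys.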
mapIds(x, keys, column, keytype, ..., multiVals): When all the appropriate arguments are specified, mapIds will retrieve the matching data as a vector or list based on the values supplied for the keys, column and keytype arguments. The multiVals argument can be used to choose the format of the values returned. Possible values for multiVals are:
"first": When there are multiple matches, only the first value that comes back is returned. This is the default behavior.
"list": Returns a list object to the end user.
"filter": Removes all elements that contain multiple matches, and will therefore return a shorter vector than the input whenever some of the keys match more than one value.
"asNA": Returns an NA value whenever there are multiple matches.
"CharacterList": Returns a SimpleCharacterList object.
You can also supply a function to the multiVals argument for custom behavior. The function must take a single argument and return a single value. It is applied to every element and serves as the rule for which value to keep when there is more than one. For example, this function always keeps the last element of each result: last <- function(x){x[[length(x)]]}
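The base-R sketch below emulates the "first", "filter" and "asNA" rules, plus the custom last function, on a hand-made list so the rules can be compared without loading an annotation package. The list matches is hypothetical data standing in for the per-key matches that mapIds collects ("list" would correspond to returning it unchanged); this is an illustration of the rules, not how mapIds is implemented.

```r
## Hypothetical per-key matches: k1 has one match, k2 and k3 have several
matches <- list(k1 = "A", k2 = c("B", "C"), k3 = c("D", "E", "F"))

## "first": keep only the first match for every key
first <- vapply(matches, function(v) v[[1]], character(1))

## "filter": drop keys that have more than one match (shorter result)
single <- lengths(matches) == 1L
filtered <- unlist(matches[single])

## "asNA": replace multi-match keys with NA
asNA <- vapply(matches,
               function(v) if (length(v) == 1L) v else NA_character_,
               character(1))

## Custom rule from the text: always keep the last element
last <- function(x) { x[[length(x)]] }
lastVals <- vapply(matches, last, character(1))
```

With a real OrganismDb, the same custom rule would be passed directly, for example multiVals = last in a mapIds call.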
selectByRanges(x, ranges, columns, overlaps, ignore.strand): When all the appropriate arguments are specified, selectByRanges will return an annotated GRanges object generated from the ranges argument, based on whether those ranges overlap the features specified by the overlaps argument. Internally, this function gets annotation features and overlaps by calling the annotation methods indicated by the overlaps argument. The value for overlaps can be any of: gene, tx, exons, cds, 5utr, introns or 3utr. The default value is 'tx', which annotates your ranges based on whether they overlap the transcript ranges of any gene in the associated TxDb object, according to the gene models it contains. Note also that the number of ranges returned will match the number of genes that your ranges overlapped for the type of overlap you specified. So if some of your ranges are large and overlap several features, you will get many duplicated ranges back, one for each gene that has an overlapping feature. The columns values that you request will be returned in the mcols of the annotated GRanges object that this function returns. Finally, the ignore.strand argument indicates whether findOverlaps should ignore or respect the strand.
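A sketch of selectByRanges with a non-default overlaps value and explicit strand handling, assuming Homo.sapiens is installed. The chr11 coordinates reuse the first range from the examples on this page; overlaps = "gene" and ignore.strand = TRUE are illustrative choices.

```r
library(Homo.sapiens)
library(GenomicRanges)

## One range on chr11, reusing the seqinfo of the OrganismDb
rng <- GRanges(seqnames = "chr11",
               ranges = IRanges(start = 107899550, end = 108291889),
               strand = "*",
               seqinfo = seqinfo(Homo.sapiens))

## Annotate with SYMBOL, counting overlaps with whole gene ranges
## and ignoring strand
byGene <- selectByRanges(Homo.sapiens, rng, columns = "SYMBOL",
                         overlaps = "gene", ignore.strand = TRUE)
byGene
```

Since this range is large, the result can contain the same input range several times, once per overlapped gene.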
selectRangesById(x, keys, columns, keytype, feature): When all the appropriate arguments are specified, selectRangesById will return a GRangesList object that corresponds to the gene-model GRanges for the keys you specify with the keys and keytype arguments. The annotation ranges retrieved are determined by the feature argument and can be: gene, tx, exon or cds. The default is 'tx', which returns the transcript ranges for each gene as a GRanges object in the list. Extra data can also be returned in the mcols of those GRanges by using the columns argument.
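A sketch of selectRangesById with a non-default feature, assuming Homo.sapiens is installed. The two symbols are arbitrary illustrative keys ('A1BG' also appears in the examples on this page).

```r
library(Homo.sapiens)

## Exon ranges for two genes identified by symbol; per the description
## above, the result is a GRangesList of gene-model ranges for these keys
exByGene <- selectRangesById(Homo.sapiens, keys = c("A1BG", "BRCA1"),
                             keytype = "SYMBOL", feature = "exon")
exByGene
```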
resources(x): Shows where the db files are for the resources that are used to store the data for the OrganismDb object.
TxDb(x): Accessor for the TxDb object of an OrganismDb object.
TxDb(x) <- value: Allows you to swap in an alternative TxDb for a given OrganismDb object. This is most often useful when combined with saveDb(TxDb, file), which returns the saved TxDb, so that you can save a TxDb to disk and then assign the saved version directly into your OrganismDb object.
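The save-then-swap workflow described above can be sketched as follows, assuming Homo.sapiens is installed and that the temporary directory is writable. The file path from tempfile() and the name odb are illustrative.

```r
library(Homo.sapiens)

## A local binding for the OrganismDb from the Homo.sapiens package
odb <- Homo.sapiens

## Save the embedded TxDb to a new SQLite file; saveDb() returns the
## TxDb backed by the saved file
dbPath <- tempfile(fileext = ".sqlite")
savedTxdb <- saveDb(TxDb(odb), file = dbPath)

## Swap the saved TxDb into the OrganismDb
TxDb(odb) <- savedTxdb
```

In practice you would usually swap in a different TxDb (for example one built for another annotation release) rather than a copy of the embedded one; the copy is used here only to keep the sketch self-contained.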
Author: Marc Carlson
See also:
AnnotationDb-class for more description of the methods select, keytypes, keys and columns.
makeOrganismPackage for functions used to generate an OrganismDb-based package.
rangeBasedAccessors for the range-based methods used in extracting data from an OrganismDb object.
Topics in the GenomeInfoDb package: seqinfo, seqlevels, seqlengths, isCircular, genome.
Examples:
## load a package that creates an OrganismDb
library(Homo.sapiens)
## list the objects in search-path position 2 (the attached Homo.sapiens package)
ls(2)
## then the methods can be used on this object.
## pick a few columns by position (positions may differ between package versions)
cols <- columns(Homo.sapiens)[c(7,10,11,12)]
ks <- head(keys(org.Hs.eg.db, keytype="ENTREZID"))
kt <- "ENTREZID"
res <- select(Homo.sapiens, keys=ks, columns=cols, keytype=kt)
head(res)
## map Entrez IDs to aliases, keeping every alias for each key
res <- mapIds(Homo.sapiens, keys=c('1','10'), column='ALIAS',
              keytype='ENTREZID', multiVals="CharacterList")
## get symbols for ranges in question:
ranges <- GRanges(seqnames=Rle('chr11', 2),
                  IRanges(start=c(107899550, 108025550),
                          end=c(108291889, 108050000)), strand='*',
                  seqinfo=seqinfo(Homo.sapiens))
selectByRanges(Homo.sapiens, ranges, 'SYMBOL')
## Or extract the gene model for the 'A1BG' gene:
selectRangesById(Homo.sapiens, 'A1BG', keytype='SYMBOL')
## Get the DB connections or DB file paths for each underlying resource.
dbconn(Homo.sapiens)
dbfile(Homo.sapiens)
## extract the taxonomyId
taxonomyId(Homo.sapiens)
## extract the resources
resources(Homo.sapiens)