library(BiocStyle)
require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
The IndexedRelations package implements the IndexedRelations class for representing 'omics relationships.
This can be used for physical interactions, regulatory relationships or other associations involving any number of features.
The IndexedRelations class is primarily intended as a base class from which concrete subclasses can be derived for specific contexts.
For example, the GenomicInteractions package derives a subclass to represent physical interactions between genomic intervals.

IndexedRelations class

Each IndexedRelations object holds relationships between "partner" features.
Partners can be any instance of a Vector-like class, such as GenomicRanges and IRanges (see the S4Vectors package for details).
In a single IndexedRelations object, each entry is a relationship that involves the same number and type of partners.
To illustrate, let's make up a GenomicRanges and an IRanges object.
The former might represent a chromosomal interval while the latter might represent, say, an interval on an exogenous sequence like a plasmid.
# Making up features.
library(GenomicRanges)
gr <- GRanges(sample(1:21, 100, replace=TRUE),
    IRanges(sample(1e8, 100, replace=TRUE), width=10))
gr
ir <- IRanges(sample(1000, 20, replace=TRUE), width=5)
ir
Now, assume that there are multiple relationships between individual entries of these two objects.
(Let's pretend that the parts of the plasmid come into contact with the chromosomes at various points.)
The partner.G vector refers to elements of gr while partner.I refers to elements of ir.
The feature indexed by the first element of partner.G is in a relationship with the feature indexed by the first element of partner.I, and so on.
# Making up relationships.
partner.G <- sample(length(gr), 10000, replace=TRUE)
partner.I <- sample(length(ir), 10000, replace=TRUE)

# Parallel entries across vectors are in a relationship:
gr[partner.G[1]]
ir[partner.I[1]]
It is then straightforward to construct an IndexedRelations object.
While this example only uses two partnering features, the class can support any number of partners in a single object.
# Easy but inefficient:
library(IndexedRelations)
rel <- IndexedRelations(list(gr[partner.G], ir[partner.I]))

# More work but more efficient:
rel <- IndexedRelations(list(partner.G, partner.I), list(gr, ir))
rel
The key feature of the IndexedRelations class is that it does not actually store the partnering features explicitly.
Rather, the relationships are internally represented by indices that point to sets of features.
This avoids redundant representation of the same features when large numbers of relationships are to be stored,
reducing memory use and improving the efficiency of algorithms that operate on these relationships.
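
To make the memory argument concrete, the following base-R sketch contrasts explicit storage with index-based storage. It does not use the package itself: the `features` data frame and the `idx` vector are hypothetical stand-ins for a feature set and the indices that point into it.

```r
# Hypothetical feature set: 20 intervals stored as a plain data frame.
set.seed(100)
features <- data.frame(
    chr = sample(paste0("chr", 1:21), 20, replace = TRUE),
    start = sample(1e8, 20),
    width = 10L
)

# 10,000 relationships, each pointing at one of the 20 features.
idx <- sample(nrow(features), 10000, replace = TRUE)

# Explicit storage copies a full feature for every relationship.
explicit <- features[idx, ]

# Index-based storage keeps one copy of the features plus the integer indices.
indexed.size <- as.numeric(object.size(features)) + as.numeric(object.size(idx))
explicit.size <- as.numeric(object.size(explicit))

# Compare the two footprints (in bytes).
c(indexed = indexed.size, explicit = explicit.size)
```

The exact numbers depend on the platform, but the explicit copy grows with the number of relationships while the indexed form only adds one integer per relationship.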
The partnerFeatures() and partners() methods are used to extract the partners.
The former will extract the partners as a full set of features, which is convenient but less efficient.
The latter will only extract the indices, and it is up to the user to use them to index the relevant feature set.
# Extract partnering features:
partnerFeatures(rel, 1)
partnerFeatures(rel, 2)

# Extract partnering indices only:
partners(rel)
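
As a small check of how the two accessors relate, the sketch below builds a toy object and indexes the first feature set by hand. The `toy.gr`, `toy.ir` and `toy.rel` names are made up for this illustration. It assumes that `partners()` returns a rectangular object with one column of indices per partner, as the column-wise replacement used elsewhere in this vignette suggests.

```r
library(GenomicRanges)
library(IndexedRelations)

# Toy feature sets (hypothetical values).
toy.gr <- GRanges("chrA", IRanges(c(1, 11, 21, 31, 41), width = 10))
toy.ir <- IRanges(c(100, 200, 300), width = 5)

# Four relationships, given as indices into the two feature sets.
toy.rel <- IndexedRelations(
    list(c(1L, 3L, 5L, 1L), c(2L, 2L, 3L, 1L)),
    list(toy.gr, toy.ir)
)

# Index the first feature set manually with the stored indices...
idx <- partners(toy.rel)
manual <- featureSets(toy.rel)[[1]][idx[, 1]]

# ... and compare against the convenience accessor.
same <- all(manual == partnerFeatures(toy.rel, 1))
same
```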
The featureSets() method is used to extract the feature sets.
The first feature set contains the features for the first partner,
the second set for the second partner and so on.
featureSets(rel)[[1]]
featureSets(rel)[[2]]
The IndexedRelations object behaves like a vector.
It has a length and names, and can be subsetted and combined.
length(rel)
rel[1:10,]
c(rel, rel)
You can also modify the partners or the feature sets of an existing IndexedRelations object.
This is most obviously done by passing relevant features to the replacement methods:
# Set partner with features.
partnerFeatures(rel, 2) <- ir[sample(length(ir), length(rel), replace=TRUE)]
rel

# Modify the first feature set.
featureSets(rel)[[1]] <- resize(featureSets(rel)[[1]], 20)
rel
Advanced users can also modify the indices directly. As anyone who has played with pointers can attest, though, it is important that the indices point to valid entries in the corresponding feature set!
Npossibles <- length(featureSets(rel)[[2]])
partners(rel)[,2] <- sample(Npossibles, length(rel), replace=TRUE)
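
Because direct index replacement bypasses the feature-based interface, a defensive check before assignment can be worthwhile. The helper below is a hypothetical base-R sketch (`valid_indices` is not part of the package) that tests whether every index is a non-missing whole number pointing inside a feature set of a given size.

```r
# Hypothetical helper: TRUE if all indices point inside a set of 'n.features'.
valid_indices <- function(idx, n.features) {
    is.numeric(idx) &&
        !anyNA(idx) &&
        all(idx == round(idx)) &&
        all(idx >= 1 & idx <= n.features)
}

valid_indices(c(1L, 5L, 20L), 20)   # TRUE: all indices in range
valid_indices(c(0L, 5L), 20)        # FALSE: 0 does not point to a feature
valid_indices(c(5L, 21L), 20)       # FALSE: 21 exceeds the set size
valid_indices(c(5L, NA), 20)        # FALSE: missing index
```

With the objects above, such a check could be run on a candidate index vector against length(featureSets(rel)[[2]]) before it is assigned.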
In most cases, an IndexedRelations object will behave "as if" it were an object containing the partnering features explicitly.
Users do not have to worry about the specifics of the index-based representation unless they are specifically manipulating it.
Users can compare different IndexedRelations objects with the same feature classes.
One relationship is considered "less than" or "greater than" another based on the first non-equal partner and the definitions of inequality for that partner's feature class.
scrambled <- sample(rel)
summary(scrambled < rel)
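
The comparison rule can be sketched in base R for the two-partner case. This illustrates the logic described above rather than the package's implementation: `relation_less` and its arguments are hypothetical, with each number standing for the rank of a feature under its own class's ordering.

```r
# Is relationship (a1, a2) "less than" relationship (b1, b2)?
relation_less <- function(a1, a2, b1, b2) {
    # The first partner decides unless it is tied; then the second decides.
    (a1 < b1) | (a1 == b1 & a2 < b2)
}

relation_less(1, 9, 2, 0)  # TRUE: first partners differ, 1 < 2
relation_less(2, 3, 2, 5)  # TRUE: first partners tie, 3 < 5
relation_less(2, 5, 2, 5)  # FALSE: the relationships are equal
relation_less(3, 0, 2, 9)  # FALSE: first partner is greater
```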
Relationships can be sorted based on the ordering of the partner features. Specifically, the order of the relationships is defined based on the ordering of the first partner; if those are equal, the second partner; and so on.
sort(rel)
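
In base R, the same tie-breaking behaviour is what `order()` gives when passed several keys. The sketch below uses two hypothetical integer vectors, `first` and `second`, as stand-ins for the ranks of the first and second partners.

```r
first  <- c(2L, 1L, 2L, 1L)
second <- c(5L, 9L, 3L, 4L)

# Sort by the first partner, breaking ties with the second.
o <- order(first, second)
o                                   # 4 2 3 1

# The relationships in sorted order.
cbind(first = first[o], second = second[o])
```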
It is also possible to match relationships across different IndexedRelations objects.
Note that all of these methods are agnostic to the identities of the underlying feature sets,
as long as the same classes of features are used in the same order across different objects.
m <- match(rel, scrambled)
head(m)
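
A small base-R sketch shows why the feature sets need not be identical. Two hypothetical objects store relationships against differently ordered feature sets, and matching is done on the realized features rather than on the raw indices. The character labels and the `key` helper are assumptions made for the illustration, not the package's internals.

```r
# Object A and object B hold the same features in different orders.
setA <- c("x", "y", "z")
setB <- c("z", "x", "y")

# Relationships as (first partner, second partner) index pairs.
idxA <- cbind(c(1L, 2L, 3L), c(2L, 3L, 1L))   # x-y, y-z, z-x
idxB <- cbind(c(1L, 2L, 2L), c(2L, 3L, 1L))   # z-x, x-y, x-z

# Realize the indices into features and build one key per relationship.
key <- function(idx, set) paste(set[idx[, 1]], set[idx[, 2]], sep = "|")

m <- match(key(idxA, setA), key(idxB, setB))
m   # 2 NA 1
```

Comparing the raw index pairs would give the wrong answer here, since index 1 means "x" in one object and "z" in the other.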
The IndexedRelations object can be converted to and from some similar classes.
The most obvious of these is the Pairs class from the S4Vectors package (of course, this only works if the IndexedRelations object has two partners!):
makePairsFromIndexedRelations(rel)
p <- Pairs(gr[1:10], ir[1:10])
as(p, "IndexedRelations")
Another option is to convert to and from a DataFrame object.
This provides a quick and general method to "realize" the indices into the partnering features.
as(rel, "DataFrame")
The rearrangePartners() function can reorder, drop or duplicate partners:
# Duplicate partner
rearrangePartners(rel, c(1,1,2))

# Swap partners
rearrangePartners(rel, c(2,1))

# Drop partner
rearrangePartners(rel, 2)
Developers may find standardizeFeatureSets() useful to synchronize feature sets across IndexedRelations instances.
This allows downstream procedures to compare integer indices directly for greater efficiency.
# Setting up an alternative object:
gr2 <- GRanges(sample(1:21, 50, replace=TRUE),
    IRanges(sample(1e8, 50, replace=TRUE), width=10))
ir2 <- IRanges(sample(1000, 50, replace=TRUE), width=5)
rel2 <- IndexedRelations(list(gr2, ir2))
identical(featureSets(rel), featureSets(rel2))

out <- standardizeFeatureSets(rel, list(rel2))
identical(featureSets(out$x), featureSets(out$objects[[1]]))

# ... though the partnering features are still the same.
stopifnot(all(out$x==rel))
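
Conceptually, synchronizing feature sets amounts to building a shared set and remapping each object's indices into it. The base-R sketch below shows that idea with hypothetical character features. It is not the package's implementation, and the names `setA`, `setB` and `common` are made up.

```r
setA <- c("x", "y", "z"); idxA <- c(1L, 3L, 3L)      # x, z, z
setB <- c("z", "w");      idxB <- c(1L, 2L, 1L)      # z, w, z

# One shared feature set for both objects.
common <- union(setA, setB)                          # "x" "y" "z" "w"

# Remap each set of indices so that they point into the shared set.
newA <- match(setA, common)[idxA]                    # 1 3 3
newB <- match(setB, common)[idxB]                    # 3 4 3

# The realized features are unchanged...
stopifnot(identical(common[newA], setA[idxA]))
stopifnot(identical(common[newB], setB[idxB]))

# ... but equal features now have equal indices across objects.
newA == newB                                         # FALSE FALSE TRUE
```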
The cleanFeatureSets() function will remove redundant features and sort each feature set in an IndexedRelations instance.
This enables direct comparison of indices across relationships for a given partner, e.g., for sorting.
rel3 <- cleanFeatureSets(rel)
featureSets(rel3)[[2]]
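
The effect of cleaning can be imitated in base R: deduplicate and sort the feature set, then remap the indices. Once that is done, ordering the integer indices gives the same result as ordering the features themselves. The numeric `features` vector is a hypothetical stand-in for a real feature set, and the remapping is a sketch rather than the package's code.

```r
features <- c(30, 10, 20, 10)            # unsorted, with a duplicate
idx <- c(1L, 2L, 4L, 3L, 1L)             # 30, 10, 10, 20, 30

clean <- sort(unique(features))          # 10 20 30
new.idx <- match(features[idx], clean)   # 3 1 1 2 3

# Realized features are preserved.
stopifnot(identical(clean[new.idx], features[idx]))

# Ordering by index now agrees with ordering by feature.
stopifnot(identical(order(new.idx), order(features[idx])))
```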
The dropUnusedFeatures() function will discard features in each set that are not used, much like droplevels() does for levels of factors.
This is useful for saving memory prior to serialization to file but is generally unnecessary for use within an R session; see ?dropUnusedFeatures for some commentary on this matter.
rel4 <- dropUnusedFeatures(rel)
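
The `droplevels()` analogy can be made explicit in base R. The first half of the sketch shows the factor case. The second half applies the same idea to a hypothetical index-based representation, where `features` and `idx` are illustrative names and the remapping is a sketch rather than the package's code.

```r
# Factor case: unused levels are discarded, values are unchanged.
f <- factor(c("a", "c", "a"), levels = c("a", "b", "c", "d"))
levels(droplevels(f))                # "a" "c"

# Index-based case: keep only the features that some index points to.
features <- c("a", "b", "c", "d")
idx <- c(1L, 3L, 1L)

used <- sort(unique(idx))            # 1 3
small.features <- features[used]     # "a" "c"
small.idx <- match(idx, used)        # 1 2 1

# The realized features are the same before and after.
stopifnot(identical(small.features[small.idx], features[idx]))
```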
sessionInfo()