knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE) library(BiocStyle) set.seed(0)
The BumpyMatrix
class is a two-dimensional object where each entry contains a non-scalar object of constant type/class but variable length.
This can be considered to be raggedness in the third dimension, i.e., "bumpiness".
The BumpyMatrix
is intended to represent complex data that has zero-to-many mappings between individual data points and each feature/sample,
allowing us to store it in Bioconductor's standard 2-dimensional containers such as the SummarizedExperiment
.
One example could be to store transcript coordinates for highly multiplexed FISH data;
the dimensions of the BumpyMatrix
can represent genes and cells while each entry is a data frame with the relevant x/y coordinates.
A variety of BumpyMatrix
subclasses are implemented but the most interesting is probably the BumpyDataFrameMatrix
.
This is an S4 matrix class where each entry is a DataFrame
object, i.e., Bioconductor's wrapper around the data.frame
.
To demonstrate, let's mock up some data for our hypothetical FISH experiment:
library(S4Vectors) df <- DataFrame( x=rnorm(10000), y=rnorm(10000), gene=paste0("GENE_", sample(100, 10000, replace=TRUE)), cell=paste0("CELL_", sample(20, 10000, replace=TRUE)) ) df
We then use the splitAsBumpyMatrix()
utility to easily create our BumpyDataFrameMatrix
based on the variables on the x- and y-axes.
Here, each row is a gene, each column is a cell, and each entry holds all coordinates for that gene/cell combination.
library(BumpyMatrix) mat <- splitAsBumpyMatrix(df[,c("x", "y")], row=df$gene, column=df$cell) mat mat[1,1][[1]]
We can also set sparse=TRUE
to use a more efficient sparse representation, which avoids explicit storage of empty DataFrame
s.
This may be necessary for larger datasets as there is a limit of r .Machine$integer.max
(non-empty) entries in each BumpyMatrix
.
chosen <- df[1:100,] smat <- splitAsBumpyMatrix(chosen[,c("x", "y")], row=chosen$gene, column=chosen$cell, sparse=TRUE) smat
The BumpyMatrix
implements many of the standard matrix operations, e.g., nrow()
, dimnames()
, the combining methods and transposition.
dim(mat) dimnames(mat) rbind(mat, mat) cbind(mat, mat) t(mat)
Subsetting will yield a new BumpyMatrix
object corresponding to the specified submatrix.
If the returned submatrix has a dimension of length 1 and drop=TRUE
, the underlying CompressedList
of values (in this case, the list of DataFrame
s) is returned.
mat[c("GENE_2", "GENE_20"),] mat[,1:5] mat["GENE_10",]
For BumpyDataFrameMatrix
objects, we have an additional third index that allows us to easily extract an individual column of each DataFrame
into a new BumpyMatrix
.
In the example below, we extract the x-coordinate into a new BumpyNumericMatrix
:
out.x <- mat[,,"x"] out.x out.x[,1]
Common arithmetic and logical operations are already implemented for BumpyNumericMatrix
subclasses.
Almost all of these operations will act on each entry of the input object (or corresponding entries, for multiple inputs)
and produce a new BumpyMatrix
of the appropriate type.
pos <- out.x > 0 pos[,1] shift <- 10 * out.x + 1 shift[,1] out.y <- mat[,,"y"] greater <- out.x < out.y greater[,1] diff <- out.y - out.x diff[,1]
When subsetting a BumpyMatrix
, we can use another BumpyMatrix
containing indexing information for each entry.
Consider the following code chunk:
i <- mat[,,'x'] > 0 & mat[,,'y'] > 0 i i[,1] sub <- mat[i] sub sub[,1]
Here, i
is a BumpyLogicalMatrix
where each entry is a logical vector.
When we do x[i]
, we effectively loop over the corresponding entries of x
and i
, using the latter to subset the DataFrame
in the former.
This produces a new BumpyDataFrameMatrix
containing, in this case, only the observations with positive x- and y-coordinates.
For BumpyDataFrameMatrix
objects, subsetting to a single field in the third dimension will automatically drop to the type of the underlying column of the DataFrame
.
This can be stopped with drop=FALSE
to preserve the BumpyDataFrameMatrix
output:
mat[,,'x'] mat[,,'x',drop=FALSE]
In situations where we want to drop the third dimension but not the first two dimensions (or vice versa), we use the .dropk
argument.
Setting .dropk=FALSE
will ensure that the third dimension is not dropped, as shown below:
mat[1,1,'x'] mat[1,1,'x',.dropk=FALSE] mat[1,1,'x',drop=FALSE] mat[1,1,'x',.dropk=TRUE,drop=FALSE]
Subset replacement is also supported, which is most useful for operations to modify specific fields:
copy <- mat copy[,,'x'] <- copy[,,'x'] * 20 copy[,1]
Some additional statistical operations are also implemented that will usually produce an ordinary matrix.
Here, each entry corresponds to the statistic computed from the corresponding entry of the BumpyMatrix
.
mean(out.x)[1:5,1:5] # matrix var(out.x)[1:5,1:5] # matrix
The exception is with operations that naturally produce a vector, in which case a matching 3-dimensional array is returned:
quantile(out.x)[1:5,1:5,] range(out.x)[1:5,1:5,]
Other operations may return another BumpyMatrix
if the output length is variable:
pmax(out.x, out.y)
BumpyCharacterMatrix
objects also have their own methods for grep()
, tolower()
, etc. to manipulate the strings in a convenient manner.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.