SparseArray-matrixStats: SparseArray col/row summarization

SparseArray-matrixStatsR Documentation

SparseArray col/row summarization

Description

The SparseArray package provides memory-efficient col/row summarization methods (a.k.a. matrixStats methods) for SparseArray objects, like colSums(), rowSums(), colMedians(), rowMedians(), colVars(), rowVars(), etc...

Note that these are S4 generic functions defined in the MatrixGenerics package, with methods for ordinary matrices defined in the matrixStats package. This man page documents the methods defined for SparseArray objects.

Usage

## N.B.: Showing ONLY the col*() methods (usage of row*() methods is
## the same):

## S4 method for signature 'SparseArray'
colAnyNAs(x, rows=NULL, cols=NULL, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colAnys(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colAlls(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMins(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMaxs(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colRanges(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colSums(x, na.rm=FALSE, dims=1)

## S4 method for signature 'SparseArray'
colProds(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMeans(x, na.rm=FALSE, dims=1)

## S4 method for signature 'SparseArray'
colSums2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMeans2(x, rows=NULL, cols=NULL, na.rm=FALSE, dims=1, ..., useNames=NA)

## S4 method for signature 'SparseArray'
colVars(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
           ..., useNames=NA)

## S4 method for signature 'SparseArray'
colSds(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dims=1,
          ..., useNames=NA)

## S4 method for signature 'SparseArray'
colMedians(x, rows=NULL, cols=NULL, na.rm=FALSE, ..., useNames=NA)

Arguments

x

A SparseMatrix or SparseArray object.

Note that the colMedians() and rowMedians() methods only support 2D objects (i.e. SparseMatrix objects) at the moment.

rows, cols, ...

Not supported.

na.rm, useNames, center

See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colVars) for a description of these arguments.

Note that, unlike the methods for ordinary matrices defined in the matrixStats package, the center argument of the colVars(), rowVars(), colSds(), and rowSds() methods for SparseArray objects can only be a single value (or a NULL). In particular, if x has more than one column, then center cannot be a vector with one value per column in x.

dims

See ?base::colSums for a description of this argument. Note that all the methods above support it, except colMedians() and rowMedians().

Details

All these methods operate natively on the SVT_SparseArray internal representation, for maximum efficiency.

Note that more col/row summarization methods might be added in the future.

Value

See man pages of the corresponding generics in the MatrixGenerics package (e.g. ?MatrixGenerics::colRanges) for the value returned by these methods.

Note

Most col*() methods for SparseArray objects are multithreaded. See set_SparseArray_nthread for how to control the number of threads.

See Also

  • SparseArray objects.

  • The man pages of the various generic functions defined in the MatrixGenerics package e.g. MatrixGenerics::colVars etc...

Examples

## ---------------------------------------------------------------------
## 2D CASE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=6, ncol=4, dimnames=list(letters[1:6], LETTERS[1:4]))
m0[c(1:2, 8, 10, 15:17, 24)] <- (1:8)*10L
m0["e", "B"] <- NA
svt0 <- SparseArray(m0)
svt0

colSums(svt0)
colSums(svt0, na.rm=TRUE)

rowSums(svt0)
rowSums(svt0, na.rm=TRUE)

colMeans(svt0)
colMeans(svt0, na.rm=TRUE)

colRanges(svt0)
colRanges(svt0, useNames=FALSE)
colRanges(svt0, na.rm=TRUE)
colRanges(svt0, na.rm=TRUE, useNames=FALSE)

colVars(svt0)
colVars(svt0, useNames=FALSE)

## Sanity checks:
stopifnot(
  identical(colSums(svt0), colSums(m0)),
  identical(colSums(svt0, na.rm=TRUE), colSums(m0, na.rm=TRUE)),
  identical(rowSums(svt0), rowSums(m0)),
  identical(rowSums(svt0, na.rm=TRUE), rowSums(m0, na.rm=TRUE)),
  identical(colMeans(svt0), colMeans(m0)),
  identical(colMeans(svt0, na.rm=TRUE), colMeans(m0, na.rm=TRUE)),
  identical(colRanges(svt0), colRanges(m0, useNames=TRUE)),
  identical(colRanges(svt0, useNames=FALSE), colRanges(m0, useNames=FALSE)),
  identical(colRanges(svt0, na.rm=TRUE),
            colRanges(m0, na.rm=TRUE, useNames=TRUE)),
  identical(colVars(svt0), colVars(m0, useNames=TRUE)),
  identical(colVars(svt0, na.rm=TRUE),
            colVars(m0, na.rm=TRUE, useNames=TRUE))
)

## ---------------------------------------------------------------------
## 3D CASE (AND ARBITRARY NUMBER OF DIMENSIONS)
## ---------------------------------------------------------------------
set.seed(2009)
svt <- 6L * (poissonSparseArray(5:3, density=0.35) -
             poissonSparseArray(5:3, density=0.35))
dimnames(svt) <- list(NULL, letters[1:4], LETTERS[1:3])

cs1 <- colSums(svt)
cs1  # cs1[j , k] is equal to sum(svt[ , j, k])

cs2 <- colSums(svt, dims=2)
cs2  # cv2[k] is equal to sum(svt[ , , k])

cv1 <- colVars(svt)
cv1  # cv1[j , k] is equal to var(svt[ , j, k])

cv2 <- colVars(svt, dims=2) 
cv2  # cv2[k] is equal to var(svt[ , , k])

## Sanity checks:
k_idx <- setNames(seq_len(dim(svt)[3]), dimnames(svt)[[3]])
j_idx <- setNames(seq_len(dim(svt)[2]), dimnames(svt)[[2]])
cv1b <- sapply(k_idx, function(k)
                      sapply(j_idx, function(j) var(svt[ , j, k, drop=FALSE])))
cv2b <- sapply(k_idx, function(k) var(svt[ , , k]))
stopifnot(
  identical(colSums(svt), colSums(as.array(svt))),
  identical(colSums(svt, dims=2), colSums(as.array(svt), dims=2)),
  identical(cv1, cv1b),
  identical(cv2, cv2b)
)

Bioconductor/SparseArray documentation built on Oct. 30, 2024, 12:14 p.m.