Scry Methods For Larger Datasets
In scry: Small-Count Analysis Methods for High-Dimensional Data

suppressPackageStartupMessages(library(TENxPBMCData))
require(scry)

We illustrate the application of scry methods to disk-based data from the TENxPBMCData package. Each dataset in this package is stored in an HDF5 file that is accessed through a DelayedArray interface. This avoids the need to load the entire dataset into memory for analysis.

Feature Selection with Deviance

sce<-TENxPBMCData(dataset="pbmc3k")
h5counts<-counts(sce)
seed(h5counts) #print information about object
h5counts<-h5counts[rowSums(h5counts)>0,]
system.time(h5devs<-devianceFeatureSelection(h5counts)) # 26 sec

We now compare the computation speed when the same data is converted to an ordinary array in-memory. Note this would not be possible with larger HDF5Array objects.

denseCounts<-as.matrix(h5counts)
system.time(denseDevs<-devianceFeatureSelection(denseCounts)) # 5 sec
max(abs(denseDevs-h5devs)) #should be close to zero

Finally we compare the speed when the counts data are stored in a sparse in-memory Matrix format

mean(denseCounts>0) #shows that the data are mostly zeros so sparsity useful
sparseCounts<-Matrix::Matrix(denseCounts,sparse=TRUE)
system.time(sparseDevs<-devianceFeatureSelection(sparseCounts)) #1.6 sec
max(abs(sparseDevs-h5devs)) #should be close to zero

Using disk-based data saves memory but slows computation time. When the data contain mostly zeros, and are not too large, the sparse in-memory Matrix object achieves fastest computation times. The resulting deviance statistics are the same for all of the different data formats.

Null residuals

One can run nullResiduals on HDF5Matrix, DelayedArray matrices, and sparse matrices from the Matrix package with the same syntax used for the base matrix case.

We illustrate this with the same dataset from the TENxPBMCData package.

sce <- nullResiduals(sce, assay="counts", type="deviance")
str(sce)

Any scripts or data that you put into this service are public.

scry documentation built on Nov. 8, 2020, 5:16 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

scry
Small-Count Analysis Methods for High-Dimensional Data

Scry Methods For Larger Datasets
In scry: Small-Count Analysis Methods for High-Dimensional Data

Feature Selection with Deviance

Null residuals

Try the scry package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

scry Small-Count Analysis Methods for High-Dimensional Data

Scry Methods For Larger Datasets In scry: Small-Count Analysis Methods for High-Dimensional Data

Feature Selection with Deviance

Null residuals

Try the scry package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

scry
Small-Count Analysis Methods for High-Dimensional Data

Scry Methods For Larger Datasets
In scry: Small-Count Analysis Methods for High-Dimensional Data