The SCArray package provides large-scale single-cell RNA-seq data manipulation using Genomic Data Structure (GDS) files. It combines dense and sparse matrices stored in GDS files with the Bioconductor infrastructure (SingleCellExperiment and DelayedArray) to provide out-of-memory data storage and manipulation in the R programming language. As shown in the figure, SCArray provides a SingleCellExperiment object for downstream data analyses. GDS is an alternative to HDF5. Unlike HDF5, GDS supports the direct storage of a sparse matrix without converting it to multiple vectors.
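For context, an in-memory sparse matrix in R is usually a `dgCMatrix` from the Matrix package. It uses a compressed sparse column layout made of several vectors: row indices, column pointers and the non-zero values. A format without native sparse support has to store these vectors separately. The sketch below only inspects a small `dgCMatrix` to show those vectors. It does not touch GDS or HDF5, and the toy matrix is an arbitrary example.

```r
library(Matrix)

# a small 4 x 2 count matrix with mostly zeros (filled column by column)
m <- matrix(c(0, 3, 0, 0,
              1, 0, 0, 2), nrow=4, ncol=2)
sp <- as(m, "CsparseMatrix")  # compressed sparse column format, class dgCMatrix

sp@i  # 0-based row indices of the non-zero entries
sp@p  # column pointers: column j holds entries p[j]+1 .. p[j+1] of i and x
sp@x  # the non-zero values themselves
```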
SCArray requires R (>= v3.5.0) and gdsfmt (>= v1.24.0), and is distributed through the Bioconductor repository. To install this package, start R and enter:

```r
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("SCArray")
```
## Format conversion

### Conversion from SingleCellExperiment

The SCArray package can convert a single-cell experiment object (SingleCellExperiment) to a GDS file using the function `scConvGDS()`. For example,

```r
suppressPackageStartupMessages(library(SCArray))
suppressPackageStartupMessages(library(SingleCellExperiment))

# load a SingleCellExperiment object
fn <- system.file("extdata", "LaMannoBrainSub.rds", package="SCArray")
sce <- readRDS(fn)

# convert to a GDS file
scConvGDS(sce, "test.gds")

# list data structure in the GDS file
(f <- scOpen("test.gds"))
scClose(f)
```
The input of `scConvGDS()` can also be a dense or sparse matrix of count data:

```r
library(Matrix)

cnt <- matrix(0, nrow=4, ncol=8)
set.seed(100); cnt[sample.int(length(cnt), 8)] <- rpois(8, 4)
(cnt <- as(cnt, "dgCMatrix"))

# convert to a GDS file
scConvGDS(cnt, "test.gds")
```
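Sparse input pays off when most counts are zero, as is typical for single-cell data. As a rough illustration using plain Matrix objects (nothing here is specific to SCArray), the sketch compares the in-memory size of a 1000 x 1000 matrix with about 1% non-zero entries in dense and sparse form. The dimensions and density are arbitrary choices.

```r
library(Matrix)

set.seed(1)
n <- 1000
dense <- matrix(0, nrow=n, ncol=n)
# place counts at 1% of the positions (sampled without replacement)
idx <- sample.int(length(dense), 0.01 * length(dense))
dense[idx] <- rpois(length(idx), 4) + 1  # +1 keeps every sampled entry non-zero
sparse <- as(dense, "CsparseMatrix")

# the sparse form stores only the non-zero entries and their positions
print(object.size(dense), units="Kb")
print(object.size(sparse), units="Kb")
```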
When a single-cell GDS file is available, users can use `scExperiment()` to load a SingleCellExperiment object from it. The assays in the resulting object are DelayedMatrix objects, so the whole matrix is not loaded into memory.

```r
# a GDS file in the SCArray package
(fn <- system.file("extdata", "LaMannoBrainData.gds", package="SCArray"))

# load a SingleCellExperiment object from the file
sce <- scExperiment(fn)
sce

# it is a DelayedMatrix (the whole matrix is not loaded)
assays(sce)$counts

# column data
colData(sce)

# row data
rowData(sce)
```
The SingleCellExperiment object provided by SCArray can be used directly in downstream data analyses.
First, we create a log count matrix `logcnt` from the count matrix. Note that `logcnt` is also a DelayedMatrix, so the whole matrix is not actually generated.

```r
cnt <- assays(sce)$counts
logcnt <- log2(cnt + 1)
assays(sce)$logcounts <- logcnt
logcnt
```
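Because log2(0 + 1) = 0, this transform maps every zero count to zero, so the log counts are exactly as sparse as the counts. The sketch below illustrates that point on a small in-memory `dgCMatrix`; it is an assumption-laden illustration, not how DelayedArray evaluates `log2(cnt + 1)`. Transforming only the stored non-zero values gives the same result as the dense computation. The names `toy_cnt` and `toy_log` are hypothetical and chosen so they do not overwrite `cnt` and `logcnt` above.

```r
library(Matrix)

# a 3 x 3 sparse count matrix with three non-zero entries
toy_cnt <- sparseMatrix(i=c(1, 3, 2), j=c(1, 1, 3), x=c(4, 1, 7), dims=c(3, 3))

# apply log2(x + 1) to the stored non-zero values only; zeros stay zero
toy_log <- toy_cnt
toy_log@x <- log2(toy_log@x + 1)

# compare with the dense computation
all.equal(as.matrix(toy_log), log2(as.matrix(toy_cnt) + 1))
```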
The DelayedMatrixStats package provides functions operating on rows and columns of DelayedMatrix objects. For example, we can calculate the mean of each column or row of the log count matrix.

```r
suppressPackageStartupMessages(library(DelayedMatrixStats))

col_mean <- DelayedMatrixStats::colMeans2(logcnt)
str(col_mean)

row_mean <- DelayedMatrixStats::rowMeans2(logcnt)
str(row_mean)
```
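Summaries like these can be computed without realizing the whole matrix because the data can be processed one block at a time. The base-R sketch below shows that general idea for row means with a hypothetical helper `block_row_means()`, which realizes a few columns at a time and accumulates row sums. It is not DelayedMatrixStats' implementation, and the block size and toy data are arbitrary.

```r
# hypothetical helper: row means computed one block of columns at a time
block_row_means <- function(x, block_size = 2L) {
  sums <- numeric(nrow(x))
  for (s in seq(1L, ncol(x), by = block_size)) {
    cols <- s:min(s + block_size - 1L, ncol(x))
    # realize only this block as an ordinary matrix and add its row sums
    sums <- sums + rowSums(as.matrix(x[, cols, drop = FALSE]))
  }
  sums / ncol(x)
}

set.seed(100)
toy_counts <- matrix(rpois(4 * 8, 4), nrow = 4, ncol = 8)
block_row_means(toy_counts, block_size = 3L)
rowMeans(toy_counts)  # same values, computed on the full matrix at once
```

A smaller block size lowers peak memory at the cost of more passes through the loop.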
The scater package can perform uniform manifold approximation and projection (UMAP) on the cells, using the data in a SingleCellExperiment object.

```r
suppressPackageStartupMessages(library(scater))

# run UMAP analysis
sce <- runUMAP(sce)
```
`plotReducedDim()` plots the cell-level reduced-dimension results (UMAP) stored in the SingleCellExperiment object:

```r
plotReducedDim(sce, dimred="UMAP")
```
```r
# print version information about R, the OS and attached or loaded packages
sessionInfo()
```
```r
# remove the temporary GDS file created above
unlink("test.gds", force=TRUE)
```