zinbsurf: Perform dimensionality reduction using a ZINB regression...

zinbsurf    R Documentation

Perform dimensionality reduction using a ZINB regression model for large datasets.

Description

Given an object containing the data, zinbsurf performs dimensionality reduction by fitting a ZINB regression model with gene- and cell-level covariates to a random subset of the data. It then projects the remaining data onto the resulting lower-dimensional space.

Usage

zinbsurf(Y, ...)

## S4 method for signature 'SummarizedExperiment'
zinbsurf(
  Y,
  X,
  V,
  K,
  which_assay,
  which_genes,
  zeroinflation = TRUE,
  prop_fit = 0.1,
  BPPARAM = BiocParallel::bpparam(),
  verbose = FALSE,
  ...
)

Arguments

Y

The data (genes in rows, samples in columns). Currently implemented only for SummarizedExperiment.

...

Additional parameters to describe the model, see zinbModel.

X

The design matrix containing sample-level covariates, one sample per row. If missing, X will contain only an intercept. If Y is a SummarizedExperiment object, X can be a formula using the variables in the colData slot of Y.

V

The design matrix containing gene-level covariates, one gene per row. If missing, V will contain only an intercept. If Y is a SummarizedExperiment object, V can be a formula using the variables in the rowData slot of Y.

K

integer. Number of latent factors. Specify K = 0 if only computing observational weights.

which_assay

numeric or character. Which assay of Y to use. If missing, the "counts" assay is used when 'assayNames(Y)' contains it; otherwise, the first assay is used.

which_genes

character. Which genes to use to estimate W (see details). Ignored if fitted_model is provided.

zeroinflation

Whether or not a ZINB model should be fitted. If FALSE, a negative binomial model is fitted instead.

prop_fit

numeric between 0 and 1. The proportion of cells to use for the zinbwave fit.

BPPARAM

object of class bpparamClass that specifies the back-end to be used for computations. See bpparam for details.

verbose

Whether to print helpful messages while the model is fitted.
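As a rough illustration of the X argument described above, the sketch below uses base R's model.matrix to show the kind of sample-level design matrix that a formula such as ~bio describes, next to the intercept-only matrix that corresponds to a missing X. The col_data data frame is a made-up stand-in for colData(Y), and it is an assumption (not confirmed here) that zinbsurf expands the formula in an equivalent way internally.

```r
# Hypothetical sample annotations standing in for colData(Y):
# six samples, two biological conditions with three samples each.
col_data <- data.frame(bio = gl(2, 3))

# Design matrix described by the formula ~bio:
# an intercept column plus one indicator column for the second level.
X_bio <- model.matrix(~ bio, data = col_data)

# Intercept-only design, corresponding to a missing X.
X_int <- model.matrix(~ 1, data = col_data)

# Both matrices have one row per sample.
dim(X_bio)
dim(X_int)
```

The same reasoning applies to V, with one row per gene and variables taken from rowData(Y).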

Details

This function implements an approximate strategy, in which the full zinbwave model is fit only on a random subset of the samples (controlled by the prop_fit parameter). The remaining samples are subsequently projected onto the low-rank space. This strategy is much faster and uses less memory than the full zinbwave method. It is recommended for extremely large datasets.
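The subsampling idea can be sketched in base R as follows. This only illustrates how a proportion such as prop_fit splits the samples into a fitting set and a projection set. The names n_cells, fit_idx and proj_idx, as well as the rounding rule, are assumptions made for illustration and are not taken from the package's internal code.

```r
set.seed(42)

n_cells  <- 1000  # hypothetical number of samples (columns of Y)
prop_fit <- 0.1   # proportion of samples used for the full model fit

# Number of samples in the fitting subset (rounding rule is an assumption).
n_fit <- round(prop_fit * n_cells)

# Random subset on which the full model would be fit.
fit_idx <- sort(sample(seq_len(n_cells), size = n_fit))

# Remaining samples, which would be projected onto the fitted low-rank space.
proj_idx <- setdiff(seq_len(n_cells), fit_idx)

length(fit_idx)
length(proj_idx)
```

A smaller prop_fit shrinks the expensive fitting step further, at the cost of estimating the model from fewer samples.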

By default, zinbsurf uses all genes to estimate W. However, we recommend using the top 1,000 most variable genes for this step. In general, a user can specify any custom set of genes to be used to estimate W, by providing either a vector of gene names or a single character string corresponding to a column of the rowData.
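One possible way to build such a gene set is sketched below in base R. It ranks genes by the variance of their log-transformed counts, which is one common choice and not necessarily the ranking the authors have in mind. The toy matrix, the gene names and n_top are hypothetical; on a real dataset n_top would be 1000.

```r
set.seed(1)

# Toy count matrix: 50 genes x 20 samples (hypothetical data).
counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50,
                 dimnames = list(paste0("gene", 1:50),
                                 paste0("sample", 1:20)))

n_top <- 10  # use 1000 on a real dataset

# Per-gene variance of log1p-transformed counts.
gene_var <- apply(log1p(counts), 1, var)

# Names of the n_top most variable genes.
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]

# The vector can then be supplied as which_genes, for example:
# zinbsurf(se, K = 2, which_genes = top_genes, prop_fit = 0.1)
# where se is an object whose row names include these gene names.
```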

Value

An object of class SingleCellExperiment; the dimensionality reduced matrix is stored in the reducedDims slot.
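A minimal sketch of retrieving the reduced matrix from the returned object is shown below, assuming the zinbwave and SingleCellExperiment packages are installed. The name under which the matrix is stored is not stated on this page, so the sketch lists the available names and takes the first entry rather than assuming one.

```r
library(zinbwave)
library(SingleCellExperiment)

set.seed(1)

# Small simulated dataset: 10 genes x 6 samples, two conditions.
se <- SingleCellExperiment(
  assays = list(counts = matrix(rpois(60, lambda = 5), nrow = 10, ncol = 6)),
  colData = data.frame(bio = gl(2, 3)))
colnames(se) <- paste0("sample", 1:6)

m <- zinbsurf(se, X = "~bio", K = 1, prop_fit = .5, which_assay = 1,
              BPPARAM = BiocParallel::SerialParam())

# List the stored reduced-dimension results, then extract the first one.
reducedDimNames(m)
W <- reducedDims(m)[[1]]

# W is expected to have one row per sample and one column per latent factor.
dim(W)
```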

Methods (by class)

  • zinbsurf(SummarizedExperiment): Y is a SummarizedExperiment.

Examples

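library(zinbwave)
library(SingleCellExperiment)

# Simulate a small dataset: 10 genes x 6 samples, with a two-level covariate
se <- SingleCellExperiment(assays = list(counts = matrix(rpois(60, lambda=5),
                                                         nrow=10, ncol=6)),
                           colData = data.frame(bio = gl(2, 3)))
colnames(se) <- paste0("sample", 1:6)
# Fit the model on half of the samples, then project the remaining ones
m <- zinbsurf(se, X="~bio", K = 1, prop_fit = .5, which_assay = 1,
              BPPARAM=BiocParallel::SerialParam())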

drisso/zinbwave documentation built on March 18, 2024, 5:13 p.m.