ccImpute: Impute Dropout Values in Single-Cell RNA Sequencing Data

ccImpute R Documentation

Impute Dropout Values in Single-Cell RNA Sequencing Data

Description

Performs imputation of dropout values in single-cell RNA sequencing (scRNA-seq) data using a consensus clustering-based algorithm (ccImpute). This implementation includes performance enhancements over the original ccImpute method described in the paper "ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data" (DOI: https://doi.org/10.1186/s12859-022-04814-8).

Defines the generic function 'ccImpute' and a specific method for 'SingleCellExperiment' objects.

Usage

ccImpute.SingleCellExperiment(
  object,
  dist,
  nCeil = 2000,
  svdMaxRatio = 0.08,
  maxSets = 8,
  k,
  consMin = 0.75,
  kmNStart,
  kmMax = 1000,
  fastSolver = TRUE,
  BPPARAM = bpparam(),
  verbose = TRUE
)

ccImpute(
  object,
  dist,
  nCeil = 2000,
  svdMaxRatio = 0.08,
  maxSets = 8,
  k,
  consMin = 0.75,
  kmNStart,
  kmMax = 1000,
  fastSolver = TRUE,
  BPPARAM = bpparam(),
  verbose = TRUE
)

## S4 method for signature 'SingleCellExperiment'
ccImpute(
  object,
  dist,
  nCeil = 2000,
  svdMaxRatio = 0.08,
  maxSets = 8,
  k,
  consMin = 0.75,
  kmNStart,
  kmMax = 1000,
  fastSolver = TRUE,
  BPPARAM = bpparam(),
  verbose = TRUE
)

Arguments

object

A SingleCellExperiment object containing the scRNA-seq data. The logcounts assay should contain a matrix of log-normalized expression values. Both dense and sparse (dgCMatrix) matrix storage formats are supported.

dist

(Optional) A distance matrix used for cell similarity calculations. If not provided, a weighted Spearman correlation matrix is calculated.
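As a rough illustration of the kind of cell-by-cell matrix involved, the sketch below computes a plain (unweighted) Spearman correlation between cells with base R. The toy matrix logExpr and the variable names are hypothetical, the weighting that ccImpute applies is not reproduced, and whether dist should hold correlations or a correlation-derived distance is an assumption to check against the package source.

```r
# Toy log-expression matrix: 50 genes (rows) x 6 cells (columns).
set.seed(1)
logExpr <- matrix(rexp(50 * 6), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:6)))

# cor() correlates columns, so the result is a 6 x 6 cell-by-cell matrix.
# NOTE: unweighted Spearman correlation, used here only as a stand-in.
cellCor <- cor(logExpr, method = "spearman")

# One common way to turn a correlation into a dissimilarity
# (an assumption, not necessarily what ccImpute expects for `dist`).
cellDist <- 1 - cellCor
```

A matrix built this way has one row and one column per cell, in the same order as the columns of the expression matrix.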

nCeil

(Optional) The maximum number of cells used to compute the proportion of singular vectors (default: 2000).

svdMaxRatio

(Optional) The maximum proportion of singular vectors used for generating subsets (default: 0.08).

maxSets

(Optional) The maximum number of sub-datasets used for consensus clustering (default: 8).

k

(Optional) The number of clusters (cell groups) in the data. If not provided, it is estimated using the Tracy-Widom bound.
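To give a feel for what a Tracy-Widom based estimate of k can look like, here is a minimal sketch modelled on the estimator used by the SC3 package: it counts the eigenvalues of the scaled cell-by-cell cross-product matrix that exceed a Tracy-Widom bound. The helper name estimateKTracyWidom, the constant 3.273, and the choice of scaling are assumptions carried over from that approach; this is not ccImpute's own code and may differ from it in detail.

```r
# Hypothetical helper; a sketch in the style of SC3, not ccImpute's implementation.
estimateKTracyWidom <- function(expr) {
  # expr: genes (rows) x cells (columns), e.g. log-normalized expression.
  n <- nrow(expr)
  p <- ncol(expr)
  x <- scale(expr)  # center and scale each cell (column)

  # Location and scale of the Tracy-Widom approximation.
  muTW <- (sqrt(n - 1) + sqrt(p))^2
  sigmaTW <- (sqrt(n - 1) + sqrt(p)) * (1 / sqrt(n - 1) + 1 / sqrt(p))^(1 / 3)

  # Assumed constant: roughly the 0.999 quantile of the Tracy-Widom distribution.
  bound <- muTW + 3.273 * sigmaTW

  # Eigenvalues of the p x p cross-product; count those above the bound.
  evals <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
  sum(evals > bound)
}

set.seed(42)
toy <- matrix(rnorm(200 * 30), nrow = 200)  # 200 genes x 30 cells of pure noise
kEst <- estimateKTracyWidom(toy)
```

Supplying k directly skips this kind of estimate, which is useful when the number of cell groups is already known.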

consMin

(Optional) The low-pass filter threshold for processing the consensus matrix (default: 0.75).
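The thresholding step can be pictured with a small base R sketch. It assumes that entries of the consensus matrix below consMin are set to zero, so that only cell pairs that co-cluster consistently keep a non-zero weight; the toy matrix and this exact behaviour are assumptions for illustration, not taken from the package source.

```r
# Toy consensus matrix: entry [i, j] is the fraction of clusterings in which
# cells i and j were assigned to the same cluster.
consensus <- matrix(c(1.0, 0.9, 0.2,
                      0.9, 1.0, 0.6,
                      0.2, 0.6, 1.0), nrow = 3, byrow = TRUE)
consMin <- 0.75

# Assumed behaviour: entries below the threshold are treated as noise and zeroed.
filtered <- consensus
filtered[filtered < consMin] <- 0
```

In this toy case only the pair of cells 1 and 2 (consensus 0.9) keeps a non-zero off-diagonal entry.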

kmNStart

(Optional) The nstart parameter passed to the kmeans function. By default, it is 1000 for up to 2000 cells and 50 for more than 2000 cells.

kmMax

(Optional) The iter.max parameter passed to the kmeans function (default: 1000).
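The two k-means settings map onto arguments of the base R kmeans function. The sketch below uses a hypothetical helper, defaultKmNStart, that mirrors the documented default rule for kmNStart, and runs kmeans on toy data; the data and the helper are illustrative assumptions and do not show how ccImpute calls kmeans internally.

```r
# Hypothetical helper mirroring the documented default for kmNStart.
defaultKmNStart <- function(nCells) if (nCells <= 2000) 1000 else 50

set.seed(7)
# Toy data: 60 cells (rows) x 2 features, drawn around three centres.
centres <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- centres[rep(1:3, each = 20), ] + matrix(rnorm(120, sd = 0.3), ncol = 2)

fit <- kmeans(x, centers = 3,
              nstart = defaultKmNStart(nrow(x)),  # plays the role of kmNStart
              iter.max = 1000)                    # plays the role of kmMax
```

More random starts make the result less sensitive to initialization but cost proportionally more time, which is presumably why the default drops for larger datasets.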

fastSolver

(Optional) Whether to use the mean of non-zero values for calculating dropout values or a linear equation solver (much slower and did show an empirical difference in imputation performance) (default: TRUE).

BPPARAM

(Optional) A BiocParallelParam object for parallel processing (default: bpparam()).

verbose

(Optional) Whether to print progress messages (default: TRUE).

Value

A SingleCellExperiment object with the imputed expression values stored in the "imputed" assay.

Examples

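library(BiocParallel)
library(splatter)
library(scater)

# Simulate 100 cells in 5 equally sized groups with dropout events.
sce <- splatSimulate(group.prob = rep(1, 5) / 5, sparsify = FALSE,
                     batchCells = 100, nGenes = 1000, method = "groups",
                     verbose = FALSE, dropout.type = "experiment")

# ccImpute reads log-normalized values from the logcounts assay.
sce <- logNormCounts(sce)

# Run the imputation on 2 cores; results are stored in the "imputed" assay.
cores <- 2
BPPARAM <- MulticoreParam(cores)
sce <- ccImpute(sce, BPPARAM = BPPARAM)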


khazum/ccImpute documentation built on July 26, 2024, 1:13 a.m.