celda_C: Cell clustering with Celda

celda_C R Documentation

Cell clustering with Celda

Description

Clusters the columns of a count matrix containing single-cell data into K subpopulations. If x is a SingleCellExperiment object, the useAssay assay in the altExpName altExp slot is used when that altExp exists. Otherwise, the useAssay assay of x itself is used.

Usage

celda_C(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  sampleLabel = NULL,
  K,
  alpha = 1,
  beta = 1,
  algorithm = c("EM", "Gibbs"),
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  zInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  zInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

## S4 method for signature 'SingleCellExperiment'
celda_C(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  sampleLabel = NULL,
  K,
  alpha = 1,
  beta = 1,
  algorithm = c("EM", "Gibbs"),
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  zInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  zInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

## S4 method for signature 'ANY'
celda_C(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  sampleLabel = NULL,
  K,
  alpha = 1,
  beta = 1,
  algorithm = c("EM", "Gibbs"),
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  zInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  zInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

Arguments

x

A SingleCellExperiment with the matrix located in the assay slot under useAssay. Rows represent features and columns represent cells. Alternatively, any matrix-like object that can be coerced to a sparse matrix of class "dgCMatrix" can be directly used as input. The matrix will automatically be converted to a SingleCellExperiment object.

useAssay

A string specifying the name of the assay slot to use. Default "counts".

altExpName

The name for the altExp slot to use. Default "featureSubset".

sampleLabel

Vector or factor. Denotes the sample label for each cell (column) in the count matrix.

K

Integer. Number of cell populations.

alpha

Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.

beta

Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature in each cell population. Default 1.
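The pseudocount role of alpha can be illustrated with a minimal base-R sketch. This is not celda's internal code: the counts and variable names below are hypothetical, and the snippet only shows how adding alpha to the number of cells per population in one sample smooths the estimated proportions.

```r
# Hypothetical number of cells assigned to each of K = 4 populations in one sample.
cellsPerPopulation <- c(40, 10, 0, 0)
alpha <- 1  # pseudocount added to every population

# Unsmoothed proportions: empty populations get a proportion of exactly 0.
rawTheta <- cellsPerPopulation / sum(cellsPerPopulation)

# Smoothed proportions: every population receives alpha extra "cells",
# so no population ends up with a proportion of 0.
smoothTheta <- (cellsPerPopulation + alpha) / sum(cellsPerPopulation + alpha)

print(round(rawTheta, 3))
print(round(smoothTheta, 3))
```

beta plays the analogous role for feature counts within each cell population. Larger values pull the proportions toward uniform, smaller values let the observed counts dominate.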

algorithm

String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. If 'EM' is selected, then 'stopIter' will be automatically set to 1. Default 'EM'.

stopIter

Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.

maxIter

Integer. Maximum number of iterations of Gibbs sampling or EM to perform. Default 200.

splitOnIter

Integer. On every 'splitOnIter' iteration, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. To disable splitting, set to -1. Default 10.

splitOnLast

Logical. After 'stopIter' iterations have been performed without improvement, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. If a split occurs, then 'stopIter' will be reset. Default TRUE.

seed

Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
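As a small illustration of what passing a seed to with_seed implies, the sketch below uses withr::with_seed directly on a toy random draw. It assumes the withr package is installed; the variable names are hypothetical and no celda code is involved.

```r
library(withr)

# The same seed gives the same random draws, which is what makes
# repeated runs with an identical 'seed' reproducible.
drawA <- with_seed(12345, sample(3, size = 10, replace = TRUE))
drawB <- with_seed(12345, sample(3, size = 10, replace = TRUE))
print(identical(drawA, drawB))

# with_seed() restores the global RNG state afterwards, so the
# surrounding session's random stream is left undisturbed.
set.seed(1)
before <- runif(1)
set.seed(1)
invisible(with_seed(12345, runif(1)))
after <- runif(1)
print(identical(before, after))
```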

nchains

Integer. Number of random cluster initializations. Default 3.

zInitialize

Character. One of 'random', 'split', or 'predefined'. With 'random', cells are randomly assigned to a population. With 'split', cells will be split into sqrt(K) populations and then each population will be subsequently split into another sqrt(K) populations. With 'predefined', values in 'zInit' will be used to initialize 'z'. Default 'split'.

countChecksum

Character. An MD5 checksum for the 'counts' matrix. Default NULL.

zInit

Integer vector. Sets initial starting values of z. 'zInit' is only used when 'zInitialize = "predefined"'. Default NULL.
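A minimal sketch of supplying starting labels, assuming the celda package and its bundled celdaCSim data are available. The vector zStart is a hypothetical initialization drawn at random purely for illustration; in practice it would typically come from a previous clustering.

```r
library(celda)
data(celdaCSim)

# Hypothetical starting labels: one integer in 1..K for every cell (column).
set.seed(1)
zStart <- sample(seq_len(celdaCSim$K),
    size = ncol(celdaCSim$counts),
    replace = TRUE)

# 'zInit' only takes effect together with zInitialize = "predefined".
sce <- celda_C(celdaCSim$counts,
    K = celdaCSim$K,
    sampleLabel = celdaCSim$sampleLabel,
    zInitialize = "predefined",
    zInit = zStart,
    nchains = 1)
```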

logfile

Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL.

verbose

Logical. Whether to print log messages. Default TRUE.

Value

A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" slot. Columns celda_sample_label and celda_cell_cluster in colData contain sample labels and celda cell population clusters.
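The returned cluster labels can be inspected as in the sketch below. It assumes the installed celda version exports the celdaClusters accessor, and it reuses the bundled celdaCSim data; the variable z is a hypothetical name.

```r
library(celda)
data(celdaCSim)

sce <- celda_C(celdaCSim$counts,
    K = celdaCSim$K,
    sampleLabel = celdaCSim$sampleLabel,
    nchains = 1)

# One cluster label per cell (column) of the input matrix.
z <- celdaClusters(sce)

# Number of cells assigned to each cell population.
print(table(z))
```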

See Also

celda_G for feature clustering and celda_CG for simultaneous clustering of features and cells. celdaGridSearch can be used to run multiple values of K and multiple chains in parallel.

Examples

data(celdaCSim)
sce <- celda_C(celdaCSim$counts,
    K = celdaCSim$K,
    sampleLabel = celdaCSim$sampleLabel,
    nchains = 1)

campbio/celda documentation built on April 5, 2024, 11:47 a.m.