newWave: Perform dimensionality reduction using a nb regression model...
In NewWave: Negative binomial model for scRNA-seq


Given an object containing the data, performs dimensionality reduction using a negative binomial (NB) regression model with gene- and cell-level covariates.

newWave(Y, ...)

## S4 method for signature 'SummarizedExperiment'
newWave(
  Y,
  X,
  V,
  K = 2,
  which_assay,
  commondispersion = TRUE,
  verbose = FALSE,
  maxiter_optimize = 100,
  stop_epsilon = 1e-04,
  children = 1,
  random_init = FALSE,
  random_start = FALSE,
  n_gene_disp = NULL,
  n_cell_par = NULL,
  n_gene_par = NULL,
  ...
)

`Y`	The SummarizedExperiment with the data
`...`	Additional parameters to describe the model, see `newmodel`.
`X`	The design matrix containing sample-level covariates, one sample per row. If missing, X will contain only an intercept. If Y is a SummarizedExperiment object, X can be a formula using the variables in the colData slot of Y.
`V`	The design matrix containing gene-level covariates, one gene per row. If missing, V will contain only an intercept. If Y is a SummarizedExperiment object, V can be a formula using the variables in the rowData slot of Y.
`K`	integer. Number of latent factors (default 2).
`which_assay`	numeric or character. Which assay of Y to use. If missing, the "counts" assay is used when 'assayNames(Y)' contains it; otherwise, the first assay is used.
`commondispersion`	Whether a single dispersion is estimated for all features (default TRUE).
`verbose`	Print helpful messages (default FALSE).
`maxiter_optimize`	maximum number of iterations for the optimization step (default 100).
`stop_epsilon`	stopping criterion for the optimization step: stop when the relative gain in likelihood is below epsilon (default 0.0001).
`children`	number of cores of the cluster used (default 1).
`random_init`	if TRUE, no initialization is performed (default FALSE).
`random_start`	if TRUE, the parameters are set up by random sampling (default FALSE).
`n_gene_disp`	number of genes used in the mini-batch approach to dispersion estimation (default NULL: all genes are used).
`n_cell_par`	number of cells used in the mini-batch approach to estimating cell-related parameters (default NULL: all cells are used).
`n_gene_par`	number of genes used in the mini-batch approach to estimating gene-related parameters (default NULL: all genes are used).
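
As a rough illustration of how the mini-batch and parallelism arguments above fit together, the following sketch fits the model on simulated counts. The dataset, the `batch` covariate, and the chosen batch sizes are assumptions made for this example only; sensible values depend on the size of the real data.

```r
library(SummarizedExperiment)
library(NewWave)

set.seed(1)
# Simulated counts (hypothetical data): 100 genes x 40 cells, two batches
counts <- matrix(rpois(100 * 40, lambda = 5), nrow = 100, ncol = 40)
se_mb <- SummarizedExperiment(assays = list(counts = counts),
                              colData = data.frame(batch = gl(2, 20)))

# Common dispersion (the default), two worker processes, and mini-batches
# of 50 genes / 20 cells for each parameter-estimation step
fit_mb <- newWave(se_mb, X = "~batch", K = 2,
                  children = 2,
                  n_gene_disp = 50,
                  n_gene_par = 50,
                  n_cell_par = 20)
```

Leaving the three `n_*` arguments at NULL uses all genes and cells in every step, which matches the defaults listed above.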

For visualization (heatmaps, ...), please use the normalized values. These correspond to the deviance residuals when W is not included in the model but the gene- and cell-level covariates are. As a result, when W is not included in the model, the deviance residuals should capture the biology. Note that we recommend the normalized values only for visualization, not for downstream analysis (such as clustering or differential expression).

If one has already fitted a model using newmodel, the object containing such model can be used as input of newWave to save the resulting W into a SummarizedExperiment and optionally compute residuals and normalized values, without the need for re-fitting the model.

By default, newWave uses all genes to estimate W. However, we recommend using the top 1,000 most variable genes for this step. In general, a user can specify any custom set of genes to be used to estimate W, either as a vector of gene names or as a single character string corresponding to a column of the rowData.
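
One simple way to restrict the fit to highly variable genes is to subset the object before calling newWave. The sketch below is only an illustration under stated assumptions: the simulated data, the variance-of-log-counts ranking, and the toy cutoff `n_top` are all choices made for this example, not part of the package.

```r
library(SummarizedExperiment)
library(NewWave)

set.seed(1)
# Simulated counts (hypothetical data): 200 genes x 12 cells
counts <- matrix(rnbinom(200 * 12, mu = 5, size = 2), nrow = 200, ncol = 12,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:12)))
se_all <- SummarizedExperiment(assays = list(counts = counts))

# Rank genes by the variance of log-transformed counts (one common heuristic)
gene_var <- apply(log1p(assay(se_all, "counts")), 1, var)

# Toy cutoff; with a real dataset this would be 1000
n_top <- 100
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]

# Fit the model on the most variable genes only
se_hvg <- se_all[top_genes, ]
fit_hvg <- newWave(se_hvg, K = 2)
```

Because the subsetting happens before the call, the returned object contains only the selected genes.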

Note that if which_genes is specified and at least one of observationalWeights, imputedValues, residuals, and normalizedValues is TRUE, the model needs to be fit twice.

An object of class SingleCellExperiment; the dimensionality-reduced matrix is stored in the reducedDims slot and, optionally, normalized values and residuals are added to the list of assays.
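
As a brief sketch of how the returned object might be inspected, the low-dimensional matrix can be read back with the standard SingleCellExperiment accessors. The toy data and object names are assumptions for this example; the reduction is retrieved by position rather than by name, since the stored name is not stated on this page.

```r
library(SummarizedExperiment)
library(SingleCellExperiment)
library(NewWave)

set.seed(1)
# Toy data (hypothetical): 10 genes x 6 cells with a two-level covariate
se_toy <- SummarizedExperiment(
  assays = list(counts = matrix(rpois(60, lambda = 5), nrow = 10, ncol = 6)),
  colData = data.frame(bio = gl(2, 3)))

fit_toy <- newWave(se_toy, X = "~bio", K = 2)

# List the stored reductions, then extract the first one:
# a matrix with one row per cell and one column per latent factor
reducedDimNames(fit_toy)
W <- reducedDim(fit_toy)
dim(W)
```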

SummarizedExperiment: Y is a SummarizedExperiment.

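library(SummarizedExperiment)
library(NewWave)

# Toy dataset: 10 genes x 6 cells, with a two-level cell covariate
se <- SummarizedExperiment(matrix(rpois(60, lambda=5), nrow=10, ncol=6),
                           colData = data.frame(bio = gl(2, 3)))
# Fit the model with bio as a cell-level covariate (K = 2 by default)
m <- newWave(se, X="~bio")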