bayNorm: A wrapper function of prior estimation and bayNorm function

Description

This is the main wrapper function for bayNorm. The input is a matrix of raw scRNA-seq data and a vector of capture efficiencies of cells. You can also specify the condition of cells for normalizing multiple groups of cells separately.

Usage

bayNorm(
  Data,
  BETA_vec = NULL,
  Conditions = NULL,
  UMI_sffl = NULL,
  Prior_type = NULL,
  mode_version = FALSE,
  mean_version = FALSE,
  S = 20,
  parallel = TRUE,
  NCores = 5,
  FIX_MU = TRUE,
  GR = FALSE,
  BB_SIZE = TRUE,
  verbose = TRUE,
  out.sparse = FALSE
)

Arguments

Data

A matrix of single-cell expression where rows are genes and columns are samples (cells). Data can be a plain matrix, a sparse matrix, or an object of class SummarizedExperiment, in which case the expression matrix is stored in the assays slot under the name "Counts".

BETA_vec

A vector of capture efficiencies (probabilities) of cells. If it is NULL, library size (total count) normalized to 0.06 will be used as the input BETA_vec. Values of BETA_vec less than or equal to 0 are replaced by the minimum of the values of BETA_vec that lie within (0,1), and values greater than or equal to 1 are replaced by the maximum of those values.
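The default and the out-of-range replacement described for BETA_vec can be sketched in base R. This is only an illustration, not bayNorm's internal code: the toy count matrix and the helper clamp_beta are hypothetical, and the sketch assumes that "normalized to 0.06" means the library sizes are rescaled so that their mean is 0.06.

```r
# Hypothetical toy count matrix: 4 genes (rows) x 5 cells (columns).
counts <- matrix(
  c(10, 0, 3, 7,
    20, 5, 0, 15,
    0, 0, 0, 0,
    8, 2, 1, 9,
    30, 10, 5, 25),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Assumption: library sizes are rescaled so that their mean equals 0.06.
lib_size <- colSums(counts)
beta_raw <- lib_size / mean(lib_size) * 0.06

# Hypothetical helper: replace values outside (0, 1) with the minimum or
# maximum of the values that lie strictly inside (0, 1).
clamp_beta <- function(beta) {
  inside <- beta > 0 & beta < 1
  if (!any(inside)) stop("no capture efficiency lies strictly within (0, 1)")
  beta[beta <= 0] <- min(beta[inside])
  beta[beta >= 1] <- max(beta[inside])
  beta
}

beta_vec <- clamp_beta(beta_raw)
print(beta_vec)
```

Here cell3 has a library size of zero, so its raw value of 0 is replaced by the smallest in-range efficiency.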

Conditions

Vector of condition labels, which should correspond to the columns of Data. Default is NULL, which assumes that all cells belong to the same group.

UMI_sffl

Scaling factors, required only for non-UMI based data, by which Data is divided. If UMI_sffl is non-NULL and Conditions is non-NULL, then UMI_sffl should be a vector of length equal to the number of groups. Default is NULL.
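As a rough illustration of dividing Data by one scaling factor per group, the base R sketch below uses a hypothetical read-count matrix, hypothetical condition labels, and hypothetical factor values. Naming the factors by group is a convention of this sketch only; the documentation does not say how bayNorm matches the elements of UMI_sffl to groups.

```r
# Hypothetical non-UMI read counts: 2 genes x 4 cells.
reads <- matrix(
  c(120, 40, 200, 60, 90, 30, 150, 45),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)
conditions <- c("ctrl", "ctrl", "treat", "treat")

# One hypothetical scaling factor per group, named by group for this sketch.
umi_sffl <- c(ctrl = 20, treat = 15)

# Divide every column by the factor of the group its cell belongs to.
scaled <- sweep(reads, 2, umi_sffl[conditions], "/")
print(scaled)
```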

Prior_type

Determines which groups of cells, as defined by Conditions, are used in estimating the prior. Default is NULL. If Conditions is NULL, priors are estimated based on all cells. If Conditions is not NULL and Prior_type is LL, priors are estimated within each group separately. If Prior_type is GG, priors are estimated based on cells from all groups. LL is suitable for DE detection. GG is preferred if a reduction of batch effects between samples is desired, for example for technical replicates (see the bayNorm paper).
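The difference between LL and GG can be pictured as a choice of which cells feed each group's prior. The base R sketch below only illustrates that bookkeeping; cells_for_prior is a hypothetical helper, not a bayNorm function, and no prior is actually estimated.

```r
# Hypothetical condition labels for 5 cells.
conditions <- c("A", "A", "B", "B", "B")

# Hypothetical helper: for each group, return the column indices of the cells
# that would contribute to that group's prior under the given Prior_type.
cells_for_prior <- function(conditions, prior_type = c("LL", "GG")) {
  prior_type <- match.arg(prior_type)
  groups <- split(seq_along(conditions), conditions)
  if (prior_type == "LL") {
    groups  # each group uses only its own cells
  } else {
    lapply(groups, function(idx) seq_along(conditions))  # all cells pooled
  }
}

print(cells_for_prior(conditions, "LL"))
print(cells_for_prior(conditions, "GG"))
```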

mode_version

If TRUE, bayNorm returns the modes of the posterior estimates as normalized data, which is a 2D matrix, rather than samples from the posterior, which form a 3D array. Default is FALSE.

mean_version

If TRUE, bayNorm returns the means of the posterior estimates as normalized data, which is a 2D matrix, rather than samples from the posterior, which form a 3D array. Default is FALSE.

S

The number of samples to generate from the estimated posterior distribution (the size of the third dimension of the 3D array). Default is 20. S needs to be specified if mode_version=FALSE.
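The output shapes described for mode_version, mean_version and S can be pictured with stand-in data. The sketch below does not call bayNorm: it fills an array with arbitrary Poisson draws purely to show the dimensions, and it assumes the array is ordered genes x cells x S. Averaging over the third dimension is used only to produce a 2D matrix of the same shape as the mean or mode output; it is not how bayNorm computes its posterior summaries.

```r
set.seed(1)
n_genes <- 3
n_cells <- 4
S <- 20

# Stand-in for posterior samples: genes x cells x S (arbitrary values).
samples_3d <- array(
  rpois(n_genes * n_cells * S, lambda = 5),
  dim = c(n_genes, n_cells, S)
)
print(dim(samples_3d))

# Collapsing the third dimension yields a genes x cells matrix.
summary_2d <- apply(samples_3d, c(1, 2), mean)
print(dim(summary_2d))
```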

parallel

If TRUE, NCores cores will be used for parallelization. Default is TRUE.

NCores

Number of cores to use. Default is 5. This is used to set up a parallel environment with NCores workers through the BiocParallel package, using either MulticoreParam (Linux, Mac) or SnowParam (Windows).
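A platform-dependent choice of BiocParallel backend, as described above, might look like the following sketch. It mirrors the description rather than bayNorm's actual internals; the value of n_cores is hypothetical, and the backend is only constructed if BiocParallel is installed.

```r
n_cores <- 2  # hypothetical number of workers

# Pick the backend name from the operating system type.
backend_name <- if (.Platform$OS.type == "windows") {
  "SnowParam"
} else {
  "MulticoreParam"
}

# Construct the backend only when BiocParallel is available.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  backend <- switch(
    backend_name,
    SnowParam = BiocParallel::SnowParam(workers = n_cores),
    MulticoreParam = BiocParallel::MulticoreParam(workers = n_cores)
  )
  print(backend)
}
```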

FIX_MU

Whether to fix mu (the mean parameter of the prior distribution) to its MME estimate when estimating prior parameters by maximizing the marginal distribution. If TRUE, 1D optimization is used; otherwise 2D optimization for both mu and size is used (slow). Default is TRUE.

GR

If TRUE, the gradient function will be used in optimization. However, since the gradient function itself is very complicated, it does little to speed up the optimization. Default is FALSE.

BB_SIZE

If TRUE, estimate the size parameter of the prior by maximizing the marginal likelihood, and then use it to adjust the MME estimate of size. Default is TRUE.

verbose

Whether to print status messages. Default is TRUE.

out.sparse

Only valid for the mean version: whether the output is of type dgCMatrix or not. Default is FALSE.

Details

A wrapper function of prior estimation and bayNorm function.

Value

A list containing 3D arrays of normalized expression (if mode_version=FALSE) or a 2D matrix of normalized expression (if mode_version=TRUE or mean_version=TRUE), a list of estimated priors, and a list of the input parameters used: BETA_vec, Conditions (if specified), UMI_sffl (if specified), Prior_type, FIX_MU, BB_SIZE and GR.

References

Wenhao Tang, Francois Bertaux, Philipp Thomas, Claire Stefanelli, Malika Saint, Samuel Blaise Marguerat, Vahid Shahrezaei. bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data. Bioinformatics, btz726. doi: 10.1093/bioinformatics/btz726

Examples

library(bayNorm)
data('EXAMPLE_DATA_list')
# Return normalized data as a 3D array (first 30 cells only):
bayNorm_3D <- bayNorm(
  Data = EXAMPLE_DATA_list$inputdata[, seq(1, 30)],
  BETA_vec = EXAMPLE_DATA_list$inputbeta[seq(1, 30)],
  mode_version = FALSE, parallel = FALSE)
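A further call, sketched under stated assumptions, combines Conditions, Prior_type and mean_version on the same example data. The split of the 30 cells into groups "A" and "B" is hypothetical and carries no biological meaning; it serves only to show how the arguments fit together. The call is skipped if bayNorm is not installed, and str() is used to inspect the returned list rather than assuming the names of its elements.

```r
n_cells <- 30

# Hypothetical labels: first 15 cells in group "A", last 15 in group "B".
conditions <- rep(c("A", "B"), each = n_cells / 2)

if (requireNamespace("bayNorm", quietly = TRUE)) {
  data("EXAMPLE_DATA_list", package = "bayNorm")

  # Posterior means (2D output), with priors estimated within each group.
  bayNorm_mean <- bayNorm::bayNorm(
    Data = EXAMPLE_DATA_list$inputdata[, seq_len(n_cells)],
    BETA_vec = EXAMPLE_DATA_list$inputbeta[seq_len(n_cells)],
    Conditions = conditions,
    Prior_type = "LL",
    mean_version = TRUE,
    parallel = FALSE,
    verbose = FALSE
  )

  # Inspect the top-level structure of the returned list.
  str(bayNorm_mean, max.level = 1)
}
```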

bayNorm documentation built on Nov. 8, 2020, 8:25 p.m.