Dino | R Documentation |
Dino removes cell-to-cell variation in observed counts due to the effects of sequencing depth from single-cell mRNA sequencing experiments. Dino was designed particularly with UMI-based protocols in mind, but is also applicable to non-UMI-based library preparation chemistries.
Dino(counts, nCores = 2, prec = 3, minNZ = 10, nSubGene = 1e4,
     nSubCell = 1e4, depth = NULL, slope = NULL, minSlope = 1/2,
     maxSlope = 2, clusterSlope = TRUE, returnMeta = FALSE, doRQS = FALSE,
     emPar = list(maxIter = 100, tol = 0.1, conPar = 15, maxK = 100), ...)
counts |
A numeric matrix object of expression counts - usually in dgCMatrix format for memory efficiency. Column names denote cells (samples or droplets) and row names denote genes. |
nCores |
A non-negative integer scalar denoting the number of cores
which should be used. Setting nCores to 0 uses all cores as determined by
running |
prec |
A positive integer denoting the number of decimals to which to
round depth (if estimated internally via |
minNZ |
A positive integer denoting the minimum number of non-zero counts for a gene to be normalized by the Dino algorithm. It is recommended to pre-filter the counts matrix such that all genes meet this threshold. Otherwise, genes with fewer than minNZ non-zeros will be scaled by depth for normalization. |
nSubGene |
A positive integer denoting the number of genes to subset for calculation of slope. |
nSubCell |
A positive integer denoting the number of samples to subset for calculation of slope and the EM algorithm. |
depth |
A numeric vector of length equal to the columns of counts.
depth denotes a median-centered, log-scale measure of cell-wise
sequencing depth. |
slope |
A numeric scalar denoting the count-depth relationship on
the log-log scale. Typical values are close to 1 (implying a unit
increase in depth corresponds to a unit increase in expected counts on
the log-log scale), but may be higher, particularly in the case of
non-UMI protocols. |
minSlope |
A numeric scalar denoting the minimum slope. Fitted slopes below this value trigger a warning and are set to 1. |
maxSlope |
A numeric scalar denoting the maximum slope. Fitted slopes above this value trigger a warning and are set to 1. |
clusterSlope |
A logical indicating whether cells should be pre-clustered prior to calculation of slope. Under the default where cells are pre-clustered, cluster is used as a factor in the regression. |
returnMeta |
A logical indicating whether metadata (sequencing depth and slope) should be returned. |
doRQS |
A logical indicating how normalization resampling is to be done. By default (FALSE), normalization is done by resampling from the full posterior distribution. Alternatively, restricted quantile sampling (RQS) can be performed to enforce stronger preservation of expression ranks in the normalized data. RQS is currently considered experimental. |
emPar |
A list of parameters to send to the EM algorithm. maxIter denotes the maximum number of model updates. tol denotes the cutoff threshold for reductions in the log likelihood function. conPar denotes the concentration parameter for the resampling. conPar = 1 implies full resampling from the fitted distribution. As conPar increases, the normalized expression converges to the scale-factor normalized values. maxK denotes the maximum number of mixture components in the mixture model. |
... |
Additional parameters to pass to |
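The counts, minNZ, and depth arguments above suggest some preparation before calling Dino. The following is a minimal base-R sketch of that preparation, not Dino's internal code. The toy matrix, the threshold of 3, and the variable names are illustrative. Taking depth as the log of per-cell total counts is an assumption; the description above only requires a median-centered, log-scale measure.

```r
# Toy counts matrix: 4 genes x 6 cells (values made up for illustration).
# Real data would usually be a sparse dgCMatrix; a base matrix keeps this
# sketch free of package dependencies.
counts <- matrix(
  c(3, 0, 1, 2, 0, 4,
    0, 0, 0, 1, 0, 0,
    5, 2, 0, 7, 1, 3,
    0, 1, 0, 0, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Keep only genes with at least minNZ non-zero counts.
# minNZ = 3 suits this toy matrix only; the documented default is 10.
minNZ <- 3
nNonZero <- rowSums(counts > 0)
countsFilt <- counts[nNonZero >= minNZ, , drop = FALSE]

# Assumed depth definition: log of total counts per cell,
# shifted so that the median across cells is zero.
depth <- log(colSums(countsFilt))
depth <- depth - median(depth)
```

Here countsFilt keeps every gene that meets the threshold, and depth has one entry per column, so both match the shapes described for the counts and depth arguments.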
By default, Dino returns a matrix of normalized expression with the same dimensions as counts. If returnMeta = TRUE, then Dino returns a list of normalized expression, sequencing depth, and slope.
Jared Brown
Brown, J., Ni, Z., Mohanty, C., Bacher, R. and Kendziorski, C. (2020) "Normalization by distributional resampling of high throughput single-cell RNA-sequencing data." bioRxiv. https://doi.org/10.1101/2020.10.28.359901
# raw data
data("pbmcSmall")
str(pbmcSmall)

# run Dino on raw expression matrix
pbmcSmall_Norm <- Dino(pbmcSmall)
str(pbmcSmall_Norm)
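To illustrate what the slope, minSlope, and maxSlope arguments describe, the sketch below simulates one gene and estimates the count-depth relationship on the log-log scale. It uses a plain Poisson regression as a stand-in. It does not reproduce how Dino fits slope internally, which involves gene and cell subsets and, by default, cluster as a factor. The simulated data and the helper checkSlope are hypothetical and exist only for this example.

```r
set.seed(1)

# Simulate one gene across 2000 cells with a true log-log slope of 1.
nCells <- 2000
depth <- rnorm(nCells, sd = 0.5)
depth <- depth - median(depth)   # median-centered, log-scale depth
y <- rpois(nCells, lambda = exp(log(5) + 1 * depth))

# Poisson regression with log link: log E[y] = intercept + slope * depth,
# so the coefficient on depth is the slope on the log-log scale.
fit <- glm(y ~ depth, family = poisson())
slopeHat <- unname(coef(fit)["depth"])

# Hypothetical helper mirroring the documented minSlope/maxSlope rule:
# slopes outside the bounds raise a warning and are replaced by 1.
checkSlope <- function(slope, minSlope = 1/2, maxSlope = 2) {
  if (slope < minSlope || slope > maxSlope) {
    warning("Fitted slope outside [minSlope, maxSlope]; using 1")
    return(1)
  }
  slope
}

slopeUsed <- checkSlope(slopeHat)
```

With these simulated data the estimate falls close to 1, inside the default bounds, so checkSlope returns it unchanged. A value such as 3 or 0.1 would instead produce a warning and a slope of 1.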