scde.error.models | R Documentation |
Fit error models given a set of single-cell data (counts) and an optional grouping factor (groups). The cells (within each group) are first cross-compared to determine a subset of genes showing consistent expression. The set of genes is then used to fit a mixture model (Poisson-NB mixture, with expression-dependent concomitant).
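As a rough illustration of the mixture described above, the sketch below evaluates a two-component density in base R: a low-rate Poisson term standing in for failed detections and a negative binomial term for successfully detected signal. It is not the scde implementation. The function name `mixture.density` and the inputs `mu`, `theta`, and `p.fail` are hypothetical, and `p.fail` is held fixed here, whereas the fitted model lets the failure probability depend on expression magnitude.

```r
# Illustrative two-component mixture density; NOT the scde internals.
# mu, theta and p.fail are hypothetical inputs chosen for this example.
mixture.density <- function(x, mu, theta, p.fail, zero.lambda = 0.1) {
  failure <- dpois(x, lambda = zero.lambda)     # failure (drop-out) component
  signal  <- dnbinom(x, size = theta, mu = mu)  # negative binomial component
  p.fail * failure + (1 - p.fail) * signal      # weighted sum of the two
}

# Density of observing 0..10 reads for a gene with expected magnitude 20,
# NB size 2, and an assumed 30% chance of failed detection.
d <- mixture.density(0:10, mu = 20, theta = 2, p.fail = 0.3)
round(d, 4)
```

Because the Poisson rate is small, nearly all of the failure component's mass sits at zero counts, which is why zero and near-zero observations are treated as candidate failures.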
scde.error.models(counts, groups = NULL, min.nonfailed = 3,
threshold.segmentation = TRUE, min.count.threshold = 4,
zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
theta.fit.range = c(0.01, 100))
counts |
read count matrix. The rows correspond to genes (should be named), columns correspond to individual cells. The matrix should contain integer counts. |
groups |
an optional factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts). |
min.nonfailed |
minimal number of non-failed observations required for a gene to be used in the final model fitting |
threshold.segmentation |
use a fast threshold-based segmentation during cross-fit (default: TRUE) |
min.count.threshold |
the number of reads to use to guess which genes may have "failed" to be detected in a given measurement during cross-cell comparison (default: 4) |
zero.count.threshold |
threshold to guess the initial value (failed/non-failed) during error model fitting procedure (defaults to the min.count.threshold value) |
zero.lambda |
the rate of the Poisson (failure) component (default: 0.1) |
save.crossfit.plots |
whether png files showing cross-fit segmentations should be written out (default: FALSE) |
save.model.plots |
whether pdf files showing model fits should be written out (default: TRUE) |
n.cores |
number of cores to use |
min.size.entries |
minimum number of genes to use when determining expected expression magnitude during model fitting |
max.pairs |
maximum number of cross-fit comparisons that should be performed per group (default: 5000) |
min.pairs.per.cell |
minimum number of pairs that each cell should be cross-compared with |
verbose |
verbosity level; set to 1 for increased output (default: 0) |
linear.fit |
Boolean of whether to use a linear fit in the regression (default: TRUE). |
local.theta.fit |
Boolean of whether to fit the overdispersion parameter theta, i.e. the negative binomial size parameter, based on local regression (default: set to be equal to the linear.fit parameter) |
theta.fit.range |
Range of valid values for the overdispersion parameter theta, i.e. the negative binomial size parameter (default: c(1e-2, 1e2)) |
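The input requirements listed for counts and groups can be checked before calling the function. The following is a minimal base R sketch on made-up data; the gene and cell names are invented for illustration, and the checks simply restate the argument descriptions rather than reproduce any validation scde may perform internally.

```r
# Toy inputs only; gene and cell names are made up for illustration.
set.seed(1)
counts <- matrix(rpois(60, lambda = 5), nrow = 10,
                 dimnames = list(paste0("gene", 1:10),
                                 c(paste0("ESC_", 1:3), paste0("MEF_", 1:3))))
storage.mode(counts) <- "integer"  # make the integer storage explicit

# One group label per cell (column), in column order.
groups <- factor(sub("_.*", "", colnames(counts)), levels = c("ESC", "MEF"))
names(groups) <- colnames(counts)

# Checks that restate the argument descriptions.
stopifnot(is.integer(counts),              # integer counts
          !is.null(rownames(counts)),      # genes are named
          length(groups) == ncol(counts))  # one label per cell
table(groups)
```

Tabulating the factor is a quick way to confirm that every group contains enough cells to be cross-compared.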
Note: the default implementation has been changed to use a linear-scale fit with an expression-dependent NB size (overdispersion) fit. This represents an iterative improvement on the originally published model. Use linear.fit = FALSE to revert to the original fitting procedure.
a model matrix, with rows corresponding to different cells, and columns representing different parameters of the determined models
data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)
o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)