knn.error.models | R Documentation |
Builds cell-specific error models assuming that there are multiple subpopulations present among the measured cells. The model for each cell is based on average expression estimates obtained from the k closest cells within a given group (if groups = NULL, then within the entire set of measured cells). The method implements fitting of either the original log-fit models (linear.fit = FALSE) or the newer linear-fit models (linear.fit = TRUE, default), the latter with a locally fit overdispersion coefficient (local.theta.fit = TRUE, default).
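The idea of taking reference expression from the k closest cells can be sketched in base R. This is an illustration only, not the package's implementation: the toy matrix, the log1p transform before correlation, and the names sim, nearest.cells and ref.expr are all assumptions made for the example.

```r
# Illustrative sketch only; this is NOT how knn.error.models is implemented.
set.seed(1)

# Toy count matrix: 200 genes (rows) x 12 cells (columns)
counts <- matrix(rpois(200 * 12, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:12)))

k <- round(ncol(counts) / 2)  # mirrors the default value of k in the usage

# Cell-to-cell similarity; the log1p transform is an assumption of this sketch
sim <- cor(log1p(counts), method = "pearson")

# For each cell, the names of the k most similar other cells
nearest.cells <- lapply(colnames(counts), function(cell) {
  s <- sim[cell, colnames(counts) != cell]  # drop the cell itself
  names(sort(s, decreasing = TRUE))[seq_len(k)]
})
names(nearest.cells) <- colnames(counts)

# Average expression across the neighbours of one cell
ref.expr <- rowMeans(counts[, nearest.cells[["cell1"]], drop = FALSE])
```

Switching method to "spearman" in the cor call would give a rank-based similarity instead, which is the kind of choice the cor.method argument exposes.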
knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
max.model.plots = 50, n.cores = parallel::detectCores(),
min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
alpha.weight.power = 1/2)
counts |
count matrix (integer matrix; rows are genes, columns are cells) |
groups |
optional grouping that partitions the cells into known subpopulations |
k |
number of nearest neighbor cells to use during fitting. If k is set sufficiently high, all of the cells within a given group will be used. |
min.nonfailed |
minimum number of non-failed measurements (within the k nearest neighbor cells) required for a gene to be taken into account during the error fitting procedure |
min.count.threshold |
minimum number of reads required for a measurement to be considered non-failed |
save.model.plots |
whether model plots should be saved (file names are (group).models.pdf, or cell.models.pdf if no group was supplied) |
max.model.plots |
maximum number of models to save plots for (saves time when there are too many cells) |
n.cores |
number of cores to use for the calculations |
min.size.entries |
minimum number of genes to use for model fitting |
min.fpm |
optional parameter to restrict model fitting to genes with group-average expression magnitude above a given value |
cor.method |
correlation measure to be used in determining k nearest cells |
verbose |
level of verbosity |
fpm.estimate.trim |
trim fraction to be used in estimating group-average gene expression magnitude for model fitting (0.5 would be median, 0 would turn off trimming) |
linear.fit |
whether the newer linear model fit with zero intercept should be used (TRUE), or the originally published log-fit model (FALSE) |
local.theta.fit |
whether local theta fitting should be used (only available for the linear fit models) |
theta.fit.range |
allowed range of the theta values |
alpha.weight.power |
1/theta weight power used in fitting theta dependency on the expression magnitude |
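The effect of a trim fraction such as fpm.estimate.trim can be seen with the trim argument of base R's mean(). Whether the package computes its estimate this way internally is an assumption; the sketch only shows the behaviour described above, where 0 turns trimming off and 0.5 gives the median. The vector x is made-up data.

```r
# Made-up expression values for one gene across five cells, with one outlier
x <- c(0, 1, 2, 6, 100)

untrimmed <- mean(x, trim = 0)     # plain mean, dominated by the outlier: 21.8
trimmed   <- mean(x, trim = 0.25)  # drops one value from each end: mean(c(1, 2, 6)) = 3
as.median <- mean(x, trim = 0.5)   # a trim fraction of 0.5 returns the median: 2
```

A moderate trim fraction keeps the estimate robust to a few extreme cells while still averaging over most of the group.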
a data frame with parameters of the fitted error models (rows are cells, columns are fitted parameters)
data(pollen)
cd <- clean.counts(pollen)
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)