In hms-dbmi/scde: Single Cell Differential Expression

knn.error.models

R Documentation

Build error models for heterogeneous cell populations, based on K-nearest neighbor cells.

Description

Builds cell-specific error models, assuming that multiple subpopulations are present among the measured cells. The model for each cell is based on average expression estimates obtained from the k cells most similar to it within the same group (or within the entire set of measured cells if groups = NULL). The method can fit either the original log-fit models (linear.fit = FALSE) or the newer linear-fit models (linear.fit = TRUE, the default). The linear-fit models can additionally use a locally fit overdispersion coefficient (local.theta.fit = TRUE, which is the default whenever linear.fit = TRUE).
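The neighbor-selection step can be pictured with a minimal base-R sketch. It assumes that neighbors are ranked by the correlation of the cells' count profiles (as the cor.method argument suggests) and that a cell is not counted as its own neighbor. knn.neighbors is a hypothetical helper written for illustration; it is not the function scde uses internally, and grouping is not handled.

```r
# Hypothetical helper (not part of scde): for each cell, return the indices of
# the k cells whose expression profiles correlate most strongly with it.
knn.neighbors <- function(counts, k, cor.method = "pearson") {
  # cell-by-cell correlation matrix (columns of counts are cells)
  cc <- stats::cor(counts, method = cor.method)
  # assumption: a cell is excluded from its own neighbor set
  diag(cc) <- -Inf
  # a k larger than the number of other cells simply uses all of them
  k <- min(k, ncol(counts) - 1)
  # rank each column by decreasing correlation and keep the top k
  nb <- apply(cc, 2, function(x) order(x, decreasing = TRUE)[seq_len(k)])
  # apply() gives a k x ncells matrix, or a plain vector when k == 1
  matrix(nb, nrow = k, dimnames = list(NULL, colnames(counts)))
}

# toy data: 4 genes x 4 cells; cells A/B and C/D have similar profiles
counts <- matrix(c(10, 0, 5, 1,
                   11, 1, 6, 0,
                   0, 9, 1, 7,
                   1, 10, 0, 8),
                 nrow = 4,
                 dimnames = list(paste0("g", 1:4), c("A", "B", "C", "D")))
knn.neighbors(counts, k = 1)
```

In this toy matrix each cell's single nearest neighbor is its partner (A with B, C with D). Requesting more neighbors than there are other cells falls back to using all of them, mirroring the note on the k argument.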

Usage

knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
  min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
  max.model.plots = 50, n.cores = parallel::detectCores(),
  min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
  verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
  local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
  alpha.weight.power = 1/2)

Arguments

`counts`	count matrix (integer matrix; rows: genes, columns: cells)
`groups`	optional grouping of the cells (one entry per cell) into known subpopulations
`k`	number of nearest-neighbor cells to use during fitting. If k is set sufficiently high, all of the cells within a given group will be used.
`min.nonfailed`	minimum number of non-failed measurements (within the k nearest-neighbor cells) required for a gene to be taken into account during the error-fitting procedure
`min.count.threshold`	minimum number of reads required for a measurement to be considered non-failed
`save.model.plots`	whether model plots should be saved (file names are (group).models.pdf, or cell.models.pdf if no group was supplied)
`max.model.plots`	maximum number of models to save plots for (saves time when there are too many cells)
`n.cores`	number of cores to use throughout the calculations
`min.size.entries`	minimum number of genes to use for model fitting
`min.fpm`	optional parameter to restrict model fitting to genes with group-average expression magnitude above a given value
`cor.method`	correlation measure used to determine the k nearest cells
`verbose`	level of verbosity
`fpm.estimate.trim`	trim fraction to be used in estimating group-average gene expression magnitude for model fitting (0.5 would be median, 0 would turn off trimming)
`linear.fit`	whether the newer linear model fit with zero intercept should be used (TRUE), or the originally published log-fit model (FALSE)
`local.theta.fit`	whether local theta fitting should be used (only available for the linear fit models)
`theta.fit.range`	allowed range of the theta values
`alpha.weight.power`	power applied to 1/theta to weight observations when fitting the dependency of theta on expression magnitude
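The roles of min.count.threshold, min.nonfailed, fpm.estimate.trim and min.fpm can be illustrated with a base-R sketch for a single cell's set of neighbors. It assumes that expression magnitude is measured in reads per million of each cell's library (FPM), that the neighbor-average magnitude is a trimmed mean of FPM values, and that non-failed measurements are counted among the neighbors only. neighbor.fit.data is a hypothetical name, and the package's own computation may differ in detail.

```r
# Hypothetical helper (not part of scde): select the genes usable for fitting
# one cell's error model and estimate their neighbor-average magnitude.
neighbor.fit.data <- function(counts, neighbors,
                              min.count.threshold = 1, min.nonfailed = 5,
                              fpm.estimate.trim = 0.25, min.fpm = 0) {
  sub <- counts[, neighbors, drop = FALSE]
  # a measurement is non-failed if it reaches min.count.threshold reads
  nonfailed <- rowSums(sub >= min.count.threshold)
  # assumed normalization: scale each cell's counts to one million reads
  fpm <- sweep(sub, 2, colSums(sub), "/") * 1e6
  # trimmed mean across neighbors; mean(trim = 0.5) returns the median
  avg.fpm <- apply(fpm, 1, mean, trim = fpm.estimate.trim)
  keep <- nonfailed >= min.nonfailed & avg.fpm > min.fpm
  data.frame(gene = rownames(counts)[keep],
             nonfailed = nonfailed[keep],
             fpm = avg.fpm[keep],
             row.names = NULL)
}

# toy data: 3 genes measured in 6 neighboring cells
counts <- rbind(g1 = c(5, 6, 4, 7, 5, 9),
                g2 = c(0, 0, 1, 0, 0, 0),
                g3 = c(2, 3, 2, 2, 3, 2))
neighbor.fit.data(counts, neighbors = 1:6)
```

With the default thresholds, g2 is dropped because only one of the six neighbors has a non-zero count. Base R's trimmed mean matches the trimming semantics described for fpm.estimate.trim: 0 gives the plain mean and 0.5 gives the median.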

Value

a data frame with the parameters of the fitted error models (rows: cells, columns: fitted parameters)

Examples

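library(scde)
data(pollen)                # example count matrix (rows: genes, columns: cells)
cd <- clean.counts(pollen)  # filter out poor-quality cells and rarely detected genes

# fit an error model for each cell from its ncol(cd)/4 nearest neighbors
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
head(knn)                   # one row of fitted parameters per cell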

hms-dbmi/scde documentation built on April 19, 2023, 10:21 p.m.