View source: R/overdispersion.R
overdispersion_mle | R Documentation |
Estimate the Overdispersion for a Vector of Counts
overdispersion_mle(
y,
mean,
model_matrix = NULL,
do_cox_reid_adjustment = !is.null(model_matrix),
global_estimate = FALSE,
subsample = FALSE,
max_iter = 200,
verbose = FALSE
)
y |
a numeric or integer vector or matrix with the counts for which the overdispersion is estimated |
mean |
a numeric vector of either length 1 or |
model_matrix |
a numeric matrix that specifies the experimental
design. It can be produced using |
do_cox_reid_adjustment |
the classical maximum likelihood estimator of the |
global_estimate |
flag to decide if a single overdispersion for a whole matrix is calculated
instead of one estimate per row. This parameter has no affect if |
subsample |
the estimation of the overdispersion is the slowest step when fitting
a Gamma-Poisson GLM. For datasets with many samples, the estimation can be considerably sped up
without loosing much precision by fitting the overdispersion only on a random subset of the samples.
Default: |
max_iter |
the maximum number of iterations for each gene |
verbose |
a boolean that indicates if information about the individual steps are printed
while fitting the GLM. Default: |
The function is optimized to be fast on many small counts. To
achieve this, the frequency table of the counts is calculated and
used to avoid repetitive calculations. If
there are probably many unique counts the optimization is skipped.
Currently the heuristic is to skip if more than half of the counts
are expected to be unique. The estimation is based on the largest observed
count in y
.
An earlier version of this package (< 1.1.1) used a separate set of functions for the case of many small counts based on a paper by Bandara et al. (2019). However, this didn't bring a sufficient performance increase and meant an additional maintenance burden.
The function returns a list with the following elements:
estimate
the numerical estimate of the overdispersion.
iterations
the number of iterations it took to calculate the result.
message
additional information about the fitting process.
glm_gp()
set.seed(1)
# true overdispersion = 2.4
y <- rnbinom(n = 10, mu = 3, size = 1/2.4)
# estimate = 1.7
overdispersion_mle(y)
# true overdispersion = 0
y <- rpois(n = 10, lambda = 3)
# estimate = 0
overdispersion_mle(y)
# with different mu, overdispersion estimate changes
overdispersion_mle(y, mean = 15)
# Cox-Reid adjustment changes the result
overdispersion_mle(y, mean = 15, do_cox_reid_adjustment = FALSE)
# Many very small counts, true overdispersion = 50
y <- rnbinom(n = 1000, mu = 0.01, size = 1/50)
summary(y)
# estimate = 31
overdispersion_mle(y, do_cox_reid_adjustment = TRUE)
# Function can also handle matrix input
Y <- matrix(rnbinom(n = 10 * 3, mu = 4, size = 1/2.2), nrow = 10, ncol = 3)
Y
as.data.frame(overdispersion_mle(Y))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.