regDiagSingleR: Regression diagnostics in singleRcapture
In singleRcapture: Single-Source Capture-Recapture Models

regDiagSingleR

R Documentation

Description

A list of regression diagnostics implemented for the singleRStaticCountData class. Functions that either require no changes from the glm class or are not relevant in the context of singleRcapture are omitted.

Usage

dfpopsize(model, ...)

## S3 method for class 'singleRStaticCountData'
dfpopsize(model, dfbeta = NULL, ...)

## S3 method for class 'singleRStaticCountData'
dfbeta(model, maxitNew = 1, trace = FALSE, cores = 1, ...)

## S3 method for class 'singleRStaticCountData'
hatvalues(model, ...)

## S3 method for class 'singleRStaticCountData'
residuals(
  object,
  type = c("pearson", "pearsonSTD", "response", "working", "deviance", "all"),
  ...
)

## S3 method for class 'singleRStaticCountData'
cooks.distance(model, ...)

## S3 method for class 'singleRStaticCountData'
sigma(object, ...)

## S3 method for class 'singleRStaticCountData'
influence(model, do.coef = FALSE, ...)

## S3 method for class 'singleRStaticCountData'
rstudent(model, ...)

## S3 method for class 'singleRStaticCountData'
rstandard(model, type = c("deviance", "pearson"), ...)

Arguments

`model`, `object`	an object of `singleRStaticCountData` class.
`...`	arguments passed to other methods. Notably, `dfpopsize.singleRStaticCountData` calls `dfbeta.singleRStaticCountData` if no `dfbeta` argument is provided, and `controlMethod` is called in the `dfbeta` method.
`dfbeta`	previously computed `dfbeta` values; if supplied, they are not computed a second time.
`maxitNew`	the maximum number of iterations for regressions started at $\hat{\boldsymbol{\beta}}$ and fitted on the data specified in the call for `model` after the removal of the k'th row. By default 1.
`trace`	a logical value specifying whether to track results. When `cores > 1`, it results in a progress bar being created.
`cores`	the number of processor cores to use; any number greater than 1 activates code built on the `doParallel`, `foreach` and `parallel` packages. Note that for now parallel computing makes tracing impossible, so the `trace` parameter is ignored in this case.
`type`	the type of residual to return.
`do.coef`	a logical value indicating whether the `dfbeta` computation for influence should be done. `FALSE` by default.

Details

dfpopsize and dfbeta are closely related. dfbeta refits the regression after removing a single row from the data and returns the difference between the regression coefficients estimated on the full data set and those estimated after deleting that row; the procedure is repeated once for every unit in the data. dfpopsize does the same for the population size estimate, using the coefficients computed by dfbeta.
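The leave-one-out idea behind dfbeta can be sketched in base R. This is only an illustration under stated assumptions: it uses an ordinary Poisson `glm` on simulated data rather than a singleRcapture model, and `looDfbeta`, `betaFull` and `dat` are hypothetical names. Each refit starts at the full-data coefficients and runs a single IRLS iteration, mirroring the default `maxitNew = 1`.

```r
# Simulated data; an ordinary Poisson glm stands in for the package model.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rpois(n, exp(0.3 + 0.5 * x))
dat <- data.frame(y = y, x = x)

full <- glm(y ~ x, family = poisson, data = dat)
betaFull <- coef(full)

# For each row k: refit without row k, starting at the full-data estimate
# and allowing one iteration, then take the coefficient difference.
looDfbeta <- t(sapply(seq_len(n), function(k) {
  fitK <- suppressWarnings(glm(
    y ~ x, family = poisson, data = dat[-k, ],
    start = betaFull, control = glm.control(maxit = 1)
  ))
  betaFull - coef(fitK)
}))

dim(looDfbeta)  # one row per unit, one column per coefficient
```

The one-step refit trades exactness for speed; raising `maxit` gives the fully converged leave-one-out estimates at a higher cost.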

cooks.distance is implemented (for now) only for models with a single linear predictor and works exactly like the method for glm class.
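For reference, the glm computation that cooks.distance is said to mirror can be reproduced by hand in base R. The sketch below assumes an ordinary Poisson `glm` on simulated data (not a singleRcapture model); `cookManual` is a hypothetical name.

```r
set.seed(4)
n <- 40
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.4 * x))
fit <- glm(y ~ x, family = poisson)

h   <- hatvalues(fit)                    # leverages
rP  <- residuals(fit, type = "pearson")  # Pearson residuals
p   <- fit$rank                          # number of estimated coefficients
phi <- summary(fit)$dispersion           # fixed at 1 for the Poisson family

# Cook's distance as computed by stats::cooks.distance for glm objects
cookManual <- (rP / (1 - h))^2 * h / (phi * p)

all.equal(unname(cookManual), unname(cooks.distance(fit)))
```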

sigma computes the standard errors of predicted means. It returns a matrix with two columns: the first for the truncated mean and the second for the non-truncated mean.

residuals.singleRStaticCountData (can be abbreviated to resid) works like residuals.glm with the exception that:

"pearson" – returns non standardized residuals.
"pearsonSTD" – is currently defined only for single-predictor models; it will be extended to all models in the near future, but for families with more than one distribution parameter it will be a multivariate residual.
"response" – returns residuals computed with both the truncated and the non-truncated fitted values.
"working" – is possibly multivariate if more than one linear predictor is present.
"deviance" – is not yet defined for all families in singleRmodels(), e.g. negative binomial based methods.
"all" – returns all available residual types.

hatvalues.singleRStaticCountData is the method for the singleRStaticCountData class for extracting the diagonal elements of the projection matrix.

Since singleRcapture supports not only regular glm's but also vglm's, hatvalues returns a matrix whose number of columns corresponds to the number of linear predictors in the model, where the kth column corresponds to the elements of the diagonal of the projection matrix associated with the kth linear predictor. For glm's the projection matrix is

$$\boldsymbol{W}^{\frac{1}{2}}\boldsymbol{X}\left(\boldsymbol{X}^{T}\boldsymbol{W}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{T}\boldsymbol{W}^{\frac{1}{2}}$$

where $\boldsymbol{W}=\mathbb{E}\left(\text{Diag}\left(\frac{\partial^{2}\ell}{\partial\boldsymbol{\eta}^{T}\partial\boldsymbol{\eta}}\right)\right)$ and $\boldsymbol{X}$ is a model (lm) matrix. For vglm's present in the package it is instead

$$\boldsymbol{X}_{vlm}\left(\boldsymbol{X}_{vlm}^{T}\boldsymbol{W}\boldsymbol{X}_{vlm}\right)^{-1}\boldsymbol{X}_{vlm}^{T}\boldsymbol{W}$$

where

$$\boldsymbol{W}=\mathbb{E}\left(\begin{bmatrix}
\text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{1}^{T}\partial\eta_{1}}\right) & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{1}^{T}\partial\eta_{2}}\right) & \dotso & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{1}^{T}\partial\eta_{p}}\right) \\
\text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{2}^{T}\partial\eta_{1}}\right) & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{2}^{T}\partial\eta_{2}}\right) & \dotso & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{2}^{T}\partial\eta_{p}}\right) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{p}^{T}\partial\eta_{1}}\right) & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{p}^{T}\partial\eta_{2}}\right) & \dotso & \text{Diag}\left(\frac{\partial^{2}\ell}{\partial\eta_{p}^{T}\partial\eta_{p}}\right)
\end{bmatrix}\right)$$

is a block matrix constructed by taking the expected value of the diagonal matrices corresponding to second derivatives with respect to each linear predictor (and mixed derivatives), and $\boldsymbol{X}_{vlm}$ is a model (vlm) matrix constructed using the specifications in controlModel and the call to estimatePopsize.
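The single-predictor (glm) formula can be checked numerically in base R. The following is a minimal sketch assuming an ordinary Poisson `glm` on simulated data, not a singleRcapture model; the working weights returned by `weights(fit, type = "working")` play the role of the diagonal of W, and `hManual` is a hypothetical name.

```r
set.seed(3)
n <- 40
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.4 * x))
fit <- glm(y ~ x, family = poisson)

X <- model.matrix(fit)                # model (lm) matrix
w <- weights(fit, type = "working")   # diagonal of W

# W^(1/2) X: scale row i of X by sqrt(w_i)
sqrtWX <- sqrt(w) * X

# W^(1/2) X (X^T W X)^(-1) X^T W^(1/2)
H <- sqrtWX %*% solve(crossprod(sqrtWX)) %*% t(sqrtWX)
hManual <- diag(H)

all.equal(unname(hManual), unname(hatvalues(fit)))
```

The diagonal of this matrix sums to the number of estimated coefficients, which is a quick sanity check on any hand-rolled leverage computation.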

influence works like its glm counterpart, computing the most important influence measures.

Value

For hatvalues – A matrix with n rows and p columns where n is the number of observations in the data and p is the number of regression parameters.
For dfpopsize – A vector whose k'th element is the difference between the point estimate of the population size on the full data set and the point estimate after the removal of the k'th unit from the data set.
For dfbeta – A matrix with n rows and p columns where n is the number of units in the data and p is the number of regression parameters. The k'th row of this matrix corresponds to $\hat{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}_{-k}$ where $\hat{\boldsymbol{\beta}}_{-k}$ is a vector of estimates for regression parameters after the removal of the k'th row from the data.
For cooks.distance – A matrix with a single column containing the values of Cook's distance for every unit in model.matrix.
For residuals.singleRStaticCountData – A data.frame with the chosen residuals.
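
The leave-one-out difference returned by dfpopsize can be illustrated with a hand-rolled sketch. It assumes an intercept-only zero-truncated Poisson model fitted by direct likelihood maximisation and a Horvitz-Thompson-type estimate $\hat{N} = n / (1 - e^{-\hat{\lambda}})$; how the package itself computes the estimate after deletion is not stated here, so this is an assumption, and all names (`negLogLik`, `fitLambda`, `popSize`, `looDfpopsize`) are hypothetical.

```r
set.seed(2)
y <- rpois(300, 1.2)
y <- y[y > 0]          # only units captured at least once are observed
n <- length(y)

# Negative log-likelihood of a zero-truncated Poisson, in log(lambda)
negLogLik <- function(logLambda, counts) {
  lambda <- exp(logLambda)
  -sum(counts * logLambda - lambda - log1p(-exp(-lambda)) - lgamma(counts + 1))
}

# Maximum likelihood estimate of lambda via one-dimensional optimisation
fitLambda <- function(counts) {
  exp(optimize(negLogLik, interval = c(-5, 5), counts = counts)$minimum)
}

# Horvitz-Thompson-type population size estimate
popSize <- function(counts) {
  length(counts) / (1 - exp(-fitLambda(counts)))
}

nFull <- popSize(y)

# k'th element: full-data estimate minus the estimate without unit k
looDfpopsize <- vapply(
  seq_len(n),
  function(k) nFull - popSize(y[-k]),
  numeric(1)
)

summary(looDfpopsize)
```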

Author(s)

Piotr Chlebicki, Maciej Beręsewicz

Examples


# For singleRStaticCountData class
# Get simple model
Model <- estimatePopsize(
  formula = capture ~ nation + age + gender, 
  data = netherlandsimmigrant, 
  model = ztpoisson, 
  method = "IRLS"
)
# Get dfbeta
dfb <- dfbeta(Model)
# The dfpopsize results are obtained as follows (if dfbeta is not provided,
# it is computed internally):
res <- dfpopsize(Model, dfbeta = dfb)
summary(res)
plot(res)
# See various types of residuals:
head(resid(Model, "all"))

singleRcapture documentation built on April 4, 2025, 1:43 a.m.