chao | R Documentation |
Package singleRcapture
utilizes various family type
functions that specify variable parts of population size estimation,
regression, diagnostics and other necessary information that depends
on the model. These functions are used as model
argument in
estimatePopsize
function.
chao(lambdaLink = "loghalf", ...)
Hurdleztgeom(
lambdaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
Hurdleztnegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
Hurdleztpoisson(
lambdaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
oiztgeom(
lambdaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
oiztnegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
oiztpoisson(
lambdaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
zelterman(lambdaLink = "loghalf", ...)
zotgeom(lambdaLink = c("log", "neglog"), ...)
zotnegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
...
)
zotpoisson(lambdaLink = c("log", "neglog"), ...)
ztHurdlegeom(
lambdaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
ztHurdlenegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
ztHurdlepoisson(
lambdaLink = c("log", "neglog"),
piLink = c("logit", "cloglog", "probit"),
...
)
ztgeom(lambdaLink = c("log", "neglog"), ...)
ztnegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
...
)
ztoigeom(
lambdaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
ztoinegbin(
nSim = 1000,
epsSim = 1e-08,
eimStep = 6,
lambdaLink = c("log", "neglog"),
alphaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
ztoipoisson(
lambdaLink = c("log", "neglog"),
omegaLink = c("logit", "cloglog", "probit"),
...
)
ztpoisson(lambdaLink = c("log", "neglog"), ...)
lambdaLink |
a link for Poisson parameter, |
... |
Additional arguments, not used for now. |
piLink |
a link for probability parameter, |
nSim , epsSim |
if working weights cannot be computed analytically these arguments specify maximum number of simulations allowed and precision level for finding them numerically respectively. |
eimStep |
a non negative integer describing
how many values should be used at each step of approximation
of information matrixes when no analytic solution is available
(e.g. |
alphaLink |
a link for dispersion parameter, |
omegaLink |
a link for inflation parameter, |
Most of these functions are based on some "base" distribution with
support \mjseqn\mathbbN_0=\mathbbN\cup\lbrace 0\rbrace that describe
distribution of \mjseqnY before truncation. Currently they include:
\mjsdeqn\mathbbP(Y=y|\lambda,\alpha)=\left\lbrace
\beginarraycc
\frac\lambda^ye^-\lambday! & \textPoisson distribution
\frac\Gamma(y+\alpha^-1)\Gamma(\alpha^-1)y!
\left(\frac\alpha^-1\alpha^-1+\lambda\right)^\alpha^-1
\left(\frac\lambda\alpha^-1+\lambda\right)^y &
\textnegative binomial distribution
\frac\lambda^y(1+\lambda)^y+1 &
\textgeometric distribution
\endarray
\right.
where \mjseqn\lambda is the Poisson parameter and
\mjseqn\alpha is the dispersion parameter. Geometric distribution
is a special case of negative binomial distribution when
\mjseqn\alpha=1 it is included because negative binomial
distribution is quite troublesome numerical regression in fitting.
It is important to know that PMF of negative binomial distribution
approaches the PMF of Poisson distribution when
\mjseqn\alpha\rightarrow 0^+.
Note in literature on single source capture recapture models the dispersion parameter which introduces greater variability in negative binomial distribution compared to Poisson distribution is generally interpreted as explaining the unobserved heterogeneity i.e. presence of important unobserved independent variables. All these methods for estimating population size are tied to Poisson processes hence we use \mjseqn\lambda as parameter symbol instead of \mjseqn\mu to emphasize this connection. Also will not be hard to see that all estimators derived from modifying the "base" distribution are unbiased if assumptions made by respective models are not violated.
The zero-truncated models corresponding to "base" distributions are
characterized by relation:
\mjsdeqn\mathbbP(Y=y|Y>0)=\left\lbrace
\beginarraycc
\frac\mathbbP(Y=y)1-\mathbbP(Y=0) & \textwhen y\neq 0
0 & \textwhen y=0
\endarray\right.
which allows us to estimate parameter values using only observed part of
population. These models lead to the following estimates, respectively:
\mjsdeqn
\beginaligned
\hatN &= \sum_k=1^N_obs\frac11-\exp(-\lambda_k) &
\text For Poisson distribution
\hatN &= \sum_k=1^N_obs\frac11-(1+\alpha_k\lambda_k)^-\alpha_k^-1 &
\text For negative binomial distribution
\hatN &= \sum_k=1^N_obs\frac1+\lambda_k\lambda_k &
\text For geometric distribution
\endaligned
One common way in which assumptions of zero-truncated models are violated is
presence of one-inflation the presence of which is somewhat similar in
single source capture-recapture models to zero-inflation in usual count data
analysis. There are two ways in which one-inflation may be understood,
they relate to whether \mjseqn\mathbbP(Y=0) is
modified by inflation. The first approach is inflate
(\mjseqn\omega parameter) zero-truncated distribution as:
\mjsdeqn
\mathbbP_new(Y=y|Y>0) = \left\lbrace\beginarraycc
\omega + (1 - \omega)\mathbbP_old(Y=1|Y>0)& \textwhen: y = 1
(1 - \omega) \mathbbP_old(Y=y|Y>0) & \textwhen: y \neq 1
\endarray\right.
which corresponds to:
\mjsdeqn
\mathbbP_new(Y=y) = \left\lbrace\beginarraycc
\mathbbP_old(Y=0) & \textwhen: y = 0
\omega(1 - \mathbbP(Y=0)) + (1 - \omega)\mathbbP_old(Y=1) & \textwhen: y = 1
(1 - \omega) \mathbbP_old(Y=y) & \textwhen: y > 1
\endarray\right.
before zero-truncation. Models that utilize this
approach are commonly referred to as zero-truncated one-inflated models.
Another way of accommodating one-inflation in SSCR is by putting inflation
parameter on base distribution as:
\mjsdeqn
\mathbbP_new(Y=y) = \left\lbrace\beginarraycc
\omega + (1 - \omega)\mathbbP_old(Y=1)& \textwhen: y = 1
(1 - \omega) \mathbbP_old(Y=y) & \textwhen: y \neq 1
\endarray\right.
which then becomes:
\mjsdeqn
\mathbbP_new(Y=y|Y>0) = \left\lbrace\beginarraycc
\frac\omega1 - (1-\omega)\mathbbP_old(Y=0) + \frac(1 - \omega)1 - (1-\omega)\mathbbP_old(Y=0)\mathbbP_old(Y=1)& \textwhen: y = 1
\frac(1 - \omega)1 - (1-\omega)\mathbbP_old(Y=0)\mathbbP_old(Y=y) & \textwhen: y > 1
\endarray\right.
after truncation.
It was shown by Böhning in 2022 paper that these approaches are equivalent
in terms of maximizing likelihoods if we do not put formula on
\mjseqn\omega. They can however lead to different
population size estimates.
For zero-truncated one-inflated models the formula for population size estimate \mjseqn\hatN does not change since \mjseqn\mathbbP(y=0) remains the same but estimation of parameters changes all calculations.
For one-inflated zero-truncated models population size estimates are
expressed, respectively by:
\mjsdeqn
\beginaligned
\hatN &= \sum_k=1^N_obs\frac11-(1-\omega_k)\exp(-\lambda_k) &\text For base Poisson distribution
\hatN &= \sum_k=1^N_obs\frac11-(1-\omega_k)(1+\alpha_k\lambda_k)^-\alpha_k^-1 &\text For base negative binomial distribution
\hatN &= \sum_k=1^N_obs\frac1+\lambda_k\lambda_k + \omega_k &\text For base geometric distribution
\endaligned
Zero-one-truncated models ignore one counts instead of accommodating
one-inflation by utilizing the identity
\mjsdeqn
\ell_\textztoi=\boldsymbolf_1\ln\frac\boldsymbolf_1N_obs
+(N_obs-\boldsymbolf_1)\ln\left(1-\frac\boldsymbolf_1N_obs
\right) + \ell_\textzot
where \mjseqn\ell_\textzot is the log likelihood
of zero-one-truncated distribution characterized by probability mass function:
\mjsdeqn\mathbbP(Y=y|Y>1)=\left\lbrace
\beginarraycc
\frac\mathbbP(Y=y)1-\mathbbP(Y=0)-\mathbbP(Y=1) & \textwhen y > 1
0 & \textwhen y\in\lbrace 0, 1\rbrace
\endarray\right.
where \mjseqn\mathbbP(Y) is the probability mass function of
the "base" distribution. The identity above justifies use of zero-one-truncated,
unfortunately it was only proven for intercept only models, however
numerical simulations seem to indicate that even if the theorem cannot be
extended for (non trivial) regression population size estimation is still
possible.
For zero-one-truncated models population size estimates are expressed by:
\mjsdeqn
\beginaligned
\hatN &= \boldsymbolf_1 + \sum_k=1^N_obs
\frac1-\lambda_k\exp(-\lambda_k)1-\exp(-\lambda_k)-\lambda_k\exp(-\lambda_k)
&\text For base Poisson distribution
\hatN &= \boldsymbolf_1 + \sum_k=1^N_obs
\frac1-\lambda_k(1+\alpha_k\lambda_k)^-1-\alpha_k^-1
1-(1+\alpha_k\lambda_k)^-\alpha_k^-1-\lambda_k(1+\alpha_k\lambda_k)^-1-\alpha_k^-1
&\text For base negative binomial distribution
\hatN &= \boldsymbolf_1 + \sum_k=1^N_obs
\frac\lambda_k^2+\lambda_k+1\lambda_k^2
&\text For base geometric distribution
\endaligned
Pseudo hurdle models are experimental and not yet described in literature.
Lastly there are chao and zelterman models which are based on
logistic regression on the dummy variable
\mjsdeqn
Z = \left\lbrace\beginarraycc
0 & \textif Y = 1
1 & \textif Y = 2
\endarray\right.
based on the equation:
\mjsdeqn
\textlogit(p_k)=
\ln\left(\frac\lambda_k2\right)=
\boldsymbol\beta\mathbfx_k=\eta_k
where \mjseqn\lambda_k is the Poisson parameter.
The zelterman estimator of population size is expressed as: \mjsdeqn\hatN=\sum_k=1^N_obs1-\exp\left(-\lambda_k\right) and chao estimator has the form: \mjsdeqn \hatN=N_obs+\sum_k=1^\boldsymbolf_1+\boldsymbolf_2 \frac1\lambda_k+ \frac\lambda_k^22
A object of class family
containing objects:
makeMinusLogLike
– A factory function for creating the following functions:
\mjseqn\ell(\boldsymbol\beta), \frac\partial\ell\partial\boldsymbol\beta,
\frac\partial^2\ell\partial\boldsymbol\beta^T\partial\boldsymbol\beta
functions from the
\mjseqn\boldsymboly vector and the
\mjseqn\boldsymbolX_vlm matrix
(or just \mjseqn\boldsymbolX if applied to model
with single linear predictor)which has the deriv
argument with possible
values in c(0, 1, 2)
that determine which derivative to return;
the default value is 0
, which represents the minus log-likelihood.
links
– A list with link functions.
mu.eta, variance
– Functions of linear predictors that
return expected value and variance. The type
argument with 2 possible
values ("trunc"
and "nontrunc"
) that specifies whether
to return \mjseqn\mathbbE(Y|Y>0), \textvar(Y|Y>0) or
\mjseqn\mathbbE(Y), \textvar(Y) respectively; the deriv
argument
with values in c(0, 1, 2)
is used for indicating the derivative with
respect to the linear predictors, which is used for providing
standard errors in the predict
method.
family
– A string that specifies name of the model.
valideta, validmu
– For now it only returns TRUE
.
In the near future, it will be used to check whether applied linear
predictors are valid (i.e. are transformed into some elements of the
parameter space subjected to the inverse link function).
funcZ, Wfun
– Functions that create pseudo residuals and
working weights used in IRLS algorithm.
devResids
– A function that given the linear predictors
prior weights vector and response vector returns deviance residuals.
Not all family functions have these functions implemented yet.
pointEst, popVar
– Functions that given prior weights
linear predictors and in the latter case also estimate of
\mjseqn\textcov(\hat\boldsymbol\beta) and \mjseqn\boldsymbolX_vlm
matrix return point estimate for population size and analytic estimation
of its variance.There is a additional boolean parameter contr
in the
former function that if set to true returns contribution of each unit.
etaNames
– Names of linear predictors.
densityFunction
– A function that given linear predictors
returns value of PMF at values x
. Additional argument type
specifies whether to return \mjseqn\mathbbP(Y|Y>0) or
\mjseqn\mathbbP(Y).
simulate
– A function that generates values of a dependent
vector given linear predictors.
getStart
– An expression for generating starting points.
Piotr Chlebicki, Maciej Beręsewicz
estimatePopsize()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.