knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dev = "png", fig.width = 7, fig.height = 4, message = FALSE, warning = FALSE ) options(width = 800, ggeffects_warning_bias_correction = FALSE) pkgs <- c( "ggplot2", "lme4", "glmmTMB", "patchwork", "sjlabelled", "htmltools" ) if (!all(vapply(pkgs, requireNamespace, quietly = TRUE, FUN.VALUE = logical(1L)))) { knitr::opts_chunk$set(eval = FALSE) }
library(htmltools) callout_tip <- function(header = NULL, ...) { div( class = "callout-tip", tags$h1( tags$img(src = "../man/figures/summary.png", width = "20", height = "17", style = "vertical-align:middle"), # nolint header ), ... ) } includeCSS("../man/figures/callout.css")
This vignette shows how to calculate adjusted predictions for mixed models. However, for mixed models, since random effects are involved, we can calculate conditional predictions and marginal predictions. We also have to distinguish between population-level and unit-level predictions.
But one thing at a time...
callout_tip( "Summary of most important points:", tags$ul( tags$li("Predictions can be made on the population-level or for each level of the grouping variable (unit-level). If unit-level predictions are requested, you need to set ", tags$code("type=\"random\""), " and specify the grouping variable(s) in the ", tags$code("terms"), " argument."), # nolint tags$li("Population-level predictions can be either conditional (predictions for a \"typical\" group) or marginal (average predictions across all groups). Set ", tags$code("margin=\"empirical\""), " for marginal predictions. You'll notice differences in predictions especially for unequal group sizes at the random effects level."), # nolint tags$li("Prediction intervals, i.e. when ", tags$code("interval=\"prediction\""), " also account for the uncertainty in the random effects.") # nolint ) )
Mixed models are used to account for the dependency of observations within groups, e.g. repeated measurements within subjects, or students within schools. The dependency is modeled by random effects, i.e. mixed model at least have one grouping variable (or factor) as higher level unit.
At the lowest level, you have your fixed effects, i.e. your "independent variables" or "predictors".
Adjusted predictions can now be calculated for specified values or levels of the focal terms, however, either for the full sample (population-level) or for each level of the grouping variable (unit-level). The latter is particularly useful when the grouping variable is of interest, e.g. when you want to compare the effect of a predictor between different groups.
To get population-level predictions, we set type = "fixed"
or type = "zero_inflation"
(for models with zero-inflation component).
We start with the population-level predictions. Here you can either calculate the conditional or the marginal effect (see in detail also Heiss 2022). The conditional effect is the effect of a predictor in an average or typical group, while the marginal effect is the average effect of a predictor across all groups. E.g. let's say we have countries
as grouping variable and gdp
(gross domestic product per capita) as predictor, then the conditional and marginal effect would be:
conditional effect: effect of gdp
in an average or typical country. To get conditional predictions, we use predict_response()
or predict_response(margin = "mean_mode")
.
marginal effect: average effect of gdp
across all countries. To get marginal (or average) predictions, we use predict_response(margin = "empirical")
.
While the term "effect" referes to the strength of the relationship between a predictor and the response, "predictions" refer to the actual predicted values of the response. Thus, in the following, we will talk about conditional and marginal (or average) predictions (instead of effects).
In a balanced data set, where all groups have the same number of observations, the conditional and marginal predictions are often similar (maybe slightly different, depending on the non-focal predictors). However, in unbalanced data, the conditional and marginal predicted values can largely differ.
library(ggeffects) library(lme4) data(sleepstudy) # balanced data set m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy) # conditional predictions predict_response(m, "Days [1,5,9]") # average marginal predictions predict_response(m, "Days [1,5,9]", margin = "empirical") # create imbalanced data set set.seed(123) strapped <- sleepstudy[sample.int(nrow(sleepstudy), nrow(sleepstudy), replace = TRUE), ] m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = strapped) # conditional predictions predict_response(m, "Days [1,5,9]") # average marginal predictions predict_response(m, "Days [1,5,9]", margin = "empirical")
REML
argumentThe conditional predictions returned by predict_response()
for the default marginalization (i.e. when margin = "mean_reference"
or "mean_mode"
) may differ...
REML
argument during model fitting;library(glmmTMB) set.seed(123) sleepstudy$x <- as.factor(sample(1:3, nrow(sleepstudy), replace = TRUE)) # REML is FALSE m1 <- glmmTMB(Reaction ~ Days + x + (1 + Days | Subject), data = sleepstudy, REML = FALSE) # REML is TRUE m2 <- glmmTMB(Reaction ~ Days + x + (1 + Days | Subject), data = sleepstudy, REML = TRUE) # predictions when REML is FALSE predict_response(m1, "Days [1:3]") # predictions when REML is TRUE predict_response(m2, "Days [1:3]")
For zero-inflated mixed effects models, typically fitted with the glmmTMB or GLMMadaptive packages, predict_response()
can return predicted values of the response, for the different model components:
Conditional predictions:
type = "fixed"
)type = "zero_inflated"
), returning the expected (predicted) values of the responsetype = "zi_prob"
)Marginal predictions:
type = "simulate"
can be used to obtain marginal predictions, averaged across all random effects groups and non-focal termsmargin = "empirical"
are also averaged across all random effects groups and non-focal terms. The major difference to type = "simulate"
is that margin = "empirical"
also returns counterfactual predictions.For predict_response(margin = "empirical")
, valid values for type
are usually those based on the model's predict()
method. For models of class glmmTMB
, these are "response"
, "link"
, "conditional"
, "zprob"
, "zlink"
, or "disp"
. However, for zero-inflated models, type = "fixed"
and type = "zero_inflated"
can be used as aliases (instead of "conditional"
or "response"
).
First, we show examples for conditional predictions, which is the default marginalization method in predict_response()
.
library(glmmTMB) data(Salamanders) m <- glmmTMB( count ~ spp + mined + (1 | site), ziformula = ~ spp + mined, family = poisson(), data = Salamanders )
Similar to mixed models without zero-inflation component, type = "fixed"
returns predictions on the population-level, but for the conditional ("count") model only.
predict_response(m, "spp")
For type = "zero_inflated"
, results the expected values of the response, mu*(1-p)
. Since the zero inflation and the conditional model are working in "opposite directions", a higher expected value for the zero inflation means a lower response, but a higher value for the conditional ("count") model means a higher response. While it is possible to calculate predicted values with predict(..., type = "response")
, standard errors and confidence intervals can not be derived directly from the predict()
-function. Thus, confidence intervals for type = "zero_inflated"
are based on quantiles of simulated draws from a multivariate normal distribution (see also Brooks et al. 2017, pp.391-392 for details).
predict_response(m, "spp", type = "zero_inflated")
In the above examples, we get the conditional, not the marginal predictions. E.g., predictions are conditioned on mined
when it is set to "yes"
, and predictions refer to a typical (random effects) group. However, it is possible to obtain predicted values by simulating from the model, where predictions are based on simulate()
(see Brooks et al. 2017, pp.392-393 for details). This will return expected values of the response (marginal predictions), averaged across all random effects groups and non-focal terms. To achieve this, use type = "simulate"
. Note that this prediction-type usually returns larger intervals, because it accounts for all model uncertainties.
predict_response(m, "spp", type = "simulate")
In a similar fashion, you can obtain average marginal predictions for zero-inflated mixed models with margin = "empirical"
. The returned values are most comparable to predict_response(type = "simulate")
, because margin = "empirical"
also returns expected values of the response, averaged across all random effects groups and all non-focal terms. The next example shows the average marginal predicted values of spp
on the response across all site
s, taking the zero-inflation component into account (i.e. type = "zero_inflated"
).
predict_response(m, "spp", type = "zero_inflated", margin = "empirical")
For non-Gaussian models, predicted values are back-transformed to the response scale. However, back-transforming the population-level predictions (in mixed models, when type = "fixed"
) ignores the effect of the variation around the population mean, hence, the result on the original data scale is biased due to Jensen's inequality. In this case, it can be appropriate to apply a bias-correction. This is done by setting bias_correction = TRUE
. By default, insight::get_variance_residual()
is used to extract the residual variance, which is used to calculate the amount of correction. Optionally, you can provide your own estimates of uncertainty, e.g. based on insight::get_variance_random()
, using the sigma
argument. ggeffects will warn users once per session whenever bias-correction can be appropriate.
options(ggeffects_warning_bias_correction = TRUE)
# no bias-correction predict_response(m, "spp") # bias-correction predict_response(m, "spp", bias_correction = TRUE) # bias-correction, using user-defined sigma-value predict_response(m, "spp", bias_correction = TRUE, sigma = insight::get_sigma(m))
Adjusted predictions can also be calculated for each group level (unit-level) in mixed models. Simply add the name of the related random effects term to the terms
-argument, and set type = "random"
. For predict_response(margin = "empirical")
, you don't need to set type = "random"
.
In the following example, we fit a linear mixed model and first simply plot the adjusted predictions at the population-level.
library(sjlabelled) data(efc) efc$e15relat <- as_label(efc$e15relat) m <- lmer(neg_c_7 ~ c12hour + c160age + c161sex + (1 | e15relat), data = efc) me <- predict_response(m, terms = "c12hour") plot(me)
To compute adjusted predictions for each grouping level (unit-level), add the related random effects term to the terms
-argument. In this case, predictions are calculated for each level of the specified random effects term.
me <- predict_response(m, terms = c("c12hour", "e15relat"), type = "random") plot(me, show_ci = FALSE)
Since average marginal predictions already consider random effects by averaging over the groups, the type
-argument is not needed when margin = "empirical"
is set.
me <- predict_response(m, terms = c("c12hour", "e15relat"), margin = "empirical") plot(me, show_ci = FALSE)
Adjusted predictions can also be calculated for specific unit-levels only. Add the related values into brackets after the variable name in the terms
-argument.
me <- predict_response(m, terms = c("c12hour", "e15relat [child,sibling]"), type = "random") plot(me, show_ci = FALSE)
The complex plot in this scenario would be a term (c12hour
) at certain values of two other terms (c161sex
, c160age
) for specific unit-levels of random effects (e15relat
), so we have four variables in the terms
-argument.
me <- predict_response( m, terms = c("c12hour", "c161sex", "c160age", "e15relat [child,sibling]"), type = "random" ) plot(me)
If the group factor has too many levels, you can also take a random sample of all possible levels and plot the adjusted predictions for this subsample of unit-levels. To do this, use term = "<groupfactor> [sample=n]"
.
set.seed(123) m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy) me <- predict_response(m, terms = c("Days", "Subject [sample=7]"), type = "random") plot(me)
You can also add the observed data points for each group using show_data = TRUE
.
plot(me, show_data = TRUE, show_ci = FALSE)
gam
and glmer
modelsThe output of predict_response()
indicates that the grouping variable of the random effects is set to "population level" (adjustment), e.g. in case of lme4, following is printed:
Adjusted for: * Subject = 0 (population-level)
A comparable model fitted with mgcv::gam()
would print a different message:
Adjusted for: * Subject = 308
The reason is because the correctly printed information about adjustment for random effects is based on insight::find_random()
, which returns NULL
for gam
s with random effects defined via s(..., bs = "re")
. However, predictions are still correct, when population-level predictions are requested. Here's an example:
data("sleepstudy", package = "lme4") # mixed model with lme4 m_lmer <- lme4::lmer(Reaction ~ poly(Days, 2) + (1 | Subject), data = sleepstudy ) # equivalent model, random effects are defined via s(..., bs = "re") m_gam <- mgcv::gam(Reaction ~ poly(Days, 2) + s(Subject, bs = "re"), family = gaussian(), data = sleepstudy, method = "ML" ) # predictions are identical predict_response(m_gam, terms = "Days", exclude = "s(Subject)", newdata.guaranteed = TRUE) predict_response(m_lmer, terms = "Days")
Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378–400.
Heiss, A. (2022, November 29). Marginal and conditional effects for GLMMs with {marginaleffects}. Andrew Heiss’s Blog. (doi: 10.59350/xwnfm-x1827)
Johnson PC. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. (doi: 10.1111/2041-210X.12225)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.