knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dev = "png", fig.width = 7, fig.height = 3.5, message = FALSE, warning = FALSE ) options(width = 800) arrow_color <- "#FF00cc" pkgs <- c( "ggplot2", "marginaleffects", "parameters", "emmeans", "htmltools" ) if (!all(vapply(pkgs, requireNamespace, quietly = TRUE, FUN.VALUE = logical(1L)))) { knitr::opts_chunk$set(eval = FALSE) }
library(htmltools) callout_tip <- function(header = NULL, ...) { div( class = "callout-tip", tags$h1( tags$img(src = "../man/figures/summary.png", width = "20", height = "17", style = "vertical-align:middle"), # nolint header ), ... ) } includeCSS("../man/figures/callout.css")
This vignette is the first in a 4-part series:
Significance Testing of Differences Between Predictions I: Contrasts and Pairwise Comparisons
A reason to compute adjusted predictions (or estimated marginal means) is to help understanding the relationship between predictors and outcome of a regression model. In particular for more complex models, for example, complex interaction terms, it is often easier to understand the associations when looking at adjusted predictions instead of the raw table of regression coefficients.
The next step, which often follows this, is to see if there are statistically significant differences. These could be, for example, differences between groups, i.e. between the levels of categorical predictors or whether trends differ significantly from each other.
The ggeffects package provides a function, test_predictions()
, which does exactly this: testing differences of adjusted predictions for statistical significance. This is usually called contrasts or (pairwise) comparisons, or marginal effects (if the difference refers to a one-unit change of predictors). This vignette shows some examples how to use the test_predictions()
function and how to test wheter differences in predictions are statistically significant.
First, different examples for pairwise comparisons are shown, later we will see how to test differences-in-differences (in the emmeans package, also called interaction contrasts).
Note: An alias for test_predictions()
is hypothesis_test()
. You can use either of these two function names.
callout_tip( "Summary of most important points:", tags$ul( tags$li("Contrasts and pairwise comparisons can be used to test if differences of predictions at different values of the focal terms are statistically significant or not. This is useful if \"group differences\" are of interest."), # nolint tags$li(tags$code("test_predictions()"), " calculates contrasts and comparisons for the results returned by ", tags$code("predict_response()"), "."), # nolint tags$li("It is possible to calculate simple contrasts, pairwise comparisons, or interaction contrasts (difference-in-differences). Use the ", tags$code("test"), " argument to control which kind of contrast or comparison should be made."), # nolint tags$li("If one or more focal terms are continuous predictors, contrasts and comparisons can be calculated for the slope, or the linear trend of those predictors.") # nolint ) )
episode
, do levels differ?We start with a toy example, where we have a linear model with two categorical predictors. No interaction is involved for now.
We display a simple table of regression coefficients, created with model_parameters()
from the parameters package.
library(ggeffects) library(parameters) library(ggplot2) set.seed(123) n <- 200 d <- data.frame( outcome = rnorm(n), grp = as.factor(sample(c("treatment", "control"), n, TRUE)), episode = as.factor(sample(1:3, n, TRUE)), sex = as.factor(sample(c("female", "male"), n, TRUE, prob = c(0.4, 0.6))) ) model1 <- lm(outcome ~ grp + episode, data = d) model_parameters(model1)
Let us look at the adjusted predictions.
my_predictions <- predict_response(model1, "episode") my_predictions plot(my_predictions)
We now see that, for instance, the predicted outcome when espisode = 2
is r round(my_predictions$predicted[2], 2)
.
We could now ask whether the predicted outcome for episode = 1
is significantly different from the predicted outcome at episode = 2
.
p <- plot(my_predictions) line_data <- as.data.frame(my_predictions, terms_to_colnames = FALSE)[1:2, ] p + geom_segment( data = line_data, aes( x = as.numeric(x[1]) + 0.06, xend = as.numeric(x[2]) - 0.06, y = predicted[1], yend = predicted[2], group = NULL, color = NULL ), color = arrow_color, arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40) ) + ggtitle("Within \"episode\", do levels 1 and 2 differ?") ht1 <- test_predictions(model1, "episode")
To do this, we use the test_predictions()
function. This function, like predict_response()
, either accepts the model object as first argument, followed by the focal predictors of interest, i.e. the variables of the model for which contrasts or pairwise comparisons should be calculated; or you can pass the results from predict_response()
directly into test_predictions()
. This is useful if you want to avoid specifying the same arguments again.
By default, when all focal terms are categorical, a pairwise comparison is performed. You can specify other hypothesis tests as well, using the test
argument (which defaults to "pairwise"
, see ?test_predictions
). For now, we go on with the simpler example of contrasts or pairwise comparisons.
# argument `test` defaults to "pairwise" test_predictions(model1, "episode") # same as test_predictions(my_predictions)
For our quantity of interest, the contrast between episode r ht1$episode[1]
, we see the value r round(ht1$Contrast[1], 2)
, which is exactly the difference between the predicted outcome for episode = 1
(r round(my_predictions$predicted[1], 2)
) and episode = 2
(r round(my_predictions$predicted[2], 2)
). The related p-value is r round(ht1$p.value[1], 3)
, indicating that the difference between the predicted values of our outcome at these two levels of the factor episode is indeed statistically significant.
Since the terms
argument in test_predictions()
works in the same way as for predict_response()
, you can directly pass "representative values" via that argument (for details, see this vignette). For example, we could also specify the levels of episode
directly, to simplify the output:
test_predictions(model1, "episode [1:2]")
In this simple example, the contrasts of both episode = 2
and episode = 3
to episode = 1
equals the coefficients of the regression table above (same applies to the p-values), where the coefficients refer to the difference between the related parameter of episode
and its reference level, episode = 1
.
To avoid specifying all arguments used in a call to predict_response()
again, we can also pass the objects returned by predict_response()
directly into test_predictions()
.
pred <- predict_response(model1, "episode") test_predictions(pred)
The next example includes a pairwise comparison of an interaction between two categorical predictors.
model2 <- lm(outcome ~ grp * episode, data = d) model_parameters(model2)
First, we look at the predicted values of outcome for all combinations of the involved interaction term.
my_predictions <- predict_response(model2, c("episode", "grp")) my_predictions plot(my_predictions)
We could now ask whether the predicted outcome for episode = 2
is significantly different depending on the level of grp
? In other words, do the groups treatment
and control
differ when episode = 2
?
p <- plot(my_predictions) line_data <- as.data.frame(my_predictions, terms_to_colnames = FALSE)[3:4, 1:2] line_data$group_col <- "control" p + geom_segment( data = line_data, aes( x = as.numeric(x[1]) - 0.06, xend = as.numeric(x[2]) + 0.06, y = predicted[1], yend = predicted[2], group = NULL, color = NULL ), color = arrow_color, arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40) ) + ggtitle("Within level 2 of \"episode\", do treatment and control group differ?") ht2 <- test_predictions(model2, c("episode", "grp"))
Again, to answer this question, we calculate all pairwise comparisons, i.e. the comparison (or test for differences) between all combinations of our focal predictors. The focal predictors we're interested here are our two variables used for the interaction.
# we want "episode = 2-2" and "grp = control-treatment" test_predictions(model2, c("episode", "grp"))
For our quantity of interest, the contrast between groups treatment
and control
when episode = 2
is r round(ht2$Contrast[8], 2)
. We find this comparison in row 8 of the above output.
As we can see, test_predictions()
returns pairwise comparisons of all possible combinations of factor levels from our focal variables. If we're only interested in a very specific comparison, we have two options to simplify the output:
We could directly formulate this comparison as test
. To achieve this, we first need to create an overview of the adjusted predictions, which we get from predict_response()
or test_predictions(test = NULL)
.
We pass specific values or levels to the terms
argument, which is the same as for predict_response()
.
# adjusted predictions, compact table test_predictions(model2, c("episode", "grp"), test = NULL)
In the above output, each row is considered as one coefficient of interest. Our groups we want to include in our comparison are rows two (grp = control
and episode = 2
) and five (grp = treatment
and episode = 2
), so our "quantities of interest" are b2
and b5
. Our null hypothesis we want to test is whether both predictions are equal, i.e. test = "b2 = b5"
. We can now calculate the desired comparison directly:
# compute specific contrast directly test_predictions(model2, c("episode", "grp"), test = "b2 = b5")
Curious how test
works in detail? test_predictions()
is a small, convenient wrapper around predictions()
and slopes()
of the great marginaleffects package. Thus, test
is just passed to the hypothesis
argument of those functions.
Again, using representative values for the terms
argument, we can also simplify the output using an alternative syntax:
# return pairwise comparisons for specific values, in # this case for episode = 2 in both groups test_predictions(model2, c("episode [2]", "grp"))
This is equivalent to the above example, where we directly specified the comparison we're interested in. However, the test
argument might provide more flexibility in case you want more complex comparisons. See examples below.
We can repeat the steps shown above to test any combination of group levels for differences.
For instance, we could now ask whether the predicted outcome for episode = 1
in the treatment
group is significantly different from the predicted outcome for episode = 3
in the control
group.
p <- plot(my_predictions) line_data <- as.data.frame(my_predictions, terms_to_colnames = FALSE)[c(2, 5), 1:2] line_data$group_col <- "treatment" p + geom_segment( data = line_data, aes( x = as.numeric(x[1]) + 0.06, xend = as.numeric(x[2]) - 0.06, y = predicted[1], yend = predicted[2], group = NULL, color = NULL ), color = arrow_color, arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40) ) + ggtitle("Do different episode levels differ between groups?") ht3 <- test_predictions(model2, c("episode", "grp"))
The contrast we are interested in is between episode = 1
in the treatment
group and episode = 3
in the control
group. These are the predicted values in rows three and four (c.f. above table of predicted values), thus we test
whether "b4 = b3"
.
test_predictions(model2, c("episode", "grp"), test = "b4 = b3")
Another way to produce this pairwise comparison, we can reduce the table of predicted values by providing specific values or levels in the terms
argument:
predict_response(model2, c("episode [1,3]", "grp"))
episode = 1
in the treatment
group and episode = 3
in the control
group refer now to rows two and three, thus we also can obtain the desired comparison this way:
pred <- predict_response(model2, c("episode [1,3]", "grp")) test_predictions(pred, test = "b3 = b2")
The test
argument also allows us to compare difference-in-differences (aka interaction contrasts). For example, is the difference between two episode levels in one group significantly different from the difference of the same two episode levels in the other group?
my_predictions <- predict_response(model2, c("grp", "episode")) p <- plot(my_predictions) line_data <- as.data.frame(my_predictions, terms_to_colnames = FALSE)[, 1:2, ] line_data$group_col <- "1" p + geom_segment( data = line_data, aes( x = as.numeric(x[1]) - 0.05, xend = as.numeric(x[1]) - 0.05, y = predicted[1], yend = predicted[2], group = NULL, color = NULL ), color = "orange", arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40, type = "closed") ) + geom_segment( data = line_data, aes( x = as.numeric(x[4]) - 0.05, xend = as.numeric(x[4]) - 0.05, y = predicted[4], yend = predicted[5], group = NULL, color = NULL ), color = "orange", arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40, type = "closed") ) + geom_segment( data = line_data, aes( x = as.numeric(x[1]) - 0.05, xend = as.numeric(x[4]) - 0.05, y = (predicted[1] + predicted[2]) / 2, yend = (predicted[4] + predicted[5]) / 2, group = NULL, color = NULL ), color = arrow_color, arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40) ) + ggtitle("Differnce-in-differences") ht4 <- test_predictions(model2, c("episode", "grp"), test = NULL) ht5 <- test_predictions(model2, c("episode", "grp"), test = "(b1 - b2) = (b4 - b5)")
As a reminder, we look at the table of predictions again:
test_predictions(model2, c("episode", "grp"), test = NULL)
The first difference of episode levels 1 and 2 in the control group refer to rows one and two in the above table (b1
and b2
). The difference for the same episode levels in the treatment group refer to the difference between rows four and five (b4
and b5
). Thus, we have b1 - b2
and b4 - b5
, and our null hypothesis is that these two differences are equal: test = "(b1 - b2) = (b4 - b5)"
.
test_predictions(model2, c("episode", "grp"), test = "(b1 - b2) = (b4 - b5)")
Interaction contrasts can also be calculated by specifying test = "interaction"
. Not that in this case, the emmeans package is used an backend, i.e. test_predictions()
is called with engine = "emmeans"
(silently).
# test = "interaction" always returns *all* possible interaction contrasts test_predictions(model2, c("episode", "grp"), test = "interaction")
Let's replicate this step-by-step:
episode = 1
in the control group is r round(ht4$Predicted[1], 2)
.episode = 2
in the control group is r round(ht4$Predicted[2], 2)
.r round(ht4$Predicted[1] - ht4$Predicted[2], 2)
episode = 1
in the treatment group is r round(ht4$Predicted[4], 2)
.episode = 2
in the treatment group is r round(ht4$Predicted[5], 2)
.r round(ht4$Predicted[4] - ht4$Predicted[5], 2)
r round((ht4$Predicted[1] - ht4$Predicted[2]) - (ht4$Predicted[4] - ht4$Predicted[5]), 2)
. This difference is not statistically significant (p = r round(ht5$p.value, 3)
).Thanks to the great marginaleffects package, it is now possible to have a powerful function in ggeffects that allows to perform the next logical step after calculating adjusted predictions and to conduct hypothesis tests for contrasts and pairwise comparisons.
While the current implementation in test_predictions()
already covers many common use cases for testing contrasts and pairwise comparison, there still might be the need for more sophisticated comparisons. In this case, I recommend using the marginaleffects package directly. Some further related recommended readings are the vignettes about Comparisons or Hypothesis Tests.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.