knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png",
  fig.width = 7,
  fig.height = 3.5,
  message = FALSE,
  warning = FALSE
)
options(width = 800)
arrow_color <- "#FF00cc"

pkgs <- c(
  "ggplot2",
  "marginaleffects",
  "parameters",
  "emmeans",
  "htmltools"
)

if (!all(vapply(pkgs, requireNamespace, quietly = TRUE, FUN.VALUE = logical(1L)))) {
  knitr::opts_chunk$set(eval = FALSE)
}
library(htmltools)

callout_tip <- function(header = NULL, ...) {
  div(
    class = "callout-tip",
    tags$h1(
      tags$img(src = "../man/figures/summary.png", width = "20", height = "17", style = "vertical-align:middle"), # nolint
      header
    ),
    ...
  )
}

includeCSS("../man/figures/callout.css")
This vignette is the second in a 4-part series:
Significance Testing of Differences Between Predictions I: Contrasts and Pairwise Comparisons
Significance Testing of Differences Between Predictions II: Comparisons of Slopes, Floodlight and Spotlight Analysis (Johnson-Neyman Intervals)
For numeric focal terms, it is possible to conduct hypothesis testing for slopes, i.e. for the linear trend of these focal terms.
callout_tip(
  "Summary of most important points:",
  tags$ul(
    tags$li("For categorical predictors (focal terms), it is easier to define which values to compare. For continuous predictors, however, you may want to compare different (meaningful) values of that predictor, or its slope (or even compare slopes of different continuous focal terms)."), # nolint
    tags$li("By default, when the first focal term is continuous, contrasts or comparisons are calculated for the slopes of that predictor."), # nolint
    tags$li("It is even more complicated to deal with interactions of (at least) two continuous predictors. In this case, the ", tags$code("johnson_neyman()"), " function can be used, which is a special case of pairwise comparisons for interactions with continuous predictors.") # nolint
  )
)
Let's start with a simple example again.
library(ggeffects)
library(parameters)
data(iris)
m <- lm(Sepal.Width ~ Sepal.Length + Species, data = iris)
model_parameters(m)
We can already see from the coefficient table that the slope for `Sepal.Length` is `r unname(round(coef(m)[2], 3))`. We will thus find the same increase in the predicted values of our outcome when our focal variable, `Sepal.Length`, increases by one unit.
predict_response(m, "Sepal.Length [4,5,6,7]")
Consequently, in this case of a simple slope, we see the same result for the hypothesis test of the linear trend of `Sepal.Length`:
test_predictions(m, "Sepal.Length")
Is the linear trend of `Sepal.Length` significant for the different levels of `Species`?

Let's move on to a more complex example with an interaction between a numeric and a categorical variable.
m <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)
pred <- predict_response(m, c("Sepal.Length", "Species"))
plot(pred)
We can see that the slope of `Sepal.Length` is different within each group of `Species`.
library(ggplot2)
p <- plot(pred)
dat <- as.data.frame(pred, terms_to_colnames = FALSE)

dat1 <- data.frame(
  x = dat$x[c(13, 28)],
  y = dat$predicted[c(13, 28)],
  group_col = "setosa",
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  x = dat$x[c(8, 23)],
  y = dat$predicted[c(8, 23)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)
dat3 <- data.frame(
  x = dat$x[c(39, 54)],
  y = dat$predicted[c(39, 54)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)

p + geom_segment(
  data = dat1,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[1]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat1,
  mapping = aes(x = x[2], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat1,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1
) + geom_segment(
  data = dat2,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[1]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat2,
  mapping = aes(x = x[2], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat2,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1
) + geom_segment(
  data = dat3,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[1]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat3,
  mapping = aes(x = x[2], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1, linetype = "dashed"
) + geom_segment(
  data = dat3,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 1
) + ggtitle("Linear trend of `Sepal.Length` by `Species`")
Since we don't want to do pairwise comparisons, we set `test = "slope"` (or `test = "trend"`). In this case, when interaction terms are included, the linear trend (slope) of our numeric focal predictor, `Sepal.Length`, is tested for each level of `Species`.
# test = "slope" is just an alias for test = NULL test_predictions(m, c("Sepal.Length", "Species"), test = "slope")
As we can see, each of the three slopes is significant, i.e. we have "significant" linear trends.
The next question could be whether the linear trends differ significantly from each other, i.e. we test differences in slopes, which is a pairwise comparison between slopes. To do this, we use the default for `test`, which is `"pairwise"`.
test_predictions(m, c("Sepal.Length", "Species"))
ht5 <- test_predictions(m, c("Sepal.Length", "Species"))
The linear trend of `Sepal.Length` within `setosa` is significantly different from the linear trend within `versicolor` and also within `virginica`. The difference of slopes between `virginica` and `versicolor` is not statistically significant (p = `r round(ht5$p.value[3], 3)`).
Is the difference of two linear trends of `Sepal.Length` between two groups of `Species` significantly different from the difference of two linear trends between two other groups?

Similar to the example for categorical predictors, we can also test a difference-in-differences for this example. For instance, is the difference of the slopes of `Sepal.Length` between `setosa` and `versicolor` different from the slope-difference for the groups `setosa` and `virginica`?
The difference-in-differences we're interested in is again indicated by the purple arrow in the plot below.
p <- plot(pred)
dat <- as.data.frame(pred, terms_to_colnames = FALSE)

dat1 <- data.frame(
  x = dat$x[c(10, 16)],
  y = dat$predicted[c(10, 16)],
  group_col = "setosa",
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  x = dat$x[c(11, 17)],
  y = dat$predicted[c(11, 17)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)
dat3 <- data.frame(
  x = dat$x[c(40, 46)],
  y = dat$predicted[c(40, 46)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)
dat4 <- data.frame(
  x = dat$x[c(42, 48)],
  y = dat$predicted[c(42, 48)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)
dat5 <- data.frame(
  x = dat$x[c(13, 14)],
  y = dat$predicted[c(13, 14)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)
dat6 <- data.frame(
  x = dat$x[c(43, 45)],
  y = dat$predicted[c(43, 45)],
  group_col = "versicolor",
  stringsAsFactors = FALSE
)

p + geom_segment(
  data = dat1,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8
) + geom_segment(
  data = dat2,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8
) + geom_segment(
  data = dat3,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8
) + geom_segment(
  data = dat4,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8
) + geom_segment(
  data = dat5,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8, linetype = "dotted"
) + geom_segment(
  data = dat6,
  mapping = aes(x = x[1], xend = x[2], y = y[1], yend = y[2]),
  color = "orange", linewidth = 0.8, linetype = "dotted"
) + geom_segment(
  mapping = aes(x = 5, xend = 7, y = 3.3, yend = 3.3),
  color = arrow_color, linewidth = 0.6,
  arrow = arrow(length = unit(0.1, "inches"), ends = "both", angle = 40)
) + ggtitle("Difference-in-differences")
Let's look at the different slopes separately first, i.e. the slopes of `Sepal.Length` by levels of `Species`:
test_predictions(m, c("Sepal.Length", "Species"), test = NULL)
ht6 <- test_predictions(m, c("Sepal.Length", "Species"), test = NULL)
The first difference of slopes we're interested in is the one between `setosa` (`r round(ht6$Slope[1], 2)`) and `versicolor` (`r round(ht6$Slope[2], 2)`), i.e. `b1 - b2` (= `r round(ht6$Slope[1] - ht6$Slope[2], 2)`). The second difference is between levels `setosa` (`r round(ht6$Slope[1], 2)`) and `virginica` (`r round(ht6$Slope[3], 2)`), which is `b1 - b3` (= `r round(ht6$Slope[1] - ht6$Slope[3], 2)`). We test the null hypothesis that `(b1 - b2) = (b1 - b3)`.
test_predictions(m, c("Sepal.Length", "Species"), test = "(b1 - b2) = (b1 - b3)")
ht7 <- test_predictions(m, c("Sepal.Length", "Species"), test = "(b1 - b2) = (b1 - b3)")
The difference between the two differences is `r round(ht7$Contrast[1], 2)` and not statistically significant (p = `r round(ht7$p.value[1], 3)`).
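Note that the null hypothesis `(b1 - b2) = (b1 - b3)` simplifies algebraically to `b2 = b3`, i.e. to a direct comparison of the slopes for `versicolor` and `virginica`. As a sketch (assuming `test_predictions()` accepts this shorter hypothesis string, following the same `b1`, `b2`, ... syntax used above), the same test could be written as:

# equivalent formulation: (b1 - b2) = (b1 - b3) reduces to b2 = b3
test_predictions(m, c("Sepal.Length", "Species"), test = "b2 = b3")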
Is the linear trend of `Sepal.Length` significant at different values of another numeric predictor?

When we have two numeric terms in an interaction, the comparison becomes more difficult, because we have to find meaningful (or representative) values for the moderator at which the associations between the predictor and the outcome are tested. We no longer have distinct categories for the moderator variable.

The following examples show interactions between two numeric predictors. In case of interaction terms, adjusted predictions are usually shown at representative values. If a numeric variable is specified as second or third interaction term, representative values (see `values_at()`) are typically the mean and mean +/- 1 SD. This is sometimes also called "spotlight analysis" (Spiller et al. 2013).
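To see which representative values would be used for the moderator, we can compute them directly. A minimal sketch, assuming the `"meansd"` option of `values_at()` returns the mean and mean +/- 1 SD:

# representative values of Petal.Width: mean - SD, mean, mean + SD
values_at(iris$Petal.Width, values = "meansd")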
In the next example, we have `Petal.Width` as second interaction term, thus we see the predicted values of `Sepal.Width` (our outcome) for `Petal.Length` at three different, representative values of `Petal.Width`: the mean (`r round(mean(iris$Petal.Width), 2)`), 1 SD above the mean (`r round(mean(iris$Petal.Width) + sd(iris$Petal.Width), 2)`) and 1 SD below the mean (`r round(mean(iris$Petal.Width) - sd(iris$Petal.Width), 2)`).
m <- lm(Sepal.Width ~ Petal.Length * Petal.Width, data = iris)
pred <- predict_response(m, c("Petal.Length", "Petal.Width"))
plot(pred)
For `test_predictions()`, these three values (mean, +1 SD and -1 SD) work in the same way as if `Petal.Width` were a categorical predictor with three levels.
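These representative values are picked automatically, but they can also be requested explicitly in the terms specification. A minimal sketch (assuming the `[meansd]` tag for the second term, which should reproduce the default shown above; `pred_meansd` is just an illustrative name):

# explicitly request mean and +/- 1 SD as representative values for Petal.Width
pred_meansd <- predict_response(m, c("Petal.Length", "Petal.Width [meansd]"))
plot(pred_meansd)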
First, we want to see at which values of `Petal.Width` the slopes of `Petal.Length` are significant. We do no pairwise comparison here, hence we set `test = "slope"`.
test_predictions(pred, test = "slope")
# same as:
# test_predictions(m, c("Petal.Length", "Petal.Width"), test = NULL)
The results of the pairwise comparison are shown below. These tell us that all linear trends (slopes) are significantly different from each other, i.e. the slope of the green line is significantly different from the slope of the red line, and so on.
test_predictions(pred)
Another way to handle models with two numeric variables in an interaction is the so-called floodlight analysis, a spotlight analysis for all values of the moderator variable, which is implemented in the `johnson_neyman()` function that creates Johnson-Neyman intervals. These intervals indicate the values of the moderator at which the slope of the predictor is significant (cf. Johnson & Fay 1950, McCabe et al. 2018).

Let's look at an example. We first plot the predicted values of `Income` for `Murder` at nine different values of `Illiteracy` (there are no more colors in the default palette to show more lines).
states <- as.data.frame(state.x77)
states$HSGrad <- states$`HS Grad`
m_mod <- lm(Income ~ HSGrad + Murder * Illiteracy, data = states)

myfun <- seq(0.5, 3, length.out = 9)
pr <- predict_response(m_mod, c("Murder", "Illiteracy [myfun]"))
plot(pr)
It's difficult to say at which values of `Illiteracy` the association between `Murder` and `Income` might be statistically significant. We can still use `test_predictions()`:
As can be seen, the results might indicate that at the lower and upper tails of `Illiteracy`, i.e. when `Illiteracy` is roughly smaller than `0.8` or larger than `2.6`, the association between `Murder` and `Income` is statistically significant.
However, this test can be simplified using the `johnson_neyman()` function:
johnson_neyman(pr)
#> The association between `Murder` and `Income` is positive for values of
#> `Illiteracy` lower than 0.79 and negative for values higher than 2.67.
#> Inside the interval of [0.79, 2.67], there were no clear associations.
Furthermore, it is possible to create a spotlight-plot.
plot(johnson_neyman(pr))
To avoid misleading interpretations of the plot, we speak of "positive" and "negative" associations, respectively, or "no clear" associations (instead of "significant" or "non-significant"). This should prevent the non-significant range of moderator values from being interpreted as "accepting the null hypothesis".
The results of the spotlight analysis suggest that the slopes at values of `Illiteracy` below `0.79` and above `2.67` are significantly different from zero, while the slopes at values in between are not. We can plot predictions at these values to see the differences. The red and the green line represent values of `Illiteracy` at which we find clear positive and negative associations, respectively, between `Murder` and `Income`, while we find no clear (positive or negative) association for the blue line.
pr <- predict_response(m_mod, c("Murder", "Illiteracy [0.7, 1.5, 2.8]"))
plot(pr, grid = TRUE)
Johnson, P.O. & Fay, L.C. (1950). The Johnson-Neyman technique, its theory and application. Psychometrika, 15, 349-367. doi: 10.1007/BF02288864
McCabe CJ, Kim DS, King KM. (2018). Improving Present Practices in the Visual Display of Interactions. Advances in Methods and Practices in Psychological Science, 1(2):147-165. doi:10.1177/2515245917746792
Spiller, S. A., Fitzsimons, G. J., Lynch, J. G., & McClelland, G. H. (2013). Spotlights, Floodlights, and the Magic Number Zero: Simple Effects Tests in Moderated Regression. Journal of Marketing Research, 50(2), 277–288. doi:10.1509/jmr.12.0420