knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>", dev = "png",
  fig.width = 7, fig.height = 4,
  message = FALSE, warning = FALSE
)
options(width = 800)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
Plotting partial residuals on top of the estimated marginal means helps detect model misspecification, such as unmodeled non-linear relationships or unmodeled interactions. In a nutshell, it allows "Visualizing Fit and Lack of Fit in Complex Regression Models with Predictor Effect Plots and Partial Residuals" (Fox & Weisberg 2018).
To add partial residuals to a plot, add show_residuals = TRUE to the plot() function call. Because partial residuals adjust for the other predictors in the model, they are much better than raw data at detecting spurious patterns in the relationship between predictors and outcome.
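To make the term concrete, here is a minimal base R sketch of the textbook (component-plus-residual) definition for a linear model: the centered contribution of one predictor plus the model residuals. It uses the built-in mtcars data, which is an assumption for illustration only, and it is not necessarily how ggeffects computes or positions the residuals it draws.

```r
# Linear model on a built-in dataset (illustrative choice, not from the vignette)
m <- lm(mpg ~ wt + hp, data = mtcars)

# Partial residuals for wt, computed by hand:
# centered contribution of wt plus the ordinary residuals
partial_wt <- unname(coef(m)[["wt"]] * (mtcars$wt - mean(mtcars$wt)) + residuals(m))

# Base R offers the same quantity via residuals(type = "partial"),
# which returns one column per model term
partial_base <- unname(residuals(m, type = "partial")[, "wt"])

all.equal(partial_wt, partial_base)
```

Plotting partial_wt against mtcars$wt shows the relationship between wt and the outcome after the linear effect of hp has been removed, which is why curvature or other structure is easier to spot than in the raw data.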
Let's look at an example with a non-linear relationship. The missed pattern is not obvious when looking at the raw data:
library(ggeffects)
set.seed(1234)
x <- rnorm(200)
z <- rnorm(200)
# quadratic relationship
y <- 2 * x + x^2 + 4 * z + rnorm(200)
d <- data.frame(x, y, z)
m <- lm(y ~ x + z, data = d)

pr <- predict_response(m, "x [all]")
plot(pr, show_data = TRUE)
However, it becomes more obvious with partial residuals:
plot(pr, show_residuals = TRUE)
It becomes even more obvious when a local polynomial regression line (loess) is added to the plot. This can be achieved using show_residuals_line = TRUE.
plot(pr, show_residuals = TRUE, show_residuals_line = TRUE)
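One possible follow-up, which is not part of the original example, is to refit the model with a quadratic term and check whether the curvature disappears. The sketch below recreates the simulated data, compares both models with an F-test, and redraws the plot only if ggeffects and ggplot2 are installed. The name m_quad is introduced here for illustration.

```r
# Recreate the simulated data from the example
set.seed(1234)
x <- rnorm(200)
z <- rnorm(200)
y <- 2 * x + x^2 + 4 * z + rnorm(200)
d <- data.frame(x, y, z)

# Original (misspecified) model and a candidate fix with a quadratic term
m <- lm(y ~ x + z, data = d)
m_quad <- lm(y ~ x + I(x^2) + z, data = d)

# F-test for the added quadratic term
cmp <- anova(m, m_quad)
print(cmp)

# Redraw the partial residuals plot for the refitted model, if possible
if (requireNamespace("ggeffects", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  pr_quad <- ggeffects::predict_response(m_quad, "x [all]")
  print(plot(pr_quad, show_residuals = TRUE, show_residuals_line = TRUE))
}
```

If the quadratic term captures the missed pattern, the loess line should follow the predicted curve closely instead of bending away from it.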
Here is another example, which shows that the partial residuals plot suggests modeling an interaction:
set.seed(1234)
x <- rnorm(300, mean = 10)
z <- rnorm(300)
v <- rnorm(300)
y <- (4 * z + 2) * x - 40 * z + 5 * v + rnorm(300, sd = 3)
d <- data.frame(x, y, z)
m <- lm(y ~ x + z, data = d)

pr <- predict_response(m, c("x", "z"))

# raw data, no interaction
plot(pr, show_data = TRUE)
Again, it is recommended to add a loess-fit line to the residuals:
plot(pr, show_residuals = TRUE, grid = TRUE, show_residuals_line = TRUE)
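The pattern in the faceted plot can also be checked numerically. The following base R sketch, an illustration rather than part of ggeffects, regresses the residuals of the additive model on x separately for negative and non-negative values of z. If no interaction were missing, both slopes should be close to zero; slopes with opposite signs point to an unmodeled x-by-z interaction. The split at zero and the names slope_low and slope_high are arbitrary choices made for this sketch.

```r
# Recreate the simulated data and the additive model from the example
set.seed(1234)
x <- rnorm(300, mean = 10)
z <- rnorm(300)
v <- rnorm(300)
y <- (4 * z + 2) * x - 40 * z + 5 * v + rnorm(300, sd = 3)
d <- data.frame(x, y, z)
m <- lm(y ~ x + z, data = d)

# Residuals of the additive model
d$res <- residuals(m)

# Slope of the residuals on x, separately for low and high values of z
slope_low <- coef(lm(res ~ x, data = d[d$z < 0, ]))[["x"]]
slope_high <- coef(lm(res ~ x, data = d[d$z >= 0, ]))[["x"]]

c(low_z = slope_low, high_z = slope_high)
```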
Modeling the interaction clearly catches the pattern in the data better.
m <- lm(y ~ x * z, data = d)
pr <- predict_response(m, c("x", "z"))
plot(pr, show_residuals = TRUE, grid = TRUE, show_residuals_line = TRUE)
ggeffects usually "prettifies" the data and tries to find a pretty sequence over the range of a focal predictor, to avoid overly long output, particularly for continuous variables (see section pretty value ranges in this vignette).
This, however, might be misleading in some cases when creating residual plots. In the next example, the residuals follow a sine-curve pattern, which is hidden by default:
set.seed(1234)
x <- seq(-100, 100, length.out = 1e3)
z <- rnorm(1e3)
y <- 5 * sin(x / 2) + x / 2 + 10 * z
m <- lm(y ~ x + z)

pr <- predict_response(m, "x")
plot(pr, show_residuals = TRUE)
In such cases, it is recommended to use the [all] tag in the terms argument.
pr <- predict_response(m, "x [all]")
plot(pr, show_residuals = TRUE)
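Why a coarse sequence can hide this pattern may be sketched with base R alone. The sine component sin(x / 2) has a period of 4 * pi, so a grid can only resolve it if its spacing is below half that period. The sketch uses base R's pretty() as a stand-in for a coarse "pretty" sequence; this is an assumption for illustration, since the values ggeffects actually selects may differ.

```r
# Focal predictor from the example
x <- seq(-100, 100, length.out = 1e3)

# Period of the sine component sin(x / 2)
period <- 4 * pi

# A coarse "pretty" grid (base R stand-in, not ggeffects' own algorithm)
coarse <- pretty(x)

# Spacing of the coarse grid versus spacing of all observed values
step_coarse <- diff(coarse)[1]
step_all <- diff(x)[1]

c(n_coarse = length(coarse), n_all = length(unique(x)))
c(step_coarse = step_coarse, step_all = step_all, half_period = period / 2)
```

The coarse spacing exceeds half the period, while the spacing of the observed values is far below it, which is consistent with the wave only becoming visible once all values are used.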
Fox J, Weisberg S. Visualizing Fit and Lack of Fit in Complex Regression Models with Predictor Effect Plots and Partial Residuals. Journal of Statistical Software 2018;87. https://www.jstatsoft.org/article/view/v087i09