#' @title Functions to flag/exclude models
#'
#' @description
#'
#' The functions listed on this help page (`flag_*`) allow to flag/identify
#' potentially problematic model fits. The function were designed to be used
#' with the models estimating an injection-order-dependent signal drift seen
#' in LC-MS based untargeted metabolomics data. Such models can be fit with the
#' `rowFitModel` function of the `xcms` package. Functions are expected to
#' return `TRUE` for potentially problematic model fits and `FALSE` otherwise.
#'
#' The functions are:
#'
#' - `flag_model_residual`: test whether the difference between the difference
#' between the 25 and 75% quantile of residuals is larger than the user
#' defined value `diff_residual`. This function identifies model fits with
#' on average large deviations of the individual data points from the fitted
#' line.
#'
#' - `flag_model_mean_residual`: tests if the `mean` of the absolute residuals
#' is larger than the provided value.
#'
#' - `flag_model_inj_range`: tests if values on which the model was fitted
#' spans a minimum required injection index range. This requires `x` being
#' a model of the type `y ~ inj_idx`.
#'
#' - `flag_model_cat_count`: tests if the number of replicated measurents
#' (categories) for a categorical variable (e.g. batch) are larger than
#' `min_count`.
#'
#' - `flag_model_coef_count`: tests if the number of estimated coefficients
#' matches the expected number. This is useful/required for linear models
#' aimed to adjust a batch effect, but for which a coefficient was not
#' estimated for each batch (=level of the categorical variable representing
#' the batch). This could happen if only missing values were present for the
#' respective batch.
#'
#' @param column for `flag_model_inj_range`: `character(1)` specifying the
#' column containing the injection index.
#'
#' @param cut_off for `flag_model_mean_residual`: `numeric(1)` defining the
#' cut-off to flag models with large average residuals.
#'
#' @param diff_residual for `flag_model_residual`: `numeric(1)` defining the
#' cut-off to flag models with large residuals. `TRUE` is reported for all
#' models with a difference between the 25 and 75 percent percentile of
#' residuals larger than this value.
#'
#' @param min_range for `flag_model_inj_range`: `numeric(1)` defining the
#' minimum range. This is an absolute value, not a percentage. Means, the
#' function compares the `diff(range(x$x[, column]))` with `min_range` and
#' flags models that don't fit that criteria (i.e. have a smaller range).
#'
#' @param min_count for `flag_model_cat_count`: `integer(1)` with the minimum
#' required number of values within each category.
#'
#' @param n_coef for `flag_model_coef_count`: `integer(1)` with the expected
#' number of coefficients.
#'
#' @param variable for `flag_model_cat_count`: `character(1)` with the name of
#' the categorical value. Should be one of `colnames(x$model)`.
#'
#' @param x a linear model object such as generated by [lm()] or
#' [robustbase::lmrob()].
#'
#' @return `logical(1)`: `TRUE` if model fit is potentially problematic and
#' `FALSE` otherwise or `NA` if no model provided.
#'
#' @author Johannes Rainer
#'
#' @rdname model-flagging
#'
#' @importFrom stats quantile residuals
#' @export
flag_model_residual <- function(x, diff_residual = 1) {
if (length(x) > 1) {
if (diff(quantile(residuals(x), probs = c(0.25, 0.75))) > diff_residual)
TRUE
else FALSE
} else NA
}
#' @rdname model-flagging
#'
#' @export
flag_model_mean_residual <- function(x, cut_off = 0.5) {
if (length(x) > 1)
mean(abs(residuals(x))) > cut_off
else NA
}
#' @rdname model-flagging
#'
#' @export
flag_model_inj_range <- function(x, min_range = 1, column = "inj_idx") {
if (length(x) > 1)
diff(range(x$model[, column])) < min_range
else NA
}
#' @rdname model-flagging
#'
#' @export
flag_model_cat_count <- function(x, variable, min_count = 4) {
if (missing(variable))
stop("'variable' is missing")
if (length(x) > 1) {
if (!any(colnames(x$model) == variable))
stop("'variable' has to be one of ", paste0(colnames(x$model),
collapse = ", "))
min(table(x$model[, variable])) < min_count
}
else NA
}
#' @rdname model-flagging
#'
#' @export
flag_model_coef_count <- function(x, n_coef) {
if (missing(n_coef))
stop("'n_coef' is missing")
if (length(x) > 1) {
length(x$coefficients) != n_coef
}
else NA
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.