p_combine: P-value aggregation

View source: R/combine-p-values.R

p_combine    R Documentation

P-value aggregation

Description

p_combine is used to combine the p-values of independent significance tests.

Usage

p_combine(p, method = c("fisher", "SL", "MG", "tippett"), w = NULL)

Arguments

p

vector of p-values

method

one of the following: Fisher (1932) ('fisher'), Stouffer (1949), Liptak (1958) ('SL'), Mudholkar and George (1979) ('MG'), and Tippett (1931) ('tippett')

w

weights, only used with the Stouffer-Liptak method ('SL'). If w is NULL, the weights are set in an unbiased way

Details

The problem can be specified as follows: Given a vector of n p-values p_1, ..., p_n, find p_c, the combined p-value of the n significance tests. Most of the methods introduced here combine the p-values in order to obtain a test statistic, which follows a known probability distribution. The general procedure can be stated as:

T(h, C) = \sum^n_{i = 1}{h(p_i)} * C

The function T, which returns the test statistic t, takes two arguments. h is a function defined on the interval [0, 1] that transforms the individual p-values, and C is a correction term.
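The general form above can be sketched in a few lines of base R. This is only an illustration of the formula: the helper name combine_statistic is hypothetical and is not a function exported by the package.

```r
# Hypothetical helper (not part of the package):
# T(h, C) = sum(h(p_i)) * C
combine_statistic <- function(p, h, C = 1) {
  # p-values must lie in the interval [0, 1]
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  sum(h(p)) * C
}

p <- c(0.01, 0.05, 0.5)

# Example: a log transformation with a neutral correction term (C = 1)
t_example <- combine_statistic(p, h = function(x) -2 * log(x), C = 1)
```

Passing h as a function keeps the transformation and the correction term separate, which mirrors how the individual methods below differ from each other.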

Fisher's method (1932), also known as the inverse chi-square method, is probably the most widely used method for combining p-values. Fisher used the fact that if p_i is uniformly distributed on [0, 1] (as p-values are under the null hypothesis), then -2 \log{p_i} follows a chi-square distribution with two degrees of freedom. Therefore, if the p-values are transformed as follows,

h(p) = -2 \log{p},

and the correction term C is neutral, i.e., equals 1, the following statement can be made about the sampling distribution of the test statistic t_f under the null hypothesis: because the sum of n independent chi-square variables with two degrees of freedom each is again chi-square distributed, t_f is distributed as chi-square with 2n degrees of freedom, where n is the number of p-values.
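As a minimal sketch of Fisher's method using only base R (stats::pchisq), assuming the combined p-value is the upper tail of the chi-square distribution; the variable names are illustrative and this is not necessarily how p_combine is implemented internally:

```r
p <- c(0.01, 0.05, 0.5)
n <- length(p)

# Test statistic: sum of -2 * log(p_i), with C = 1
t_f <- -2 * sum(log(p))

# Combined p-value: upper tail of chi-square with 2n degrees of freedom
p_fisher <- pchisq(t_f, df = 2 * n, lower.tail = FALSE)
```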

Stouffer's method, or the inverse normal method, transforms each p-value to its corresponding normal score, which leads to a test statistic that follows the standard normal distribution. The correction term divides the sum of the normal scores by the square root of the number of p-values.

h(p) = \Phi^{-1}(1 - p)

C = \frac{1}{\sqrt{n}}

Under the null hypothesis, t_s is distributed as standard normal. \Phi^{-1} is the inverse of the cumulative standard normal distribution function.

An extension of Stouffer's method with weighted p-values is called Liptak's method.
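The following base R sketch illustrates Stouffer's method and a weighted variant. The weighted statistic used here, \sum w_i z_i / \sqrt{\sum w_i^2}, is the commonly cited form of Liptak's method and is an assumption, as is the function name stouffer_liptak; the package's own weighting may differ.

```r
# Hypothetical helper (not part of the package).
# With equal weights this reduces to sum(z) / sqrt(n), i.e. Stouffer's method.
stouffer_liptak <- function(p, w = rep(1, length(p))) {
  # Normal scores: Phi^{-1}(1 - p)
  z <- qnorm(p, lower.tail = FALSE)
  # Assumed weighted form: sum(w * z) / sqrt(sum(w^2))
  statistic <- sum(w * z) / sqrt(sum(w^2))
  list(
    statistic = statistic,
    # Upper tail of the standard normal distribution
    p_value = pnorm(statistic, lower.tail = FALSE)
  )
}

p <- c(0.01, 0.05, 0.5)
res_equal <- stouffer_liptak(p)                   # unweighted (Stouffer)
res_weighted <- stouffer_liptak(p, w = c(3, 1, 1)) # illustrative weights
```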

The logit method by Mudholkar and George uses the following transformation:

h(p) = -\ln(p / (1 - p))

When the sum of the transformed p-values is corrected in the following way:

C = \sqrt{\frac{3(5n + 4)}{\pi^2 n (5n + 2)}},

the test statistic t_m is approximately t-distributed with 5n + 4 degrees of freedom.
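A minimal base R sketch of the logit method, built directly from the transformation and correction term above. The function name logit_combine is hypothetical, and taking the upper tail of the t-distribution as the combined p-value is an assumption made for this illustration.

```r
# Hypothetical helper (not part of the package)
logit_combine <- function(p) {
  n <- length(p)
  # Transformation: h(p) = -ln(p / (1 - p))
  h <- -log(p / (1 - p))
  # Correction term C as given above
  C <- sqrt(3 * (5 * n + 4) / (pi^2 * n * (5 * n + 2)))
  statistic <- sum(h) * C
  list(
    statistic = statistic,
    # Assumed: upper tail of the t-distribution with 5n + 4 degrees of freedom
    p_value = pt(statistic, df = 5 * n + 4, lower.tail = FALSE)
  )
}

res_mg <- logit_combine(c(0.01, 0.05, 0.5))
```

Small p-values produce large positive logit scores, so evidence against the null hypothesis accumulates in the upper tail.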

In Tippett's method, the smallest p-value is used as the test statistic t_t, and the combined significance is calculated as follows:

Pr(t_t) = 1 - (1 - t_t)^n
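The formula can be evaluated directly in base R. This is a short sketch with illustrative variable names:

```r
p <- c(0.01, 0.05, 0.5)

# Test statistic: the smallest p-value
t_t <- min(p)

# Combined p-value: 1 - (1 - t_t)^n
p_tippett <- 1 - (1 - t_t)^length(p)
```

For these three p-values the result is 1 - 0.99^3 = 0.029701.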

Value

A list with the following components:

statistic         the test statistic
p_value           the corresponding p-value
method            the method used
statistic_name    the name of the test statistic

Examples

p_combine(c(0.01, 0.05, 0.5))

p_combine(c(0.01, 0.05, 0.5), method = "tippett")

kkrismer/transite documentation built on July 13, 2024, 8:01 a.m.