chooseGavishDonoho: Choosing PCs with the Gavish-Donoho method

chooseGavishDonoho

R Documentation

Choosing PCs with the Gavish-Donoho method

Description

Use the Gavish-Donoho method to determine the optimal number of PCs to retain.

Usage

chooseGavishDonoho(x, .dim = dim(x), var.explained, noise)

Arguments

`x`	The data matrix used for the PCA, containing variables in rows and observations in columns. Ignored if `.dim` is supplied.
`.dim`	An integer vector containing the dimensions of the data matrix used for PCA. The first element should contain the number of variables and the second element should contain the number of observations.
`var.explained`	A numeric vector containing the variance explained by successive PCs. This should be sorted in decreasing order. Note that this should be the variance explained, NOT the percentage of variance explained!
`noise`	Numeric scalar specifying the variance of the random noise.
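The distinction between variance explained and percentage of variance explained can be illustrated with base R. This is only a sketch: it uses `prcomp` rather than the package's own `pca` function, and the names `mat`, `fit` and `pct.explained` are made up for illustration.

```r
set.seed(1)

# Toy matrix with 20 variables (rows) and 10 observations (columns).
mat <- matrix(rnorm(200), nrow=20)

# prcomp() expects observations in rows, hence the transpose.
fit <- prcomp(t(mat))

# Variance explained by each PC, in decreasing order.
# This is the scale that 'var.explained' is documented to expect.
var.explained <- fit$sdev^2

# Percentage of variance explained: a rescaled version that sums to 100.
# Per the argument description, this should NOT be passed as 'var.explained'.
pct.explained <- 100 * var.explained / sum(var.explained)
```

The percentages discard the absolute scale of the data, so they can no longer be compared against a noise variance expressed in the original units.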

Details

Assuming that `x` is the sum of a low-rank truth and a random matrix of i.i.d. noise with variance `noise`, the Gavish-Donoho method defines a threshold on the singular values that minimizes the reconstruction error from the PCs. This provides a mathematical definition of the “optimal” number of PCs for a given matrix, though it depends on both the i.i.d. assumption and the accuracy of the estimate of `noise`.
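As a rough illustration of the thresholding idea, the sketch below computes the known-noise hard threshold from Gavish and Donoho (2014) and counts the singular values above it. It is not taken from the package source: `gd_threshold` and `gd_count` are hypothetical helpers, and applying the threshold to the singular values of the row-centered matrix is an assumption made here for simplicity.

```r
# Hypothetical helper: hard threshold on singular values for an
# m-by-n matrix with i.i.d. noise of known variance 'noise'.
gd_threshold <- function(dims, noise) {
    m <- min(dims)   # smaller dimension
    n <- max(dims)   # larger dimension
    beta <- m / n    # aspect ratio, between 0 and 1

    # Optimal threshold coefficient for a known noise level.
    lambda <- sqrt(2 * (beta + 1) +
        8 * beta / (beta + 1 + sqrt(beta^2 + 14 * beta + 1)))

    lambda * sqrt(n) * sqrt(noise)
}

# Hypothetical helper: number of singular values above the threshold.
gd_count <- function(singular.values, dims, noise) {
    sum(singular.values > gd_threshold(dims, noise))
}

set.seed(42)

# Low-rank truth plus i.i.d. Gaussian noise with sd = 2 (variance 4).
truth <- matrix(rnorm(1000), nrow=100)
truth <- truth[, sample(ncol(truth), 1000, replace=TRUE)]
obs <- truth + rnorm(length(truth), sd=2)

# Center each variable (row) before taking singular values.
centered <- obs - rowMeans(obs)
d <- svd(centered, nu=0, nv=0)$d

n.keep <- gd_count(d, dim(obs), noise=4)
```

Because the threshold depends only on the smaller and larger dimension, it is unchanged if the matrix is transposed. The count in `n.keep` may differ from the value returned by `chooseGavishDonoho`, which works from the variance explained rather than from raw singular values.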

Value

An integer scalar specifying the number of PCs to retain. The effective limit on the variance explained is returned in the attributes.

Author(s)

Aaron Lun

Examples

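library(PCAtools)

# Low-rank truth: 10 distinct columns resampled into 1000 observations.
truth <- matrix(rnorm(1000), nrow=100)
truth <- truth[,sample(ncol(truth), 1000, replace=TRUE)]

# Add i.i.d. Gaussian noise with sd = 2, i.e., a variance of 4.
obs <- truth + rnorm(length(truth), sd=2)

# Note, we need the variance explained, NOT the percentage
# of variance explained!
pcs <- pca(obs)

# 'noise' is a variance, so it is set to sd^2 = 4.
chooseGavishDonoho(obs, var.explained=pcs$sdev^2, noise=4)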

kevinblighe/PCAtools documentation built on Oct. 22, 2023, 12:01 p.m.