parallelPCA (R Documentation)
Perform Horn's parallel analysis to choose the number of principal components to retain.
parallelPCA(
mat,
max.rank = 100,
...,
niters = 50,
threshold = 0.1,
transposed = FALSE,
BSPARAM = ExactParam(),
BPPARAM = SerialParam()
)
mat | A numeric matrix where rows correspond to variables and columns correspond to samples.
max.rank | Integer scalar specifying the maximum number of PCs to retain.
... | Further arguments to pass to
niters | Integer scalar specifying the number of iterations to use for the parallel analysis.
threshold | Numeric scalar representing the “p-value” threshold above which PCs are to be ignored.
transposed | Logical scalar indicating whether mat is transposed, i.e., whether rows correspond to samples and columns correspond to variables.
BSPARAM | A BiocSingularParam object specifying the algorithm to use for PCA.
BPPARAM | A BiocParallelParam object specifying how the iterations should be parallelized.
Horn's parallel analysis involves shuffling observations within each row of
mat to create a permuted matrix. PCA is performed on the permuted matrix
to obtain the percentage of variance explained under a random null hypothesis.
This is repeated over several iterations to obtain a distribution of curves on
the scree plot.
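The permutation step can be sketched in base R as follows. This is not the PCAtools implementation: it uses prcomp() in place of pca() with BSPARAM, and the helper names percent_var() and null_scree() are hypothetical. Each iteration shuffles the values within every row of a variables-by-samples matrix and records the percentage of variance explained by the leading PCs, so each column of the result is one null scree curve.

```r
# Hypothetical helper: percentage of variance explained by each PC, for a
# matrix with variables in rows and samples in columns (the layout of 'mat').
percent_var <- function(mat) {
  sdev2 <- prcomp(t(mat), center = TRUE)$sdev^2
  100 * sdev2 / sum(sdev2)
}

# Hypothetical helper: one permuted scree curve per iteration.
# Shuffling independently within each row breaks the correlations between
# variables while keeping each variable's own set of values.
null_scree <- function(mat, niters = 50, max.rank = 100) {
  k <- min(max.rank, ncol(mat), nrow(mat))
  curves <- replicate(niters, {
    shuffled <- t(apply(mat, 1, sample))
    percent_var(shuffled)[seq_len(k)]
  })
  matrix(curves, nrow = k)  # rows = PCs, columns = iterations
}

set.seed(42)
mat <- matrix(rnorm(500 * 20), nrow = 500)  # 500 variables x 20 samples
null <- null_scree(mat, niters = 10, max.rank = 5)
dim(null)  # 5 10
```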
For each PC, the “p-value” (for want of a better word) is defined as the
proportion of iterations where the variance explained at that PC is greater
than that observed with the original matrix. The number of PCs to retain is
defined as the last PC where the p-value is below threshold. This aims
to retain all PCs that explain “significantly” more variance than
expected by chance.
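Given the observed scree curve and the matrix of permuted curves, the p-values and the number of retained PCs could be computed as below. This is a sketch with hypothetical helper names and made-up toy numbers, not the package's code. It assumes that "the last PC where the p-value is below threshold" means the PCs that precede the first PC whose p-value exceeds threshold, so the retained PCs form a contiguous leading block; how the package handles non-monotone p-values is not stated here.

```r
# Hypothetical helper: "p-value" per PC = proportion of iterations in which the
# permuted matrix explains more variance at that PC than the original matrix.
pc_pvalues <- function(observed, permuted) {
  # 'permuted' has rows = PCs and columns = iterations. 'observed' is recycled
  # down each column, so permuted[i, j] is compared with observed[i].
  rowMeans(permuted > observed)
}

# Hypothetical helper (assumed rule): retain the PCs before the first one
# whose p-value is above 'threshold'; retain all of them if none is.
choose_npcs <- function(pvals, threshold = 0.1) {
  above <- which(pvals > threshold)
  if (length(above) == 0) length(pvals) else above[1] - 1L
}

# Toy inputs made up for illustration: 4 PCs, 5 iterations.
observed <- c(40, 12, 5, 4)
permuted <- cbind(c(8, 7, 6, 5), c(9, 7, 6, 4), c(8, 6, 6, 5),
                  c(9, 8, 5, 5), c(8, 7, 6, 4))
pvals <- pc_pvalues(observed, permuted)
pvals                    # 0.0 0.0 0.8 0.6
choose_npcs(pvals, 0.1)  # 2
```

Here PCs 1 and 2 beat every permuted curve, while PC 3 is exceeded in 4 of 5 iterations, so the count stops at 2.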
This function can be sped up by specifying BSPARAM=IrlbaParam() or
similar, to use approximate strategies for performing the PCA. Another option
is to set BPPARAM to perform the iterations in parallel.
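The permutation iterations are independent of each other, which is what makes them easy to distribute across workers. The sketch below illustrates the idea with base R's parallel::mclapply() instead of BiocParallel (which is what BPPARAM actually configures); the helper names are hypothetical, and because mclapply() forks processes, it falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical helper: percentage of variance explained by the first k PCs
# of a matrix with variables in rows and samples in columns.
percent_var <- function(mat, k) {
  sdev2 <- prcomp(t(mat))$sdev^2
  (100 * sdev2 / sum(sdev2))[seq_len(min(k, length(sdev2)))]
}

# Hypothetical helper: a single permutation iteration. The index 'i' is
# unused; it only lets the function be mapped over 1..niters.
one_iteration <- function(i, mat, k) {
  percent_var(t(apply(mat, 1, sample)), k)
}

# With L'Ecuyer-CMRG and mc.set.seed = TRUE (the default), each forked
# worker gets its own random-number stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
mat <- matrix(rnorm(200 * 20), nrow = 200)  # 200 variables x 20 samples

# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(seq_len(10), one_iteration, mat = mat, k = 5, mc.cores = cores)
permuted <- do.call(cbind, res)  # rows = PCs, columns = iterations
dim(permuted)  # 5 10
```

With parallelPCA() itself, the equivalent is to pass a BiocParallelParam object, such as one created by BiocParallel::MulticoreParam(), as BPPARAM.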
A list is returned, containing:
original, the output from running pca on mat with the specified arguments.
permuted, a matrix of variance explained from randomly permuted matrices.
Each column corresponds to a single permuted matrix, while each row corresponds to successive principal components.
n, the estimated number of principal components to retain.
Aaron Lun
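library(PCAtools)
set.seed(100)  # makes the simulated counts reproducible

# Mocking up some data: negative binomial counts for 1000 genes and 50 samples.
ngenes <- 1000
means <- 2^runif(ngenes, 6, 10)
dispersions <- 10/means + 0.2
nsamples <- 50
counts <- matrix(rnbinom(ngenes*nsamples, mu=means,
    size=1/dispersions), ncol=nsamples)

# Choosing the number of PCs from the log-transformed counts.
lcounts <- log2(counts + 1)
output <- parallelPCA(lcounts)
output$n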