spca | R Documentation |
Performs a sparse principal component analysis for variable selection using singular value decomposition and lasso penalisation on the loading vectors.
spca(
X,
ncomp = 2,
center = TRUE,
scale = TRUE,
keepX = rep(ncol(X), ncomp),
max.iter = 500,
tol = 1e-06,
logratio = c("none", "CLR"),
multilevel = NULL,
verbose.call = FALSE
)
X |
a numeric matrix (or data frame) which provides the data for the sparse principal components analysis. It should not contain missing values. |
ncomp |
Integer, if data is complete |
center |
(Default=TRUE) Logical, whether the variables should be shifted
to be zero centered. Only set to FALSE if data have already been centered.
Alternatively, a vector of length equal the number of columns of |
scale |
(Default=TRUE) Logical indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
keepX |
numeric vector of length |
max.iter |
Integer, the maximum number of iterations in the NIPALS algorithm. |
tol |
Positive real, the tolerance used in the NIPALS algorithm. |
logratio |
one of ('none','CLR'). Specifies the log ratio transformation used to deal with compositional values that may arise from specific normalisation in sequencing data. Defaults to 'none'. |
multilevel |
sample information for multilevel decomposition for repeated measurements. |
verbose.call |
Logical (Default=FALSE), if set to TRUE then the |
scale = TRUE is highly recommended, as it helps to obtain orthogonal sparse loading vectors.
keepX is the number of variables to select in each loading vector, i.e. the number of variables with a non-zero coefficient in each loading vector.
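To make the role of keepX concrete, the sketch below applies a lasso-type soft threshold to a single loading vector so that only a chosen number of coefficients remain non-zero. It is a base R illustration under stated assumptions, not the code used inside spca: the helper name soft_threshold_keep and the toy vector are hypothetical, and ties in absolute value could leave fewer non-zero entries than requested.

```r
# Hypothetical helper, for illustration only (not part of mixOmics):
# soft-threshold a loading vector so that at most `keep` entries stay non-zero.
soft_threshold_keep <- function(v, keep) {
  p <- length(v)
  if (keep >= p) return(v)                 # nothing to penalise
  abs_sorted <- sort(abs(v), decreasing = TRUE)
  lambda <- abs_sorted[keep + 1]           # (keep + 1)-th largest magnitude
  sign(v) * pmax(abs(v) - lambda, 0)       # shrink towards zero, clip at zero
}

v <- c(0.9, -0.1, 0.4, 0.05, -0.7)         # toy loading vector
v_sparse <- soft_threshold_keep(v, keep = 2)
sum(v_sparse != 0)                         # number of selected variables
```

The threshold is taken from the data itself, so the penalty is expressed as a count of variables rather than as a raw lasso parameter.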
Note that data can contain missing values only when logratio = 'none' is used. In this case, center = TRUE should be used to center the data in order to effectively ignore the missing values. This is the default behaviour in spca.
According to Filzmoser et al., an ILR log ratio transformation is more appropriate for PCA with compositional data. Both CLR and ILR are valid.
Logratio transform and multilevel analysis are performed sequentially as internal pre-processing steps, through logratio.transfo and withinVariation respectively.
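As a rough picture of what a multilevel decomposition does for a single repeated-measures factor, the sketch below removes each subject's mean from that subject's samples, leaving only the within-subject variation. This is a base R illustration and an assumption about the simplest one-factor case; it does not call withinVariation, and the helper name within_sketch and the toy data are hypothetical.

```r
# Hypothetical helper, for illustration only: subtract per-subject column means.
within_sketch <- function(X, subject) {
  X <- as.matrix(X)
  # ave() returns, for each value, the mean of its subject group
  apply(X, 2, function(col) col - ave(col, subject))
}

# Toy data: 4 samples (rows), 2 variables (columns), 2 subjects measured twice
X <- matrix(c(1, 3, 10, 14,
              2, 2,  5,  9), ncol = 2)
subject <- c("a", "a", "b", "b")

Xw <- within_sketch(X, subject)
rowsum(Xw, subject)   # per-subject sums of the within matrix
```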
Logratio can only be applied if the data do not contain any 0 value (for count data, we thus advise normalising the raw data with an offset of 1). For the ILR transformation, an additional offset might be needed.
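The reason zeros are a problem is that the transformation takes logarithms. The sketch below shows a centered log ratio computed in base R after adding an offset of 1 to toy counts. It is an illustration only and does not call logratio.transfo; the helper name clr_sketch and the count matrix are hypothetical.

```r
# Hypothetical helper, for illustration only: centered log ratio per sample (row).
clr_sketch <- function(X, offset = 0) {
  X <- as.matrix(X) + offset
  stopifnot(all(X > 0))        # log() is undefined at 0, hence the offset
  logX <- log(X)
  logX - rowMeans(logX)        # subtract the log geometric mean of each row
}

# Toy count data containing zeros: 2 samples, 3 variables
counts <- matrix(c(0, 3, 7,
                   2, 0, 8), nrow = 2, byrow = TRUE)

clr <- clr_sketch(counts, offset = 1)
rowSums(clr)                   # CLR values sum to zero within each sample
```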
The principal components are not guaranteed to be orthogonal in sPCA. We adopt the approach of Shen and Huang 2008 (Section 2.3) to estimate the explained variance in the case where the sparse loading vectors (and principal components) are not orthogonal. The data are projected onto the space spanned by the first loading vectors and the variance explained is then adjusted for potential correlation between PCs. Note that in practice, the loading vectors tend to be orthogonal if the data are centered and scaled in sPCA.
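The projection step described above can be sketched in base R as follows. This is one reading of the adjusted variance idea in Shen and Huang (2008), not the code used by spca: the helper name adjusted_cum_var, the simulated data and the two hand-written loading vectors are all hypothetical.

```r
set.seed(1)
X <- scale(matrix(rnorm(50 * 6), nrow = 50))   # toy centered and scaled data

# Two hypothetical sparse loading vectors that are not orthogonal
V <- cbind(c(0.8, 0.6, 0,   0, 0, 0),
           c(0.6, 0,   0.8, 0, 0, 0))

# Cumulative proportion of variance explained by the first k loading vectors
adjusted_cum_var <- function(X, V) {
  total <- sum(X^2)
  sapply(seq_len(ncol(V)), function(k) {
    Vk <- V[, seq_len(k), drop = FALSE]
    # project X onto the space spanned by the first k loading vectors
    Xk <- X %*% Vk %*% solve(crossprod(Vk)) %*% t(Vk)
    sum(Xk^2) / total
  })
}

cum_var  <- adjusted_cum_var(X, V)
prop_var <- diff(c(0, cum_var))   # variance added by each successive component
```

Because each component is credited only with the variance it adds beyond the earlier ones, correlated components are not counted twice.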
spca returns a list with class "spca" containing the following components:
if verbose.call = FALSE, then just the function call is returned. If verbose.call = TRUE, then all the input values are accessible via this component.
the number of components to keep in the calculation.
the adjusted percentage of variance explained for each component.
the adjusted cumulative percentage of variances explained.
the number of variables kept in each loading vector.
the number of iterations needed to reach convergence for each component.
the matrix containing the sparse loading vectors.
the matrix containing the principal components.
Kim-Anh Lê Cao, Fangzhou Yao, Leigh Coonan, Ignacio Gonzalez, Al J Abadi
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.
pca and http://www.mixOmics.org for more details.
data(liver.toxicity)
spca.rat <- spca(liver.toxicity$gene, ncomp = 3, keepX = rep(50, 3))
spca.rat
## variable representation
plotVar(spca.rat, cex = 1)
## Not run:
plotVar(spca.rat,style="3d")
## End(Not run)
## samples representation
plotIndiv(spca.rat, ind.names = liver.toxicity$treatment[, 3],
group = as.numeric(liver.toxicity$treatment[, 3]))
## Not run:
plotIndiv(spca.rat, cex = 0.01,
col = as.numeric(liver.toxicity$treatment[, 3]),style="3d")
## End(Not run)
## example with multilevel decomposition and CLR log ratio transformation
data("diverse.16S")
spca.res = spca(X = diverse.16S$data.TSS, ncomp = 5,
logratio = 'CLR', multilevel = diverse.16S$sample)
plot(spca.res)
plotIndiv(spca.res, ind.names = FALSE, group = diverse.16S$bodysite, title = '16S diverse data',
legend=TRUE)