rcc | R Documentation |
The function performs the regularized extension of the Canonical Correlation Analysis to seek correlations between two data matrices.
rcc(
X,
Y,
ncomp = 2,
method = c("ridge", "shrinkage"),
lambda1 = 0,
lambda2 = 0,
verbose.call = FALSE
)
X |
numeric matrix or data frame |
Y |
numeric matrix or data frame |
ncomp |
the number of components to include in the model. Defaults to 2. |
method |
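One of "ridge" or "shrinkage". If "ridge", lambda1 and lambda2 are used as supplied; if "shrinkage", the parameters are estimated directly from the data (see Details). |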
lambda1 , lambda2 |
non-negative reals. The regularization parameters for
the X and Y data, respectively. Both default to 0. |
verbose.call |
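Logical (Default=FALSE), if set to TRUE then the $call component of the returned object will contain the variable values for all parameters. Note that this may cause large memory usage. |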
The main purpose of Canonical Correlation Analysis (CCA) is to explore the
sample correlations between two sets of variables, X and Y, observed on the
same individuals (experimental units). The two sets play strictly symmetric
roles in the analysis.
The cancor function performs the core computations, but additional tools are
required to handle highly correlated (nearly collinear) data sets or, for
example, data sets with more variables than units.
The rcc function, the regularized version of CCA, is one way to deal with
this problem: it includes a regularization step in the CCA computations.
Regularization in this context was first proposed by Vinod (1976) and then
developed by Leurgans et al. (1993). It consists of regularizing the
empirical covariance matrices of X and Y by adding a multiple of the
identity matrix, that is, Cov(X) + \lambda_1 I and Cov(Y) + \lambda_2 I.
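The regularization step described above can be illustrated with base R alone. The following is a minimal sketch, not the mixOmics implementation: the helper name ridge_cca_cor and the simulated data are assumptions made for illustration. It adds lambda1 * I and lambda2 * I to the empirical covariance matrices and takes the canonical correlations as the square roots of the eigenvalues of (Cov(X) + lambda1 I)^{-1} Cov(X,Y) (Cov(Y) + lambda2 I)^{-1} Cov(Y,X).

```r
# Hypothetical helper (not part of mixOmics): canonical correlations computed
# from ridge-regularized covariance matrices, using base R only.
ridge_cca_cor <- function(X, Y, lambda1 = 0, lambda2 = 0) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  Cxx <- cov(X) + lambda1 * diag(ncol(X))  # Cov(X) + lambda1 * I
  Cyy <- cov(Y) + lambda2 * diag(ncol(Y))  # Cov(Y) + lambda2 * I
  Cxy <- cov(X, Y)                         # cross-covariance, p x q
  # Squared canonical correlations are the leading eigenvalues of
  # Cxx^{-1} Cxy Cyy^{-1} Cyx.
  M <- solve(Cxx, Cxy) %*% solve(Cyy, t(Cxy))
  ev <- Re(eigen(M, only.values = TRUE)$values)
  k <- min(ncol(X), ncol(Y))
  # Clip tiny negative values caused by rounding before taking the root.
  sqrt(pmax(sort(ev, decreasing = TRUE)[seq_len(k)], 0))
}

# Simulated data (illustrative only): 50 units, 3 X variables, 2 Y variables.
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n, sd = 0.5), rnorm(n))

cor_plain <- ridge_cca_cor(X, Y)                              # no regularization
cor_ridge <- ridge_cca_cor(X, Y, lambda1 = 0.1, lambda2 = 0.1)  # ridge-regularized
```

With both parameters at zero the sketch should agree with stats::cancor(X, Y)$cor up to numerical error; with positive parameters the leading correlation can only shrink.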
When lambda1 = 0 and lambda2 = 0, rcc performs a classical CCA, if possible
(i.e. when n > p+q, where n is the number of individuals and p and q are the
numbers of variables in X and Y).
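To see why classical CCA can fail when there are more variables than units, the sketch below (base R, simulated data, all names chosen for illustration) checks that the empirical covariance matrix is singular in that setting and that adding a small multiple of the identity makes it invertible again.

```r
# Illustrative data: fewer units (n) than variables (p).
set.seed(42)
n <- 10
p <- 20
X <- matrix(rnorm(n * p), n, p)

S <- cov(X)             # p x p empirical covariance
rank_S <- qr(S)$rank    # at most n - 1 after centering, so S is singular
min_eig_raw <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)

# Ridge regularization: every eigenvalue is shifted up by lambda1.
lambda1 <- 0.1
S_reg <- S + lambda1 * diag(p)
min_eig_reg <- min(eigen(S_reg, symmetric = TRUE, only.values = TRUE)$values)

S_reg_inv <- solve(S_reg)  # succeeds, whereas inverting S itself is not meaningful
```

The smallest eigenvalue of the regularized matrix is bounded below by lambda1, which is what makes the inversion in the CCA computations well defined.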
The shrinkage method (method = "shrinkage") can be used instead of tune.rcc
to choose the shrinkage parameters; tune.rcc can be slow and costly to run on
very large data sets. Note that tune.rcc (which uses cross-validation) and
the shrinkage method (which uses the formula from Schäfer and Strimmer; see
estimate.lambda in the corpcor package) may give different results.
Note: when method = "shrinkage", the parameters are estimated using
estimate.lambda from the corpcor package. The data are then centered to
calculate the regularised variance-covariance matrices in rcc.
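As a rough illustration of the estimation step mentioned in the note, the sketch below calls estimate.lambda from the corpcor package directly on simulated data. It assumes corpcor is installed (the code is skipped otherwise) and is not a reproduction of what rcc does internally; the object names are chosen for illustration.

```r
# Runs only if the corpcor package is available.
if (requireNamespace("corpcor", quietly = TRUE)) {
  # Illustrative data: 20 units, 50 and 30 variables.
  set.seed(7)
  X <- matrix(rnorm(20 * 50), 20, 50)
  Y <- matrix(rnorm(20 * 30), 20, 30)

  # Analytic shrinkage intensities, one per data set.
  lambda1_hat <- as.numeric(corpcor::estimate.lambda(X, verbose = FALSE))
  lambda2_hat <- as.numeric(corpcor::estimate.lambda(Y, verbose = FALSE))
}
```

Because the estimates come from an analytic formula rather than cross-validation, they are cheap to compute but need not match the values selected by tune.rcc.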
Missing values are handled by the function, except when method = "shrinkage".
In that case, the missing values can be estimated by reconstituting the data
matrix with the nipals function.
rcc returns an object of class "rcc", a list that contains the following
components:
X |
the original X data. |
Y |
the original Y data. |
cor |
a vector containing the canonical correlations. |
lambda |
a vector containing the regularization parameters, either as supplied (ridge method) or as estimated directly (shrinkage method). |
loadings |
list containing the estimated coefficients used to calculate the canonical
variates in X and Y. |
variates |
list containing the canonical variates. |
names |
list containing the names to be used for individuals and variables. |
prop_expl_var |
Proportion of the explained variance of derived components, after setting possible missing values to zero. |
call |
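if verbose.call = FALSE, only the function call is returned; if verbose.call = TRUE, all the input values are accessible via this component. |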
Sébastien Déjean, Ignacio González, Francois Bartolo, Kim-Anh Lê Cao, Florian Rohart, Al J Abadi
González, I., Déjean, S., Martin, P. G., and Baccini, A. (2008). CCA: An R package to extend canonical correlation analysis. Journal of Statistical Software, 23(12), 1-14.
González, I., Déjean, S., Martin, P., Goncalves, O., Besse, P., and Baccini, A. (2009). Highlighting relationships between heterogeneous biological data through graphical displays based on regularized canonical correlation analysis. Journal of Biological Systems, 17(02), 173-199.
Leurgans, S. E., Moyeed, R. A. and Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society. Series B 55, 725-740.
Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics 6, 129-137.
Opgen-Rhein, R., and Strimmer, K. (2007). Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist. Appl. Genet. Mol. Biol. 6:9. (http://www.bepress.com/sagmb/vol6/iss1/art9/)
Schäfer, J., and Strimmer, K. (2005). A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32. (http://www.bepress.com/sagmb/vol4/iss1/art32/)
summary, tune.rcc, plot.rcc, plotIndiv, plotVar, cim, network and
http://www.mixOmics.org for more details.
## Classic CCA
library(mixOmics)
data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
linn.res <- rcc(X, Y)
## Not run:
## Regularized CCA
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.res1 <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)
## using shrinkage parameters
nutri.res2 <- rcc(X, Y, ncomp = 3, method = 'shrinkage')
nutri.res2$lambda # the shrinkage parameters
## End(Not run)