VarSelection: Variable Selection


VarSelection R Documentation

Variable Selection

Description

Performs variable selection using a regression biplot methodology. The function computes the regression biplot on the compromise matrix. A biplot can be understood as the decomposition of a target matrix, $Y = XB$, where $Y$ is the matrix containing all variables considered in the analysis, $X$ is the matrix of explanatory variables (i.e., the coordinates of the compromise matrix), and $B$ is the matrix of regression coefficients to be estimated. The method can therefore be interpreted as a general linear regression of $Y$ on $X$, with fitted values $\hat{Y} = X(X'X)^{-1}X'Y$, where $X(X'X)^{-1}X'$ is the projection matrix onto the compromise configuration. A classical linear model is used to obtain the regression coefficients, although the model could be extended and alternative methods could be used. The quality of the regression biplot is measured by the proportion of variance explained by each regression (the adjusted R-squared coefficient).
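The projection described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code: X stands in for the compromise coordinates and Y for the observed variables, and all object names (X, Y, B, H, B_hat, adj_r2) are hypothetical. It assumes a model without intercept, matching the default intercept = FALSE.

```r
# Simulated stand-ins (assumptions): X plays the role of the compromise
# coordinates, Y the role of the variables to be explained.
set.seed(1)
n <- 30
X <- matrix(rnorm(n * 2), nrow = n)              # n observations, 2 dimensions
B <- matrix(c(2, -1, 0.5, 0, 0, 0), nrow = 2)    # true coefficients, 3 variables
Y <- X %*% B + matrix(rnorm(n * 3, sd = 0.5), nrow = n)

# Projection (hat) matrix onto the column space of X: X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)
Y_hat <- H %*% Y

# Least-squares coefficients: B_hat = (X'X)^{-1} X'Y
B_hat <- solve(crossprod(X), crossprod(X, Y))

# Adjusted R-squared per variable for a no-intercept model
p <- ncol(X)
rss <- colSums((Y - Y_hat)^2)   # residual sum of squares
tss <- colSums(Y^2)             # uncentered total sum of squares (no intercept)
r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * n / (n - p)
```

Each column of Y is regressed on the same X, so one projection matrix serves all variables and the adjusted R-squared values are directly comparable across them.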

Usage

VarSelection(
  x,
  Data,
  intercept = FALSE,
  model = "LM",
  Crit = "Rsquare",
  perc = 0.9,
  nDims = 2,
  Normalize = FALSE
)

Arguments

x

An object of class DistStatis.

Data

A list of data.frame or ExpressionSet objects whose length equals the number of tables to be integrated. In each data frame, the observations (the common elements in Statis) should be in rows and the variables in columns. Data must be the same data used to obtain the compromise configuration. It can also be a MultiAssayExperiment object; see the help for the LinkData function and the package vignette.

intercept

Logical. If TRUE, the models are fitted with an intercept; otherwise the intercept is fixed at zero.

model

Character. 'LM' for the classical lm model. Alternative models are planned for future versions.

Crit

Character indicating the variable selection criterion. Choose either 'Rsquare' or 'p-val'.

perc

Numeric. The percentile value that determines how much of the data is selected.

nDims

Numeric. The number of dimensions used to fit the model. Default is 2.

Normalize

Logical. If TRUE, the response variable in each model is normalized.
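As a rough illustration of how Crit and perc might interact, the sketch below applies a percentile cut-off to a vector of per-variable adjusted R-squared values. It assumes that perc acts as a quantile threshold on the chosen criterion; the exact rule used inside VarSelection may differ, and the values and names (adj_r2, varA, ..., threshold, selected) are made up for the example.

```r
# Hypothetical adjusted R-squared values, one per variable
adj_r2 <- c(varA = 0.91, varB = 0.15, varC = 0.62, varD = 0.97, varE = 0.40)

# Assumed rule: keep the variables at or above the perc-th quantile
perc <- 0.6
threshold <- quantile(adj_r2, probs = perc)
selected <- names(adj_r2)[adj_r2 >= threshold]
```

Under this assumed rule, a higher perc gives a stricter cut-off and therefore a shorter list of selected variables.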

Value

A VarSelection class object with the corresponding slots completed according to the given model. Note that if a variable is chosen from several tables, it will appear as Var.1, Var.2, etc.

Author(s)

Laura M Zingaretti

References

  1. Gabriel, K. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika 58(3), 453–467.

  2. Gower, J. & Hand, D. (1996). Biplots. Monographs on Statistics and Applied Probability 54. London: Chapman and Hall. 277 pp.

  3. Greenacre, M. J. (2010). Biplots in practice. Fundacion BBVA.

Examples

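{
library(LinkHD)
data(Taraoceans)
# Phylum labels, used below as column names
pro.phylo <- Taraoceans$taxonomy[, 'Phylum']
# Collect the three tables to be integrated
TaraOc <- list(Taraoceans$phychem,
               as.data.frame(Taraoceans$pro.phylo),
               as.data.frame(Taraoceans$pro.NOGs))
# Center and scale the columns of the first table
TaraOc_1 <- scale(TaraOc[[1]])
# Process the other two tables with the 'Compositional' method
Normalization <- lapply(list(TaraOc[[2]], TaraOc[[3]]),
                        function(x) DataProcessing(x, Method = 'Compositional'))
colnames(Normalization[[1]]) <- pro.phylo
colnames(Normalization[[2]]) <- Taraoceans$GO
TaraOc <- list(TaraOc_1, Normalization[[1]], Normalization[[2]])
names(TaraOc) <- c('phychem', 'pro_phylo', 'pro_NOGs')
TaraOc <- lapply(TaraOc, as.data.frame)
# Integrate the tables, using one distance per table
Output <- LinkData(TaraOc, Scale = FALSE,
                   Distance = c('ScalarProduct', 'Euclidean', 'Euclidean'))
# Select variables by the R-squared criterion with perc = 0.95
Selection <- VarSelection(Output, TaraOc, Crit = 'Rsquare', perc = 0.95)
}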

lauzingaretti/LinkHD documentation built on March 7, 2023, 9:21 a.m.