This function implements adaptive group-regularized (logistic) ridge regression using co-data. It uses co-data to improve predictions of a binary or continuous response from high-dimensional (e.g. genomics) data. Here, co-data is auxiliary information on the variables (e.g. genes), such as annotation or p-values from other studies.
grridge(highdimdata, response, partitions, unpenal = ~1,
        offset=NULL, method="exactstable",
        niter=10, monotone=NULL, optl=NULL, innfold=NULL,
        fixedfoldsinn=TRUE, maxsel=c(25,100),selectionEN=FALSE,cvlmarg=1,
        savepredobj="all", dataunpen=NULL, ord = 1:length(partitions),
        comparelasso=FALSE,optllasso=NULL,cvllasso=TRUE,
        compareunpenal=FALSE,trace=FALSE,modus=1,
        EBlambda=FALSE,standardizeX = TRUE)
highdimdata |
Matrix or numerical data frame. Contains the primary data of the study. Columns are samples, rows are variables (features). |
response |
Factor, numeric, binary or survival. Response values. The number of response values should equal |
partitions |
List of lists. Each list component contains a partition of the variables, which is again a list. See details. |
unpenal |
Formula. Includes unpenalized variables. Set to |
offset |
Numeric (vector). Optional offset, either one constant or sample-specific, in which case |
method |
Character. Equal to |
niter |
Integer. Maximum number of re-penalization iterations. |
monotone |
Vector of booleans. If the jth component of |
optl |
Numeric. Value of the global regularization parameter (lambda). If specified, it skips optimization by cross-validation. |
innfold |
Integer. The fold for cross-validating the global regularization parameter lambda and for computing cross-validated likelihoods. Defaults to LOOCV. |
fixedfoldsinn |
Boolean. Use fixed folds for inner cross-validation? |
selectionEN |
Boolean. If |
maxsel |
Vector of integers. The maximum number of selected variables. Can be multiple to allow comparing models of various sizes. |
cvlmarg |
Numeric. Maximum margin (in percentage) that the cross-validated likelihood of the model with selected variables may deviate from the optimum one. |
savepredobj |
Character. If |
dataunpen |
Data frame. Optional data for unpenalized variables. |
ord |
Integer vector. The order in which the partitions in |
comparelasso |
Boolean. If |
optllasso |
Numeric. Value of the global regularization parameter (lambda) in the lasso. If specified, optimization by cross-validation is skipped. |
cvllasso |
Boolean. If |
compareunpenal |
Boolean. If |
trace |
Boolean. If |
modus |
Integer. Please use |
EBlambda |
Boolean. If |
standardizeX |
Boolean. If |
About partitions: this is a list of partitions, or one partition represented as a simple list. Each partition is a (named) list whose components contain the indices (row numbers) of the variables in the corresponding group. Such a partition is usually created by CreatePartition.
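To make the list-of-lists structure concrete, here is a minimal base-R sketch that builds one partition by hand. The annotation vector and the names `annotation`, `cpg` and `partitions` are hypothetical; in practice CreatePartition would normally be used, and its exact output may differ in detail from this hand-rolled version.

```r
# Hypothetical annotation: one label per variable (i.e. per row of highdimdata)
annotation <- factor(c("Island", "Shore", "Island", "Distant", "Shore", "Island"))

# One partition: a named list holding the row indices of each group
cpg <- split(seq_along(annotation), annotation)

# The 'partitions' argument: a named list of such partitions (here only one)
partitions <- list(cpg = cpg)
str(partitions)
```

Each variable index appears in exactly one group here, which is the typical situation for annotation-based co-data.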
About savepredobj: use savepredobj="all" if you want to compare performances of the various predictors (e.g. ordinary ridge, group-regularized ridge, group-regularized ridge + selection) using grridgeCV.
About monotone: we recommend setting the jth component of monotone to TRUE when the jth partition is based on external p-values, test statistics or regression coefficients. This increases stability of the predictions.
If selectionEN=TRUE, EN selection will, for all elements m of maxsel, select exactly m or fewer variables. Note that EN is only used for selection; the final predictive model is a group-ridge model fitted only on the selected variables using the penalties estimated by GRridge. Using multiple values for maxsel allows comparing models of various sizes, also in terms of cross-validated performance when using grridgeCV.
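As an illustration of the kind of partition for which monotone would be set to TRUE, the sketch below groups variables by the rank of external p-values, so that the groups are ordered from most to least significant. The simulated p-values, the equal group size and all object names are assumptions made for illustration; this is a hand-rolled stand-in, not the grouping rule that CreatePartition applies.

```r
set.seed(1)
pvals <- runif(20)                 # hypothetical external p-values, one per variable

grsize <- 5                        # assumed group size (20 variables -> 4 groups)
ranks <- rank(pvals, ties.method = "first")
group_id <- (ranks - 1) %/% grsize + 1

# Ordered groups: group1 holds the variables with the smallest p-values
pvalpart <- split(seq_along(pvals), paste0("group", group_id))

partitions <- list(pvals = pvalpart)
monotone <- c(TRUE)                # one entry per partition
```

Because the groups follow the ordering of the p-values, it is reasonable to expect the penalties to change monotonically across them, which is the situation the recommendation above refers to.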
About cvlmarg: we recommend using values between 0 and 2. A larger value will generally result in fewer selected variables by forward selection.
About innfold: for large data sets considerable computing time may be saved when setting innfold=10 instead of default leave-one-out-cross-validation (LOOCV).
About method: "exactstable" is recommended. If the number of variables is not very large, say <2000, the faster non-iterative "exact" method can be used as an alternative.
grridge uses the penalized package to fit logistic and survival ridge models; glmnet is used for linear response and for fitting lasso when comparelasso=TRUE.
A list object containing:
true |
True values of the response |
cvfit |
Measure of fit. Cross-validated likelihoods from the iterations for logistic and survival model; minus CV error for linear model |
lambdamults |
List of lists object containing the penalty multipliers per group per partition |
optl |
Global penalty parameter lambda |
lambdamultvec |
Vector with penalty multipliers per variable |
predobj |
List of prediction objects |
betas |
Estimated regression coefficients |
reslasso |
Results of the lasso. |
resEN |
Results of the Elastic Net selection for all elements of |
model |
Model used for fitting: logistic, linear or survival |
arguments |
Arguments used to call the function |
allpreds |
Predictions on the same data |
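The relation between the per-group multipliers (lambdamults), the global penalty (optl) and the per-variable vector (lambdamultvec) can be sketched in base R for a linear response. This is a conceptual illustration under stated assumptions, not the package's fitting code: the data are simulated, the multipliers and lambda are made-up values, only one partition is used, and no intercept or standardization is included.

```r
set.seed(2)
n <- 30; p <- 6
highdimdata <- matrix(rnorm(p * n), nrow = p)  # rows = variables, columns = samples
y <- rnorm(n)                                  # hypothetical continuous response

groups <- list(a = 1:3, b = 4:6)               # one partition with two groups
lambdamults <- c(a = 0.5, b = 2)               # hypothetical multipliers per group
optl <- 10                                     # hypothetical global lambda

# Expand the group multipliers to one multiplier per variable
lambdamultvec <- numeric(p)
for (g in names(groups)) lambdamultvec[groups[[g]]] <- lambdamults[[g]]

# Ridge fit with variable-specific penalties optl * lambdamultvec
X <- t(highdimdata)                            # samples in rows for the regression
penalty <- optl * diag(lambdamultvec)
betas <- solve(crossprod(X) + penalty, crossprod(X, y))
```

Variables in group a are penalized less than under ordinary ridge (multiplier below 1), while those in group b are shrunk more strongly.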
Mark A. van de Wiel
Mark van de Wiel, Tonje Lien, Wina Verlaat, Wessel van Wieringen, Saskia Wilting. (2016). Better prediction by use of co-data: adaptive group-regularized ridge regression. Statistics in Medicine, 35(3), 368-81.
Novianti PW, Snoek B, Wilting SM, van de Wiel MA (2017). Better diagnostic signatures from RNAseq data through use of auxiliary co-data. Bioinformatics, 33, 1572-1574.
Creating partitions: CreatePartition; Cross-validation for assessing predictive performance: grridgeCV.
## NOTE:
## 1. EXAMPLE DEVIATES SOMEWHAT FROM THE EXAMPLE IN THE MANUSCRIPT IN ORDER TO SHOW SOME
## OTHER FUNCTIONALITIES.
## 2. HERE WE SHOW A SIMPLE EXAMPLE FROM THE FARKAS DATA SET
## MORE EXTENSIVE EXAMPLES OF FUNCTIONALITIES IN THE GRRIDGE PACKAGE ARE PROVIDED IN
## VIGNETTE DOCUMENTATION FILE
## 1ST EXAMPLE: Farkas DATA, USING ANNOTATION: DISTANCE TO CpG
##load data objects:
##datcenFarkas: methylation data for cervix samples (arcsine-transformed beta values)
##respFarkas: binary response (Normal and Precursor)
##CpGannFarkas: annotation of probes according to location
##(CpG-Island, North-Shelf, South-Shelf, North-Shore, South-Shore, Distant)
library(GRridge)
data(dataFarkas)
##Create list of partition(s), here only one partition included
partitionFarkas <- list(cpg=CreatePartition(CpGannFarkas))
##Group-regularized ridge applied to data datcenFarkas,
##response respFarkas and partition partitionFarkas.
##Saves the prediction objects from ordinary and group-regularized ridge.
##Includes unpenalized intercept by default.
#grFarkas <- grridge(datcenFarkas,respFarkas, optl=5.680087,
# partitionFarkas,monotone=FALSE)
## 2ND EXAMPLE: Verlaat DATA, USING P-VALUES AND SIGN OF EFFECT FROM FARKAS DATA
## see vignette documentation file!