grridge: Group-regularized (logistic) ridge regression
In markvdwiel/GRridge: Better prediction by use of co-data: Adaptive group-regularized ridge regression

This function implements adaptive group-regularized (logistic) ridge regression by use of co-data. It uses co-data to improve predictions of binary and continuous responses from high-dimensional (e.g. genomics) data. Here, co-data is auxiliary information on the variables (e.g. genes), such as annotation or p-values from other studies.

grridge(highdimdata, response, partitions, unpenal = ~1, 
        offset=NULL, method="exactstable",
        niter=10, monotone=NULL, optl=NULL, innfold=NULL, 
        fixedfoldsinn=TRUE, maxsel=c(25,100),selectionEN=FALSE,cvlmarg=1,
        savepredobj="all", dataunpen=NULL, ord = 1:length(partitions),
        comparelasso=FALSE,optllasso=NULL,cvllasso=TRUE,
        compareunpenal=FALSE,trace=FALSE,modus=1,
        EBlambda=FALSE,standardizeX = TRUE)

`highdimdata`	Matrix or numerical data frame. Contains the primary data of the study. Columns are samples, rows are variables (features).
`response`	Factor, numeric, binary or survival. Response values. The number of response values should equal `ncol(highdimdata)`.
`partitions`	List of lists. Each list component contains a partition of the variables, which is again a list. See details.
`unpenal`	Formula. Includes unpenalized variables. Set to `unpenal = ~0` if an intercept is not desired.
`offset`	Numeric (vector). Optional offset, either one constant or sample-specific, in which case `length(offset)` should equal `ncol(highdimdata)`.
`method`	Character. One of `"exactstable"` (the stable iterative, systems-based method), `"stable"` (the iterative non-systems-based method), `"exact"` (the non-iterative, systems-based method) or `"adaptridge"` (adaptive ridge, not recommended).
`niter`	Integer. Maximum number of re-penalization iterations.
`monotone`	Vector of booleans. If the jth component of `monotone` equals `TRUE`, then the group-penalties of the jth partition are forced to be monotone. If `monotone=NULL`, monotonicity is not imposed for any partition.
`optl`	Numeric. Value of the global regularization parameter (lambda). If specified, it skips optimization by cross-validation.
`innfold`	Integer. The fold for cross-validating the global regularization parameter lambda and for computing cross-validated likelihoods. Defaults to LOOCV.
`fixedfoldsinn`	Boolean. Use fixed folds for inner cross-validation?
`selectionEN`	Boolean. If `selectionEN=TRUE` then post-hoc variable selection by weighted elastic net is performed.
`maxsel`	Vector of integers. The maximum number of selected variables. Can be multiple to allow comparing models of various sizes.
`cvlmarg`	Numeric. Maximum margin (in percentage) that the cross-validated likelihood of the model with selected variables may deviate from the optimum one.
`savepredobj`	Character. If `savepredobj="last"`, only the last penalized prediction object is saved; if `savepredobj="all"` all are saved; if `savepredobj="none"`, none are saved.
`dataunpen`	Data frame. Optional data for unpenalized variables.
`ord`	Integer vector. The order in which the partitions in `partitions` are used.
`comparelasso`	Boolean. If `comparelasso=TRUE` the results of lasso regression are included.
`optllasso`	Numeric. Value of the global regularization parameter (lambda) in the lasso. If specified, optimization by cross-validation is skipped.
`cvllasso`	Boolean. If `cvllasso=TRUE` it returns the cross-validated likelihood for lasso when `comparelasso=TRUE`.
`compareunpenal`	Boolean. If `compareunpenal=TRUE` the results of regression with unpenalized covariates only are included. Only relevant when `dataunpen` is specified.
`trace`	Boolean. If `trace=TRUE` the results of the cross-validation for parameter (lambda) tuning are shown.
`modus`	Integer. Please use `modus=1`. Only use `modus=2` when backward compatibility with versions <= 1.6 is desired.
`EBlambda`	Boolean. If `EBlambda=TRUE` global lambda is estimated by empirical Bayes (currently only available for linear model).
`standardizeX`	Boolean. If `standardizeX=TRUE` variables in X are standardized prior to the analysis.
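
Because `highdimdata` expects variables in rows and samples in columns, a matrix stored the other way around (samples in rows) has to be transposed before it is passed on. The following is a minimal sketch on simulated data; the object names (`X_samples_by_vars`, `highdim`, `resp`, `off`) are hypothetical and not part of the package.

```r
set.seed(1)

n_samples <- 20
n_vars    <- 100

# Simulated data stored samples-by-variables (hypothetical example data)
X_samples_by_vars <- matrix(rnorm(n_samples * n_vars),
                            nrow = n_samples, ncol = n_vars)

# grridge expects variables in rows and samples in columns, so transpose
highdim <- t(X_samples_by_vars)

# Binary response as a factor, one value per sample
resp <- factor(sample(c("Normal", "Precursor"), n_samples, replace = TRUE))

# Optional sample-specific offset: one value per sample
off <- rep(0, n_samples)

# Check the documented length requirements before calling grridge
stopifnot(length(resp) == ncol(highdim))
stopifnot(length(off) == 1 || length(off) == ncol(highdim))

dim(highdim)  # variables (rows) by samples (columns)
```

Running such checks up front is cheap and catches a transposed matrix before any model fitting starts.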

About `partitions`: this is either a list of partitions or a single partition represented as a simple list. Each partition is a (named) list that contains, for each group, the indices (row numbers) of the variables in that group. Such a partition is usually created by `CreatePartition`.

About `savepredobj`: use `savepredobj="all"` if you want to compare the performance of the various predictors (e.g. ordinary ridge, group-regularized ridge, group-regularized ridge + selection) using `grridgeCV`.

About `monotone`: we recommend setting the jth component of `monotone` to `TRUE` when the jth partition is based on external p-values, test statistics or regression coefficients. This increases the stability of the predictions.

About `selectionEN` and `maxsel`: if `selectionEN=TRUE`, EN selection will, for each element m of `maxsel`, select m or fewer variables. Note that EN is only used for selection; the final predictive model is a group-ridge model fitted only on the selected variables using the penalties estimated by GRridge. Using multiple values for `maxsel` allows comparing models of various sizes, also in terms of cross-validated performance when using `grridgeCV`.

About `cvlmarg`: we recommend using values between 0 and 2. A larger value will generally result in fewer selected variables by forward selection.

About `innfold`: for large data sets considerable computing time may be saved by setting `innfold=10` instead of the default leave-one-out cross-validation (LOOCV).

About `method`: `"exactstable"` is recommended. If the number of variables is not very large, say <2000, the faster non-iterative `"exact"` method can be used as an alternative.

`grridge` uses the `penalized` package to fit logistic and survival ridge models; `glmnet` is used for linear response and for fitting the lasso when `comparelasso=TRUE`.
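
The partition structure described above can also be built by hand in base R, which may help when inspecting what `CreatePartition` returns. This is a sketch on made-up co-data; the names `annot`, `pvals`, `part_location` and `part_pval`, as well as the choice of three equally sized p-value groups, are illustrative assumptions and not package defaults.

```r
set.seed(2)
n_vars <- 12

# Hypothetical co-data 1: a categorical annotation per variable
annot <- factor(rep(c("Island", "Shore", "Distant"), each = 4))

# A partition is a named list; each element holds the row indices of one group
part_location <- split(seq_len(n_vars), annot)

# Hypothetical co-data 2: external p-values, grouped from small to large
pvals <- runif(n_vars)
part_pval <- split(order(pvals), rep(1:3, each = 4))
names(part_pval) <- c("small", "medium", "large")

# Several partitions are combined into a list of lists
partitions <- list(location = part_location, pvalue = part_pval)

# One flag per partition: only the p-value partition has ordered groups
monotone <- c(FALSE, TRUE)

# In this sketch every partition covers each variable exactly once
covers_all <- sapply(partitions, function(p) {
  identical(sort(unlist(p, use.names = FALSE)), seq_len(n_vars))
})
stopifnot(all(covers_all), length(monotone) == length(partitions))

str(partitions)
```

The `monotone` vector follows the recommendation above: the annotation-based partition has no natural ordering, whereas the p-value groups do.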

A list object containing:

`true`	True values of the response
`cvfit`	Measure of fit. Cross-validated likelihoods from the iterations for logistic and survival models; minus CV error for the linear model
`lambdamults`	List of lists object containing the penalty multipliers per group per partition
`optl`	Global penalty parameter lambda
`lambdamultvec`	Vector with penalty multipliers per variable
`predobj`	List of prediction objects
`betas`	Estimated regression coefficients
`reslasso`	Results of the lasso. `NULL` when `comparelasso=FALSE`
`resEN`	Results of the Elastic Net selection for all elements of `maxsel`. `list()` when `selectionEN=FALSE`
`model`	Model used for fitting: logistic, linear or survival
`arguments`	Arguments used to call the function
`allpreds`	Predictions on the same data
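
To illustrate how group-level multipliers (as in `lambdamults`) relate to a per-variable vector (as in `lambdamultvec`), the sketch below expands made-up multipliers for a single partition with non-overlapping groups. All names and values are hypothetical, and the expansion only illustrates the relationship between the two objects; it is not the package's internal code, and the case of several partitions or overlapping groups is not covered.

```r
# Hypothetical single partition with three non-overlapping groups (row indices)
partition <- list(Island = 1:4, Shore = 5:8, Distant = 9:12)

# Hypothetical group multipliers, shaped like one element of `lambdamults`
group_mult <- c(Island = 0.5, Shore = 1.0, Distant = 2.0)

# Expand to one multiplier per variable
mult_vec <- numeric(12)
for (g in names(partition)) {
  mult_vec[partition[[g]]] <- group_mult[[g]]
}

# Assumption: the effective penalty of a variable is the global lambda
# times its multiplier
lambda_global <- 5.68
penalty_per_var <- lambda_global * mult_vec

mult_vec
```

Under this reading, a multiplier below 1 means a group is penalized less than in ordinary ridge, and a multiplier above 1 means it is penalized more.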

Mark A. van de Wiel

Mark van de Wiel, Tonje Lien, Wina Verlaat, Wessel van Wieringen, Saskia Wilting. (2016). Better prediction by use of co-data: adaptive group-regularized ridge regression. Statistics in Medicine, 35(3), 368-81.

Novianti PW, Snoek B, Wilting SM, van de Wiel MA (2017). Better diagnostic signatures from RNAseq data through use of auxiliary co-data. Bioinformatics, 33, 1572-1574.

Creating partitions: CreatePartition; Cross-validation for assessing predictive performance: grridgeCV.

## NOTE: 
## 1. EXAMPLE DEVIATES SOMEWHAT FROM THE EXAMPLE IN THE MANUSCRIPT IN ORDER TO SHOW SOME
##    OTHER FUNCTIONALITIES.
## 2. HERE WE SHOW A SIMPLE EXAMPLE FROM THE FARKAS DATA SET 
## MORE EXTENSIVE EXAMPLES OF FUNCTIONALITIES IN THE GRRIDGE PACKAGE ARE PROVIDED IN 
## VIGNETTE DOCUMENTATION FILE


## 1ST EXAMPLE: Farkas DATA, USING ANNOTATION: DISTANCE TO CpG

##load data objects:
##datcenFarkas: methylation data for cervix samples (arcsine-transformed beta values)
##respFarkas: binary response (Normal and Precursor)
##CpGannFarkas: annotation of probes according to location
##(CpG-Island, North-Shelf, South-Shelf, North-Shore, South-Shore, Distant) 
data(dataFarkas)

##Create list of partition(s), here only one partition included
partitionFarkas <- list(cpg=CreatePartition(CpGannFarkas))

##Group-regularized ridge applied to data datcenFarkas, 
##response respFarkas and partition partitionFarkas. 
##Saves the prediction objects from ordinary and group-regularized ridge.
##Includes unpenalized intercept by default.

#grFarkas <- grridge(datcenFarkas, respFarkas, partitionFarkas,
#                    optl=5.680087, monotone=FALSE)

## 2ND EXAMPLE: Verlaat DATA, USING P-VALUES AND SIGN OF EFFECT FROM FARKAS DATA
## see vignette documentation file!