View source: R/stability_selection.R
randLassoStabSel | R Documentation |
This function runs randomized lasso stability selection as
presented by Meinshausen and Bühlmann (2010), with the improved
error bounds introduced by Shah and Samworth (2013). It uses the
stabsel function from the stabs package, but implements the
randomized lasso version.
randLassoStabSel(
x,
y,
weakness = 0.8,
cutoff = 0.8,
PFER = 2,
mc.cores = 1L,
...
)
x |
the predictor matrix. |
y |
the response vector. |
weakness |
value between 0 and 1 (default = 0.8). It affects how strict the method will be in selecting predictors. The closer it is to 0, the more stringent the selection. A weakness value of 1 is identical to performing lasso stability selection (not the randomized version). |
cutoff |
value between 0 and 1 (default = 0.8) which is the cutoff for the selection probability. Any variable with a selection probability that is higher than the set cutoff will be selected. |
PFER |
integer (default = 2) representing the absolute number of false positives that we allow for in the final list of selected variables. For details see Meinshausen and Bühlmann (2010). |
mc.cores |
integer (default = 1) specifying the number of cores to
use in |
... |
additional parameters that can be passed on to
|
Randomized lasso stability selection runs a randomized lasso
regression several times on subsamples of the response variable and
predictor matrix. N/2 elements from the response variable are randomly
chosen in each regression, where N is the length of the vector. The
corresponding rows of the predictor matrix are also chosen, and the
internal .glmnetRandomizedLasso function is applied.
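The randomization behind the weakness argument can be sketched in base R. This is a minimal illustration that assumes the scheme described by Meinshausen and Bühlmann (2010), where each predictor's penalty is rescaled by a random weight drawn from [weakness, 1]. Whether the internal .glmnetRandomizedLasso function does exactly this is an assumption, and the names p, w, penalty_scale and penalty_plain are hypothetical.

```r
## Sketch of the per-predictor penalty randomization (assumed form,
## following Meinshausen and Buhlmann 2010; not the package internals)
set.seed(42)
weakness <- 0.8   # same default as randLassoStabSel()
p <- 5            # hypothetical number of predictors

## draw one random weight per predictor from [weakness, 1]
w <- runif(n = p, min = weakness, max = 1)

## each predictor's lasso penalty is scaled by 1 / w, so the scaling
## factor lies between 1 and 1 / weakness
penalty_scale <- 1 / w

## with weakness = 1 every weight equals 1, so all predictors are
## penalized equally, as in the plain (non-randomized) lasso
penalty_plain <- 1 / runif(n = p, min = 1, max = 1)
```

A smaller weakness widens the range of possible penalties, which is consistent with the argument description: values closer to 0 make the selection more stringent.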
Stability selection results in selection probabilities for each
predictor. The selection probability of a predictor is the number of
times it was selected divided by the total number of subsamples
(that is, the number of times the regression was performed).
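The subsampling and the resulting selection probabilities can be illustrated with a small base R sketch. The selector used here, toy_select, is a hypothetical stand-in based on a correlation threshold; it is not the randomized lasso and only serves to show how selection frequencies are accumulated over subsamples of size N/2. The data, the number of subsamples B and the threshold are all assumptions made for the illustration.

```r
## Toy illustration of selection probabilities from subsampling
set.seed(1)
N <- 100   # number of observations
p <- 6     # number of predictors
B <- 50    # number of subsamples (hypothetical choice)
x <- matrix(rnorm(N * p), nrow = N, ncol = p)
y <- 2 * x[, 1] + rnorm(N)   # only predictor 1 carries signal

## stand-in selector (NOT the randomized lasso): keep predictors whose
## absolute correlation with y exceeds a hypothetical threshold
toy_select <- function(xs, ys, thr = 0.5) {
    abs(as.vector(cor(xs, ys))) > thr
}

## one row per subsample, one column per predictor
sel <- t(vapply(seq_len(B), function(b) {
    ## draw N/2 observations without replacement
    idx <- sample(N, size = floor(N / 2))
    toy_select(x[idx, , drop = FALSE], y[idx])
}, logical(p)))

## selection probability: times selected / number of subsamples
sel_prob <- colMeans(sel)

## predictors whose selection probability exceeds the cutoff
selected <- sel_prob > 0.8
```

In this sketch sel_prob plays the role of the final selection probabilities and selected corresponds to applying the cutoff argument.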
We made use of the stabs package, which implements lasso stability
selection, and adapted it to run randomized lasso stability selection.
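The cutoff and PFER arguments are linked through an error bound. As a rough worked example, the original bound of Meinshausen and Bühlmann (2010) states that the expected number of false positives is at most q^2 / ((2 * cutoff - 1) * p), where q is the average number of selected variables per subsample and p the number of predictors. The sketch below solves this bound for q with the default argument values. It is only an illustration of the original bound for a cutoff between 0.5 and 1; the bound actually applied depends on the assumption passed to stabsel and may be the tighter one from Shah and Samworth (2013). The value p = 50 is taken from the example data and is otherwise an assumption.

```r
## Worked example of the Meinshausen and Buhlmann (2010) bound
p      <- 50    # number of predictors (assumed, as in the example data)
cutoff <- 0.8   # default selection probability cutoff
PFER   <- 2     # default tolerated number of false positives

## bound: E(V) <= q^2 / ((2 * cutoff - 1) * p)
## largest (real-valued) q that keeps the bound at or below PFER
q_max <- sqrt(PFER * (2 * cutoff - 1) * p)

## q counts variables, so round down to stay within the bound
q <- floor(q_max)

## realized upper bound on the number of false positives for this q
bound <- q^2 / ((2 * cutoff - 1) * p)
```

Because q is rounded down, the realized bound can be lower than the specified PFER, which is why the returned parameters distinguish a realized from a specified upper bound.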
A SummarizedExperiment object where the rows are the
observations and the columns the predictors (same dimnames as the
predictor matrix x).
It contains:
:
: the predictor matrix.
: a DataFrame with columns:
: the response vector.
: a DataFrame with columns:
: the final selection probabilities for the predictors (from the last regularization step).
: logical indicating the predictors that made the selection with the specified cutoff.
: the normalized area under the selection curve (mean of selection probabilities over regularization steps).
i
': columns containing the selection probabilities for regularization step i.
: a list of output returned from stabsel and randLassoStabSel:
: probability cutoff set for selection of predictors (see stabsel).
: elements with maximal selection probability greater than cutoff (see stabsel).
: maximum of selection probabilities (see stabsel).
: average number of selected variables used (see stabsel).
: (realized) upper bound for the per-family error rate (see stabsel).
: specified upper bound for the per-family error rate (see stabsel).
: the number of effects subject to selection (see stabsel).
: the number of subsamples (see stabsel).
: the sampling type used for stability selection (see stabsel).
: the assumptions made on the selection probabilities (see stabsel).
: the stabsel call.
: the weakness parameter in the randomized lasso stability selection.
N. Meinshausen and P. Bühlmann (2010), Stability Selection,
Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 72, 417–473.
R.D. Shah and R.J. Samworth (2013), Variable Selection with Error
Control: Another Look at Stability Selection,
Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 75, 55–80.
B. Hofner, L. Boccuto, and M. Göker (2015), Controlling False
Discoveries in High-Dimensional Situations: Boosting with Stability
Selection, BMC Bioinformatics, 16, 144.
stabsel
## create data set
Y <- rnorm(n = 500, mean = 2, sd = 1)
X <- matrix(data = NA, nrow = length(Y), ncol = 50)
for (i in seq_len(ncol(X))) {
    X[, i] <- runif(n = 500, min = 0, max = 3)
}
s_cols <- sample(x = seq_len(ncol(X)), size = 10,
                 replace = FALSE)
for (i in seq_along(s_cols)) {
    X[, s_cols[i]] <- X[, s_cols[i]] + Y
}
## reproducible randLassoStabSel() with 1 core
set.seed(123)
ss <- randLassoStabSel(x = X, y = Y)
## reproducible randLassoStabSel() in parallel mode
## (only works on non-windows machines)
if (.Platform$OS.type == "unix") {
    RNGkind("L'Ecuyer-CMRG")
    set.seed(123)
    ss <- randLassoStabSel(x = X, y = Y, mc.preschedule = TRUE,
                           mc.set.seed = TRUE, mc.cores = 2L)
}