stab.fs.ranking: Function to quantify stability of feature ranking
In bhklab/genefu: Computation of Gene Expression-Based Signatures in Breast Cancer

stab.fs.ranking

R Documentation

Function to quantify stability of feature ranking

Description

This function computes an index of feature ranking stability for several numbers of selected features. Stability is usually estimated by perturbing the original dataset and repeating feature selection on each perturbed version, which generates multiple sets of selected features.
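The list passed as `fsets` is typically produced by repeating feature selection on perturbed versions of the data. Below is a minimal sketch of one such scheme. It assumes bootstrap resampling of the samples and ranks features by absolute Pearson correlation with a numeric outcome. Both choices are illustrative assumptions, and the helper `bootstrap_rankings` and its arguments are hypothetical, not part of genefu.

```r
# Hypothetical helper (not part of genefu): build one top-ranked feature set
# per bootstrap resample. Assumption: features are ranked by absolute Pearson
# correlation with a numeric outcome y.
bootstrap_rankings <- function(x, y, n_boot = 20, size = 10) {
  # x: samples-by-features numeric matrix; y: numeric outcome, one per sample
  lapply(seq_len(n_boot), function(b) {
    idx <- sample(nrow(x), replace = TRUE)          # perturb the dataset
    score <- abs(drop(cor(x[idx, ], y[idx])))       # one score per feature
    order(score, decreasing = TRUE)[seq_len(size)]  # indices of top features
  })
}

set.seed(1)
x <- matrix(rnorm(50 * 200), nrow = 50)  # 50 samples, 200 features
y <- x[, 1] + rnorm(50)                  # outcome driven by feature 1
fsets <- bootstrap_rankings(x, y, n_boot = 20, size = 10)
```

Each element of `fsets` is then a ranking of equal length, which is the shape described for the `fsets` argument.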

Usage

stab.fs.ranking(fsets, sizes, N, method = c("kuncheva", "davis"), ...)

Arguments

`fsets`	List or matrix of sets of selected features (one set per row); each ranking must have the same size.
`sizes`	Vector of numbers of top-ranked features for which the stability index is computed (one index per value).
`N`	Total number of features on which feature selection is performed.
`method`	Stability index to compute: "kuncheva" or "davis" (see Details).
`...`	Additional parameters passed to the stability index (for the Davis index, `penalty`, a numeric value; see Details).
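The sketch below makes the roles of `fsets` and `sizes` concrete. It accepts either a list of rankings or a matrix with one ranking per row, then keeps the first k entries of each ranking. We assume this is how the top-ranked features are taken for each value of `sizes`. The helper `top_k_sets` is hypothetical, not a genefu function.

```r
# Hypothetical helper (not part of genefu): keep the top-k features of each
# ranking, as assumed for each value in `sizes`.
top_k_sets <- function(fsets, k) {
  # Accept a matrix with one ranking per row, as allowed for `fsets`
  if (is.matrix(fsets)) {
    fsets <- lapply(seq_len(nrow(fsets)), function(i) fsets[i, ])
  }
  lens <- vapply(fsets, length, integer(1))
  if (length(unique(lens)) != 1L) stop("all rankings must have the same size")
  if (k > lens[1]) stop("k exceeds the ranking length")
  lapply(fsets, function(r) r[seq_len(k)])  # first k entries = top-ranked
}

rankings <- rbind(c(5, 2, 9, 1), c(2, 5, 1, 7), c(9, 2, 5, 3))
top2 <- top_k_sets(rankings, 2)
# top2[[1]] is c(5, 2) and top2[[3]] is c(9, 2)
```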

Details

Stability indices may use different parameters. In this version, only the Davis index requires an additional parameter, `penalty`, a numeric value used as a penalty term.

The Kuncheva index ("kuncheva") lies in [-1, 1]. An index of -1 means no intersection between the sets of selected features (this lower bound is reached only when each set contains half of the N features), +1 means that exactly the same features are always selected, and 0 is the expected stability of a random feature selection.

The Davis index ("davis") lies in [0, 1]. With a penalty term equal to 0, an index of 0 means no intersection between the sets of selected features and +1 means that exactly the same features are always selected. A penalty of 1 is usually used so that a feature selection performed with no features or with all features has a Davis stability index equal to 0. No estimate of the expected Davis stability index for a random feature selection has been published.
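For reference, Kuncheva (2007) defines the consistency between two sets A and B of k features drawn from N as I_C(A, B) = (rN - k^2) / (k(N - k)), where r is the size of their intersection. The stability of several sets is the average of I_C over all pairs. The sketch below implements that definition directly. It illustrates the definition only and is not the genefu source code. The names `kuncheva_pair` and `kuncheva_stability` are hypothetical.

```r
# Minimal sketch of Kuncheva's consistency index (not genefu's implementation).
# A, B: vectors of selected feature indices of equal size k; N: total features.
kuncheva_pair <- function(A, B, N) {
  k <- length(A)
  stopifnot(length(B) == k, k > 0, k < N)  # undefined for k = 0 or k = N
  r <- length(intersect(A, B))             # size of the overlap
  (r * N - k^2) / (k * (N - k))
}

# Stability of m sets: mean of the pairwise index over all m * (m - 1) / 2 pairs
kuncheva_stability <- function(fsets, N) {
  pairs <- combn(length(fsets), 2)         # each column is one pair of indices
  vals <- apply(pairs, 2, function(p) kuncheva_pair(fsets[[p[1]]], fsets[[p[2]]], N))
  mean(vals)
}

kuncheva_stability(list(1:5, 1:5, 1:5), N = 100)  # identical sets
kuncheva_stability(list(1:50, 51:100), N = 100)   # disjoint sets, k = N / 2
```

The first call returns 1 because every pair of sets is identical. The second returns -1 because the two sets are disjoint and each holds half of the N features.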

Value

A numeric vector of stability indices, one for each value in `sizes` (that is, for each number of top-ranked features), computed from the given rankings.

References

Davis CA, Gerick F, Hintermair V, Friedel CC, Fundel K, Kuffner R, Zimmer R (2006) "Reliable gene signatures for microarray classification: assessment of stability and performance", Bioinformatics, 22(19):2356-2363.

Kuncheva LI (2007) "A stability index for feature selection", AIAP'07: Proceedings of the 25th IASTED International Multi-Conference, pages 390-395.

Examples

library(genefu)

# 100 random selections of 50 features each from a set of 10,000 features
set.seed(1)  # for reproducibility
fsets <- lapply(seq_len(100), function(x, size = 50, N = 10000) {
  sample(seq_len(N), size, replace = FALSE)
})
names(fsets) <- paste("fsel", seq_along(fsets), sep = ".")

# Kuncheva index
stab.fs.ranking(fsets=fsets, sizes=c(1, 10, 20, 30, 40, 50),
  N=10000, method="kuncheva")
# close to 0 as expected for a random feature selection

# Davis index
stab.fs.ranking(fsets=fsets, sizes=c(1, 10, 20, 30, 40, 50),
  N=10000, method="davis", penalty=1)
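The comment on the Kuncheva example can be checked directly. For two independent random subsets of k features out of N, the expected overlap is k^2/N, and substituting r = k^2/N into the Kuncheva formula (rN - k^2) / (k(N - k)) gives 0. The short simulation below uses only base R, with an arbitrary seed and number of pairs. It compares the empirical mean overlap with k^2/N for the sizes used above.

```r
# Illustrative check in base R (seed and number of pairs are arbitrary choices)
set.seed(42)
N <- 10000   # total number of features, as in the example above
k <- 50      # size of each random selection
n_pairs <- 2000

# Overlap between two independent random selections, repeated n_pairs times
overlaps <- replicate(n_pairs, length(intersect(sample(N, k), sample(N, k))))

mean(overlaps)  # empirical mean overlap, close to k^2 / N
k^2 / N         # expected overlap: 0.25

# Substituting r = k^2 / N into (r * N - k^2) / (k * (N - k)) gives 0
r_expected <- k^2 / N
(r_expected * N - k^2) / (k * (N - k))
```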

bhklab/genefu documentation built on Nov. 30, 2024, 9:03 p.m.