Identify and select a subset of outcome-associated or predictive features in the training data based on filtering methods. Return the same set of selected features for the test data if it is available.
getDataByFilter(
  trainData,
  testData,
  FSmethod,
  cutP = 0.1,
  fdr = NULL,
  FScore = MulticoreParam()
)
trainData | The input training dataset. The first column is the label.
testData | The input test dataset. The first column is the label.
FSmethod | Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', or 'top10pCor'. 'positive' selects the features positively associated with the outcome, using the Pearson correlation method. 'posWilcox' selects the positively outcome-associated features using the Pearson correlation method together with the 'wilcox.test' method. 'top10pCor' selects the top 10 outcome-associated features; this is helpful when no features pass a stringent feature selection procedure.
cutP | The cutoff used for p value thresholding. It can be any value between 0 and 1. Commonly used cutoffs are 0.5, 0.1, 0.05, 0.01, etc. The default is 0.1.
fdr | Multiple testing correction method. Available options are NULL, 'fdr', 'BH', 'holm', etc. See also
FScore | The number of cores used for some feature selection methods. If it is NULL, no parallel computing is applied.
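The interplay of 'cutP' and 'fdr' can be illustrated with a minimal base R sketch. This is not the BioMM implementation: `filterByWilcox` and the simulated `toyTrain` data are hypothetical names introduced only for illustration, and the sketch assumes a two-class label in the first column, as described for 'trainData'.

```r
# Simulated toy data: first column is the label, the rest are features.
set.seed(1)
n <- 40
label <- rep(c(0, 1), each = n / 2)
feats <- matrix(rnorm(n * 50), nrow = n,
                dimnames = list(NULL, paste0("f", 1:50)))
# Shift the first 5 features in class 1 so they are outcome-associated.
feats[label == 1, 1:5] <- feats[label == 1, 1:5] + 2
toyTrain <- data.frame(label = label, feats)

# Hypothetical helper (not part of BioMM): per-feature Wilcoxon test,
# optional multiple testing correction, then p value thresholding.
filterByWilcox <- function(dat, cutP = 0.1, fdr = NULL) {
  y <- dat[[1]]
  pvals <- vapply(dat[-1],
                  function(x) wilcox.test(x[y == 1], x[y == 0])$p.value,
                  numeric(1))
  if (!is.null(fdr)) pvals <- p.adjust(pvals, method = fdr)
  names(pvals)[pvals < cutP]
}

selectedRaw <- filterByWilcox(toyTrain, cutP = 0.1)              # no correction
selectedBH  <- filterByWilcox(toyTrain, cutP = 0.1, fdr = "BH")  # BH-adjusted
```

Because adjusted p values are never smaller than the raw ones, the corrected selection is always a subset of the uncorrected one at the same cutoff.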
Parallel computing is helpful when the input data are high-dimensional. For 'cutP', a lenient threshold of 0.1 may be preferable to a more stringent p value cutoff, because features with small effect sizes are then retained for downstream analysis. However, for high-dimensional data (e.g. p > 10,000 features), many false positive features may exist, so a more rigorous p value threshold should be applied. The choice of feature selection method depends on the characteristics of the input data.
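The false positive concern can be made concrete with a small simulation. The sketch below assumes that none of the features is truly associated with the outcome, in which case p values are approximately uniform on [0, 1]; the variable names are illustrative and nothing here calls BioMM.

```r
# Assumption: 10,000 null features, so p values are drawn uniformly.
set.seed(42)
p <- 10000
nullP <- runif(p)

# Number of (false) selections under different thresholding strategies.
nRaw    <- sum(nullP < 0.1)                              # lenient cutoff
nStrict <- sum(nullP < 0.001)                            # stringent cutoff
nBH     <- sum(p.adjust(nullP, method = "BH") < 0.1)     # BH correction

print(c(raw = nRaw, strict = nStrict, BH = nBH))
```

With a cutoff of 0.1, roughly 10% of the null features (about 1,000 here) are expected to pass by chance alone, which is why a stricter cutoff or a multiple testing correction is advisable at this scale.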
If a feature selection method is applied, both the training data and the test data (if provided) are returned, restricted to the selected features. If no feature is selected during the feature selection procedure, the output is NULL.
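Since the output can be NULL, downstream code may want to check for that case before indexing the result. The following is a minimal sketch; `unpackFiltered` is a hypothetical helper, not part of BioMM, and it assumes the output is a two-element list (training data first, test data second), as used in the example on this page.

```r
# Hypothetical helper (not part of BioMM): unpack the filter output safely.
unpackFiltered <- function(datalist) {
  if (is.null(datalist)) {
    message("No features were selected; consider a larger cutP or 'top10pCor'.")
    return(NULL)
  }
  list(train = datalist[[1]], test = datalist[[2]])
}

# Stand-in for a real result, built by hand for illustration only.
res <- unpackFiltered(list(
  data.frame(label = c(0, 1), f1 = c(0.3, 0.7)),
  data.frame(label = 1, f1 = 0.5)
))
empty <- unpackFiltered(NULL)  # prints the message and returns NULL
```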
Junfang Chen
## Load data
methylfile <- system.file('extdata', 'methylData.rds', package='BioMM')
methylData <- readRDS(methylfile)
trainIndex <- sample(nrow(methylData), 20)
trainData = methylData[trainIndex,]
testData = methylData[-trainIndex,]
## Feature selection
library(BiocParallel)
param <- MulticoreParam(workers = 10)
## Select outcome-associated features based on the Wilcoxon test (P<0.1)
datalist <- getDataByFilter(trainData, testData, FSmethod="wilcox.test",
                            cutP=0.1, fdr=NULL, FScore=param)
trainDataSub <- datalist[[1]]
testDataSub <- datalist[[2]]
print(dim(trainData))
print(dim(trainDataSub))