Prediction performance for reconstructed stage-2 data using supervised machine learning with feature selection methods.
BioMMstage2pred(
trainData,
testData,
resample = "CV",
dataMode,
repeatA = 1,
repeatB = 1,
nfolds,
FSmethod,
cutP,
fdr,
FScore = MulticoreParam(),
classifier,
predMode,
paramlist,
innerCore = MulticoreParam()
)
trainData |
The input training dataset (stage-2 data). The first column is the label or the output. For binary classes, 0 and 1 are used to indicate class membership. |
testData |
The input test dataset (stage-2 data). The first column is the label or the output. For binary classes, 0 and 1 are used to indicate class membership. |
resample |
The resampling method. Valid options are 'CV' (cross-validation) and 'BS' (bootstrap resampling). The default is 'CV'. |
dataMode |
The mode of data used: 'subTrain' or 'allTrain'. |
repeatA |
The number of repeats N used during resampling prediction. The default is 1. |
repeatB |
The number of repeats N used for test data prediction. The default is 1. |
nfolds |
The number of folds used for cross-validation. |
FSmethod |
Feature selection method. Available options are NULL, 'positive', 'wilcox.test', 'cor.test', 'chisq.test', 'posWilcox', and 'top10pCor'. |
cutP |
The cutoff used for p-value thresholding. Commonly used cutoffs include 0.5, 0.1, 0.05, and 0.01. The default is 0.05. |
fdr |
Multiple testing correction method. Available options include NULL, 'fdr', 'BH', and 'holm'.
See also |
FScore |
The number of cores used for feature selection if parallel computing is needed. |
classifier |
Machine learning classifiers. |
predMode |
The prediction mode. Available options are c('probability', 'classification', 'regression'). |
paramlist |
A set of model parameters defined in an R list object. |
innerCore |
The number of cores used for computation. |
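To illustrate how FSmethod, cutP, and fdr could interact, the base-R sketch below tests each feature with a Wilcoxon test, optionally adjusts the p-values, and applies the cutoff. It is not BioMM's internal code: the helper name `selectByWilcox` and the simulated data are hypothetical, and only `wilcox.test` and `p.adjust` from the stats package are used.

```r
# Hypothetical helper (not part of BioMM): p-value based feature selection.
# Assumes the first column of `dat` is a binary 0/1 label, as described above.
selectByWilcox <- function(dat, cutP = 0.05, fdr = NULL) {
  label <- dat[, 1]
  feats <- dat[, -1, drop = FALSE]
  # One two-sample Wilcoxon test per feature column
  pvals <- vapply(feats, function(x) {
    wilcox.test(x[label == 1], x[label == 0])$p.value
  }, numeric(1))
  # Optional multiple testing correction via stats::p.adjust
  if (!is.null(fdr)) pvals <- p.adjust(pvals, method = fdr)
  # Names of the features that pass the cutoff
  names(pvals)[pvals < cutP]
}

# Simulated toy data: f1 is shifted between classes, f2 is pure noise
set.seed(1)
n <- 60
label <- rep(c(0, 1), each = n / 2)
toy <- data.frame(
  label = label,
  f1 = rnorm(n) + 2 * label,
  f2 = rnorm(n)
)
selected <- selectByWilcox(toy, cutP = 0.05, fdr = "fdr")
```

With `fdr = NULL` the raw p-values are compared against `cutP` directly, which mirrors the NULL option listed for the fdr argument.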
Stage-2 prediction is typically performed using positively correlated features, since negative associations likely reflect random effects in the underlying data.
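A minimal sketch of this idea, assuming that "positively correlated" means a positive Pearson correlation between a feature and the label. The helper `keepPositive` and the toy data are hypothetical and are not taken from BioMM.

```r
# Hypothetical helper (not part of BioMM): keep only features whose
# Pearson correlation with the label (first column) is positive.
keepPositive <- function(dat) {
  label <- dat[, 1]
  feats <- dat[, -1, drop = FALSE]
  r <- vapply(feats, function(x) cor(x, label), numeric(1))
  # Always keep the label column, then the positively correlated features
  dat[, c(TRUE, r > 0), drop = FALSE]
}

toy <- data.frame(
  label = c(0, 0, 0, 1, 1, 1),
  up    = c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9),  # rises with the label
  down  = c(0.9, 0.8, 0.7, 0.3, 0.2, 0.1)   # falls with the label
)
filtered <- keepPositive(toy)
colnames(filtered)
```

Here the `down` column is dropped because its correlation with the label is negative, while `label` and `up` are retained.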
The CV or BS predicted scores for the stage-2 training data and, if a test set is given, the predicted scores for the stage-2 test data.
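The sketch below shows one way cross-validated predicted scores of this kind can be produced: each sample is scored by a model that was fitted without it. Logistic regression (`glm`) stands in for the classifier purely for illustration; the function `cvScores`, the requirement that the label column be named `label`, and the simulated data are assumptions, not BioMM's implementation.

```r
# Hypothetical sketch (not BioMM's implementation): out-of-fold predicted
# probabilities from k-fold cross-validation with logistic regression.
# Assumes `dat` has a binary 0/1 column named `label`.
cvScores <- function(dat, nfolds = 5) {
  n <- nrow(dat)
  # Randomly assign each sample to one of nfolds folds
  fold <- sample(rep(seq_len(nfolds), length.out = n))
  score <- numeric(n)
  for (k in seq_len(nfolds)) {
    held <- fold == k
    # Fit on the remaining folds, predict the held-out fold
    fit <- glm(label ~ ., data = dat[!held, , drop = FALSE],
               family = binomial())
    score[held] <- predict(fit, newdata = dat[held, , drop = FALSE],
                           type = "response")
  }
  score
}

set.seed(42)
n <- 100
label <- rep(c(0, 1), each = n / 2)
toy <- data.frame(label = label, f1 = rnorm(n) + label, f2 = rnorm(n))
scores <- cvScores(toy, nfolds = 5)
```

The result is one probability per training sample, in the original row order, which corresponds to the 'probability' prediction mode described for predMode.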
Junfang Chen
Perlich, C., & Swirszcz, G. (2011). On cross-validation and stacking: Building seemingly predictive models on random data. ACM SIGKDD Explorations Newsletter, 12(2), 11-15.