View source: R/TADrandomForest.R
A wrapper function passed to caret::train to apply a random forest classification algorithm built and tested on user-defined binned domain data from createTADdata.
trainData |
Data frame, the binned data matrix used to build a random forest
classifier (can be obtained using |
testData |
Data frame, the binned data matrix used to test the random forest
classifier (can be obtained using |
tuneParams |
List, providing |
cvFolds |
Numeric, the number of folds (k) for the k-fold cross-validation used to tune the hyperparameters. Required. |
cvMetric |
Character, the performance metric used to choose the optimal tuning parameters (one of "Kappa", "Accuracy", "MCC", "ROC", "Sens", "Spec", "Pos Pred Value", "Neg Pred Value"). Default is "Accuracy". |
verbose |
Logical, controls whether or not details regarding modeling should be printed out. Default is TRUE. |
model |
Logical, whether to keep the model object. Default is TRUE. |
importances |
Logical, whether to extract variable importances. Default is TRUE. |
impMeasure |
Character, the variable importance measure to use (either "MDA" (mean decrease in accuracy) or "MDG" (mean decrease in Gini)). Ignored if importances = FALSE. |
performances |
Logical, indicates whether various performance metrics should be extracted when validating the model on the test data. Ignored if testData = NULL. |
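The metrics named under cvMetric can all be derived from a 2 x 2 confusion matrix. The following base R sketch computes them for a small hand-made set of predictions. The vectors `truth` and `pred` are hypothetical, and which metrics the function itself reports when performances = TRUE is not stated here, so treat this as an illustration of the definitions rather than of the package's output.

```r
# Hypothetical true and predicted class labels ("Yes" = boundary, "No" = non-boundary)
truth <- factor(c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
                levels = c("No", "Yes"))
pred  <- factor(c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No"),
                levels = c("No", "Yes"))

# Confusion matrix: rows are predictions, columns are the truth
tab <- table(pred = pred, truth = truth)
TP <- as.numeric(tab["Yes", "Yes"])
TN <- as.numeric(tab["No", "No"])
FP <- as.numeric(tab["Yes", "No"])
FN <- as.numeric(tab["No", "Yes"])
n  <- TP + TN + FP + FN

# Observed and chance-expected agreement, used for Cohen's kappa
po <- (TP + TN) / n
pe <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2

metrics <- c(
  "Accuracy"       = po,
  "Kappa"          = (po - pe) / (1 - pe),
  "MCC"            = (TP * TN - FP * FN) /
                     sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),
  "Sens"           = TP / (TP + FN),
  "Spec"           = TN / (TN + FP),
  "Pos Pred Value" = TP / (TP + FP),
  "Neg Pred Value" = TN / (TN + FN)
)
print(round(metrics, 4))
```

"ROC" is omitted because it needs predicted class probabilities rather than hard class labels.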
A list containing: 1) a train object from caret with model information, 2) a data.frame of variable importance for each feature included in the model, and 3) a data.frame of various performance metrics.
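Because the function is described as a wrapper around caret::train, a rough picture of the underlying workflow may help. The sketch below is an assumption about the kind of call involved, not the package's actual internals: it uses synthetic data (`toy`, with hypothetical predictors `f1` to `f6`), a 3-fold cross-validation comparable to cvFolds = 3, "Accuracy" as the selection metric comparable to cvMetric, and a grid over mtry comparable to tuneParams. It requires the caret and randomForest packages.

```r
library(caret)         # train(), trainControl()
library(randomForest)  # backend used by method = "rf"

set.seed(123)
n <- 200

# Hypothetical stand-in for a binned data matrix: a two-class response y
# plus six numeric predictors, one of which (f1) carries signal
y <- factor(sample(c("No", "Yes"), n, replace = TRUE), levels = c("No", "Yes"))
x <- matrix(rnorm(n * 6), ncol = 6,
            dimnames = list(NULL, paste0("f", 1:6)))
x[, "f1"] <- x[, "f1"] + ifelse(y == "Yes", 1.5, 0)
toy <- data.frame(y = y, x)

# 3-fold cross-validation (comparable to cvFolds = 3)
ctrl <- trainControl(method = "cv", number = 3)

fit <- train(y ~ ., data = toy,
             method    = "rf",
             metric    = "Accuracy",                     # comparable to cvMetric
             trControl = ctrl,
             tuneGrid  = data.frame(mtry = c(2, 4, 6)),  # candidate mtry values
             ntree = 100, nodesize = 1,                  # passed on to randomForest()
             importance = TRUE)                          # required for MDA

# mtry value selected by cross-validation
print(fit$bestTune)

# Variable importances from the final forest:
# type = 1 is mean decrease in accuracy (MDA), type = 2 is mean decrease in Gini (MDG)
imp <- randomForest::importance(fit$finalModel)
mda <- randomForest::importance(fit$finalModel, type = 1)
mdg <- randomForest::importance(fit$finalModel, type = 2)
print(mda)
```

In this sketch `fit` plays the role of the first element of the returned list, and `mda` or `mdg` the role of the importance table, depending on impMeasure.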
# Read in ARROWHEAD-called TADs at 5kb
data(arrowhead_gm12878_5kb)
# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               filter = FALSE,
                               CHR = c("CHR21", "CHR22"),
                               resolution = 5000)
# Read in GRangesList of 26 TFBS
data(tfbsList)
# Create the binned data matrix for CHR21 (training) and CHR22 (testing)
# using 5 kb binning, distance-type predictors from 26 different TFBS from
# the GM12878 cell line, and random under-sampling
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList,
                         featureType = "distance",
                         resampling = "rus",
                         trainCHR = "CHR21",
                         predictCHR = "CHR22")
# Perform random forest using TADrandomForest by tuning mtry over 10 values
# using 3-fold CV
tadModel <- TADrandomForest(trainData = tadData[[1]],
                            testData = tadData[[2]],
                            tuneParams = list(mtry = c(2, 5, 8, 10, 13, 16, 18, 21, 24, 26),
                                              ntree = 500,
                                              nodesize = 1),
                            cvFolds = 3,
                            cvMetric = "Accuracy",
                            verbose = TRUE,
                            model = TRUE,
                            importances = TRUE,
                            impMeasure = "MDA",
                            performances = TRUE)