TADrandomForest: A wrapper function passed to 'caret::train' to apply a random...
In stilianoudakis/preciseTAD: preciseTAD: A machine learning framework for precise TAD boundary prediction


A wrapper function passed to caret::train to apply a random forest classification algorithm built and tested on user-defined binned domain data from createTADdata.

TADrandomForest(
  trainData,
  testData = NULL,
  tuneParams = list(mtry = ceiling(sqrt(ncol(trainData) - 1)),
                    ntree = 500,
                    nodesize = 1),
  cvFolds = 3,
  cvMetric = "Accuracy",
  verbose = FALSE,
  model = TRUE,
  importances = TRUE,
  impMeasure = "MDA",
  performances = FALSE
)

`trainData`	Data frame, the binned data matrix used to build a random forest classifier (can be obtained using `createTADdata`). Required.
`testData`	Data frame, the binned data matrix used to test the random forest classifier (can be obtained using `createTADdata`). The first column must be a factor with positive class "Yes". Default is NULL, in which case no performances are evaluated.
`tuneParams`	List, providing `mtry`, `ntree`, and `nodesize` parameters to feed into `randomForest`. Default is list(mtry = ceiling(sqrt(ncol(trainData) - 1)), ntree = 500, nodesize = 1). If multiple values are provided, then a grid search is performed to tune the model. Required.
`cvFolds`	Numeric, number of folds (k) to use in k-fold cross-validation when tuning the hyperparameters. Default is 3. Required.
`cvMetric`	Character, performance metric used to choose the optimal tuning parameters (one of "Kappa", "Accuracy", "MCC", "ROC", "Sens", "Spec", "Pos Pred Value", "Neg Pred Value"). Default is "Accuracy".
`verbose`	Logical, controls whether details regarding modeling should be printed out. Default is FALSE.
`model`	Logical, whether to keep the model object. Default is TRUE.
`importances`	Logical, whether to extract variable importances. Default is TRUE.
`impMeasure`	Character, indicates the variable importance measure to use (either "MDA" (mean decrease in accuracy) or "MDG" (mean decrease in Gini)). Default is "MDA". Ignored if importances = FALSE.
`performances`	Logical, indicates whether various performance metrics should be extracted when validating the model on the test data. Default is FALSE. Ignored if testData = NULL.
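
The default `mtry` and the effect of supplying multiple values in `tuneParams` can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: the toy `trainData` below is made up (one response column plus 26 predictor columns, mirroring the 26 TFBS used in the example), and the use of `expand.grid` to enumerate candidate combinations is an assumption about how such a grid search is typically laid out.

```r
# Hypothetical training frame: 1 response column + 26 predictor columns.
# The values are placeholders; only the shape matters here.
n_predictors <- 26
trainData <- data.frame(
  y = factor(rep(c("Yes", "No"), 5), levels = c("Yes", "No")),
  matrix(0, nrow = 10, ncol = n_predictors)
)

# Default mtry from the usage signature: the response column is excluded
# before taking the square root of the number of predictors.
default_mtry <- ceiling(sqrt(ncol(trainData) - 1))

# A tuneParams list with several candidate values per parameter.
tuneParams <- list(mtry = c(2, 5, 8), ntree = c(250, 500), nodesize = 1)

# Assumption: each combination of values is one candidate model.
# expand.grid enumerates all combinations, one per row.
grid <- expand.grid(tuneParams)

print(default_mtry)
print(grid)
```

Because the number of candidate models is the product of the lengths of the three vectors, and each candidate is fitted once per cross-validation fold, large grids combined with a high `cvFolds` can become expensive.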

A list containing: 1) a train object from caret with model information, 2) a data.frame of variable importance for each feature included in the model, and 3) a data.frame of various performance metrics.

# Load the package that provides the functions and example data used below
library(preciseTAD)

# Read in ARROWHEAD-called TADs at 5 kb
data(arrowhead_gm12878_5kb)

# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               filter = FALSE,
                               CHR = c("CHR21", "CHR22"),
                               resolution = 5000)

# Read in GRangesList of 26 TFBS
data(tfbsList)

# Create the binned data matrix for CHR21 (training) and CHR22 (testing)
# using 5 kb binning, distance-type predictors from 26 different TFBS from
# the GM12878 cell line, and random under-sampling
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList,
                         featureType = "distance",
                         resampling = "rus",
                         trainCHR = "CHR21",
                         predictCHR = "CHR22")

# Perform random forest using TADrandomForest by tuning mtry over 10 values
# using 3-fold CV
tadModel <- TADrandomForest(trainData = tadData[[1]],
                            testData = tadData[[2]],
                            tuneParams = list(mtry = c(2,5,8,10,13,16,18,21,24,26),
                                            ntree = 500,
                                            nodesize = 1),
                            cvFolds = 3,
                            cvMetric = "Accuracy",
                            verbose = TRUE,
                            model = TRUE,
                            importances = TRUE,
                            impMeasure = "MDA",
                            performances = TRUE)

stilianoudakis/preciseTAD documentation built on Sept. 23, 2021, 9:36 p.m.
