crissCrossValidate: A function to perform pairwise cross-validation

View source: R/crissCrossValidate.R

crissCrossValidate    R Documentation

Description

Performs cross-validation and model prediction across a list of data sets in a pairwise manner, producing a performance metric for each pair of data sets.

Usage

crissCrossValidate(
  measurements,
  outcomes,
  nFeatures = 20,
  selectionMethod = "auto",
  selectionOptimisation = "Resubstitution",
  trainType = c("modelTrain", "modelTest"),
  performanceType = "auto",
  doRandomFeatures = FALSE,
  runTOP = FALSE,
  classifier = "auto",
  nFolds = 5,
  nRepeats = 20,
  nCores = 1,
  verbose = 0
)

Arguments

measurements

A list of measurement data sets, each of class DataFrame, data.frame or matrix.

outcomes

A list of vectors, each giving the outcomes of the samples in the corresponding element of the measurements list. Factors should be coded such that the control class is the first level.
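
As a minimal sketch of the factor coding described above, base R's factor() with an explicit levels argument places the control class first. The class labels "Healthy" and "Disease" and the data set names cohortA and cohortB are hypothetical.

```r
# Hypothetical class labels for two data sets; "Healthy" is the control class.
rawOutcomes <- list(
  cohortA = c("Disease", "Healthy", "Disease", "Healthy"),
  cohortB = c("Healthy", "Healthy", "Disease")
)

# Without explicit levels, factor() sorts levels alphabetically,
# which would make "Disease" the first level here.
outcomes <- lapply(rawOutcomes, factor, levels = c("Healthy", "Disease"))

levels(outcomes$cohortA)  # "Healthy" "Disease"
```

Setting the levels explicitly avoids depending on the alphabetical order of the class labels.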

nFeatures

Default: 20. The number of features to be used for modelling.

selectionMethod

Default: "auto". A character keyword of the feature selection algorithm to be used. If "auto", features are ranked by t-test (two categories) or F-test (three or more categories) for a classification task, or by per-feature Cox proportional hazards p-value for a survival task, and the top nFeatures is optimised.

selectionOptimisation

Default: "Resubstitution". A character string, one of "Resubstitution", "Nested CV" or "none", specifying the approach used to optimise nFeatures.

trainType

Default: "modelTrain". A keyword specifying how the training data set is used. If "modelTrain", a model fully trained on the training data set is used to make predictions on the test set. If "modelTest", only the feature identifiers are chosen using the training data set, and predictions are then made by cross-validation within the test set.

performanceType

Default: "auto". If "auto", balanced accuracy is used for classification and C-index for survival. Otherwise, any one of the options described in calcPerformance may be specified.
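
For orientation, balanced accuracy is the mean of the per-class recall values, so each class contributes equally regardless of its size. The helper below is a hypothetical base R sketch of that definition, not the calcPerformance implementation.

```r
# Hypothetical helper: balanced accuracy as the mean of per-class recall.
balancedAccuracy <- function(actual, predicted) {
  actual <- factor(actual)
  predicted <- factor(predicted, levels = levels(actual))
  perClassRecall <- vapply(levels(actual), function(cl) {
    # Proportion of samples truly in class cl that were predicted as cl.
    mean(predicted[actual == cl] == cl)
  }, numeric(1))
  mean(perClassRecall)
}

actual    <- c("A", "A", "A", "A", "B", "B")
predicted <- c("A", "A", "A", "B", "B", "A")
balancedAccuracy(actual, predicted)  # (3/4 + 1/2) / 2 = 0.625
```

Plain accuracy on the same vectors would be 4/6, which shows how the larger class dominates when classes are imbalanced.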

doRandomFeatures

Default: FALSE. Whether to perform random feature selection to establish a baseline performance. Must be either FALSE or TRUE.

runTOP

Default: FALSE. If TRUE, perform the Transferable Omics Prediction (TOP) procedure in a leave-one-dataset-out manner.

classifier

Default: "auto". A character keyword of the modelling algorithm to be used. If "auto", then a random forest is used for a classification task or Cox proportional hazards model for a survival task.

nFolds

Default: 5. A numeric specifying the number of folds to use for cross-validation.

nRepeats

Default: 20. A numeric specifying the number of repeats or permutations to use for cross-validation.

nCores

Default: 1. A numeric specifying the number of cores to use for parallel processing.

verbose

Default: 0. A number between 0 and 3 for the amount of progress messages to give. A higher number will produce more messages.

Value

A list with the elements "real", the matrix of pairwise performance metrics using real feature selection; "random", the metrics for random feature selection (present only if doRandomFeatures is TRUE); "top", the results of the TOP procedure (present only if runTOP is TRUE); and "params", a list of the parameters used.

Author(s)

Harry Robertson
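
Examples

The following is a minimal sketch of a call on simulated data, not an example from the package authors. The helper simulateDataset, the data set names, the class labels, and the orientation of samples in rows and features in columns are all assumptions of this sketch. The call is attempted only if ClassifyR is installed, and the default classifier may need further packages to be installed as well.

```r
set.seed(1)

# Hypothetical helper: simulate a two-class data set with samples in rows
# and features in columns (the orientation is an assumption of this sketch).
simulateDataset <- function(prefix, nSamples, nFeatures = 30) {
  classes <- factor(rep(c("Healthy", "Disease"), length.out = nSamples),
                    levels = c("Healthy", "Disease"))  # control class first
  values <- matrix(rnorm(nSamples * nFeatures), nrow = nSamples,
                   dimnames = list(paste0(prefix, "_Sample", seq_len(nSamples)),
                                   paste0("Feature", seq_len(nFeatures))))
  # Shift the first five features in the "Disease" class.
  isDisease <- classes == "Disease"
  values[isDisease, 1:5] <- values[isDisease, 1:5] + 2
  list(measurements = values, outcomes = classes)
}

simulated <- list(cohortA = simulateDataset("A", 40),
                  cohortB = simulateDataset("B", 50))
measurements <- lapply(simulated, `[[`, "measurements")
outcomes <- lapply(simulated, `[[`, "outcomes")

# Run only if ClassifyR is available.
if (requireNamespace("ClassifyR", quietly = TRUE)) {
  result <- ClassifyR::crissCrossValidate(measurements, outcomes,
                                          nFeatures = 10,
                                          nFolds = 5, nRepeats = 2)
  print(result$real)    # matrix of pairwise performance metrics
  print(result$params)  # parameters used for the run
}
```

A small nRepeats keeps the sketch quick to run; the default of 20 gives more stable estimates.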


DarioS/ClassifyR documentation built on Feb. 3, 2025, 11:36 a.m.