ClassifyR can model survival data as well as categorical outcomes. Currently, the available options include feature selection based on Cox proportional hazards ranking, and models built on Cox proportional hazards or random survival forests.
To illustrate, clinical variables of the METABRIC breast cancer cohort will be used to predict risk scores for patients.
library(ClassifyR)
set.seed(8400)
data(METABRICclinical) # Contains clinical measurements, follow-up time and recurrence status.
head(clinical)
Cross-validation is very similar to the classification scenario, except that the outcome is specified either as a Surv object or as two column names indicating the time and event (in that order).
survCrossValidated <- crossValidate(clinical, c("timeRFS", "eventRFS"))
survCrossValidated
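For readers more used to the survival package, the sketch below (using made-up toy values, not METABRIC data) shows how the two column names map onto a Surv object: the follow-up time comes first and the event indicator second. If a Surv object is passed to crossValidate instead of column names, it seems prudent to also drop the time and event columns from the measurements so they are not treated as predictors; that is an assumption about usage, not something stated in this article.

```r
library(survival)

# Made-up toy data (not from METABRIC): follow-up times and an event flag
# (1 = recurrence observed, 0 = censored).
toy <- data.frame(timeRFS = c(12, 30, 45, 60),
                  eventRFS = c(1, 0, 1, 0))

# Surv() takes the time first and the event indicator second, the same
# order in which the two column names are given to crossValidate.
outcome <- Surv(toy$timeRFS, toy$eventRFS)
print(outcome)  # censored observations are printed with a trailing "+"

# The underlying two-column matrix has columns "time" and "status".
outcomeMatrix <- as.matrix(outcome)
outcomeMatrix
```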
By default, Cox proportional hazards is used for both feature selection and modelling.
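As a rough illustration of what ranking features by Cox proportional hazards involves, the sketch below fits one univariate Cox model per feature and orders the features by the p-value of their coefficient. It uses the lung dataset bundled with the survival package rather than METABRIC, and the helper rankByCox is hypothetical; ClassifyR's exact ranking criterion, and how many top-ranked features it retains, may differ.

```r
library(survival)

# Hypothetical helper: rank features by the p-value of a univariate Cox model.
rankByCox <- function(data, timeCol, eventCol, features) {
  pValues <- vapply(features, function(feature) {
    # Build a formula such as Surv(time, status) ~ age
    modelFormula <- as.formula(paste0("Surv(", timeCol, ", ", eventCol, ") ~ ", feature))
    fit <- coxph(modelFormula, data = data)
    # Wald-test p-value of the single coefficient
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  }, numeric(1))
  sort(pValues)  # most strongly associated features first
}

# lung ships with the survival package; status is coded 1 = censored, 2 = dead.
ranking <- rankByCox(lung, "time", "status",
                     c("age", "sex", "ph.ecog", "ph.karno", "wt.loss"))
print(ranking)
```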
The distribution of C-index values can be plotted.
performancePlot(survCrossValidated)
The typical C-index is about 0.65, a value often seen in genomics analyses. A C-index of 0.5 corresponds to a random ranking of patients and 1 to a perfect ranking.
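To make the C-index concrete, here is a minimal base-R sketch of Harrell's C-index: among all pairs of patients whose ordering of event times is known, it is the fraction in which the patient with the earlier observed event was given the higher risk score, with tied scores counting one half. The function harrellC and the toy data are illustrative only; ClassifyR's implementation may handle ties in time differently, and the survival package's concordance() function is a more complete alternative.

```r
# Harrell's C-index for right-censored data (simple O(n^2) sketch).
# risk: higher values mean higher predicted risk, i.e. shorter expected survival.
harrellC <- function(time, event, risk) {
  concordant <- 0
  comparable <- 0
  for (i in seq_along(time)) {
    # A pair is only comparable if the earlier time is an observed event.
    if (event[i] != 1) next
    for (j in seq_along(time)) {
      if (time[j] > time[i]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1    # earlier event, higher risk: correct order
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied risk scores count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data: the third patient is censored.
toyTime <- c(2, 4, 6, 8)
toyEvent <- c(1, 1, 0, 1)
harrellC(toyTime, toyEvent, risk = c(4, 3, 2, 1))  # 1 (perfect ranking)
harrellC(toyTime, toyEvent, risk = c(4, 1, 2, 3))  # 0.6 (partly wrong ranking)
harrellC(toyTime, toyEvent, risk = c(1, 1, 1, 1))  # 0.5 (uninformative scores)
```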
Next, skip explicit feature selection and use a random survival forest. This takes substantially longer than Cox proportional hazards because two parameters, the fraction of variables to consider at each split and the number of trees to build, are optimised over a grid of default values (see the Parameter Tuning Presets article for details).
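A back-of-the-envelope sketch shows why tuning is expensive: every combination in the grid is a separate forest to fit, and that cost is repeated for every training set. The grid values and the fold and repeat counts below are hypothetical placeholders, not ClassifyR's actual presets, and the total is only a lower bound if the tuning step itself uses resampling.

```r
# Hypothetical tuning grid for a random survival forest; the real default
# values are documented in ClassifyR's Parameter Tuning Presets article.
nVariables <- 20  # hypothetical number of clinical variables
grid <- expand.grid(variableFraction = c(0.25, 0.5, 0.75, 1),
                    nTrees = c(100, 250, 500))
# Convert the fraction into a number of variables tried at each split (at least 1).
grid$variablesPerSplit <- pmax(1, round(grid$variableFraction * nVariables))
print(grid)

# Hypothetical resampling scheme: 5 folds repeated 20 times.
nFolds <- 5
nRepeats <- 20
forestsToFit <- nrow(grid) * nFolds * nRepeats
forestsToFit  # 12 grid points x 100 training sets = 1200 forests
```

By contrast, the Cox model needs only one fit per training set.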
survForestCrossValidated <- crossValidate(clinical, c("timeRFS", "eventRFS"),
                                          selectionMethod = "none",
                                          classifier = "randomSurvivalForest",
                                          nCores = 20) # Use fewer cores if the computer has fewer than 20.
resultsList <- list(survCrossValidated, survForestCrossValidated)
performancePlot(resultsList)
The random survival forest performs better than random chance but not better than Cox proportional hazards.
Exercise: Plot the per-sample C-index using samplesMetricMap(resultsList) to identify patients who are predicted well and those who are not.
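One plausible way to define a per-sample C-index, sketched below in base R, is to restrict Harrell's pair counting to the comparable pairs that involve a given patient. This definition is an assumption for illustration; ClassifyR's samplesMetricMap may define and aggregate its per-sample metric differently. Patients with low values are those whose risk is often ranked in the wrong order relative to other patients.

```r
# Per-sample concordance: for each patient, the fraction of comparable pairs
# involving that patient which are ranked correctly (ties count one half).
perSampleC <- function(time, event, risk) {
  n <- length(time)
  concordant <- numeric(n)
  comparable <- numeric(n)
  for (i in seq_len(n)) {
    if (event[i] != 1) next  # the earlier time in a pair must be an observed event
    for (j in seq_len(n)) {
      if (time[j] > time[i]) {
        score <- if (risk[i] > risk[j]) 1 else if (risk[i] == risk[j]) 0.5 else 0
        # Credit the pair to both patients involved.
        concordant[c(i, j)] <- concordant[c(i, j)] + score
        comparable[c(i, j)] <- comparable[c(i, j)] + 1
      }
    }
  }
  # Patients with no comparable pairs get NA.
  ifelse(comparable > 0, concordant / comparable, NA_real_)
}

# Toy data: the third patient is censored.
toyTime <- c(2, 4, 6, 8)
toyEvent <- c(1, 1, 0, 1)
perSampleC(toyTime, toyEvent, risk = c(4, 1, 2, 3))  # 1, 1/3, 0.5, 0.5
```

Here the second patient, who relapsed early but was given the lowest risk score, stands out as poorly predicted.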
Finally, note that extreme gradient boosting can also fit survival models; specifying classifier = "XGB" will fit such a model. Remember that classifier can also be a vector of strings, so all three models can be fitted in one command.