designSampleSizeClassificationPlots: Visualization for sample size calculation in classification
In Vitek-Lab/MSstatsSampleSize: Simulation tool for optimal design of high-dimensional MS-based proteomics experiment


View source: R/designSampleSizeClassificationPlots.R

Illustrates the mean classification accuracy and protein importance under different sample sizes through a predictive accuracy plot and a protein importance plot.

designSampleSizeClassificationPlots(
  data,
  optimal_threshold = 0.001,
  num_important_proteins_show = 10,
  protein_importance_plot = TRUE,
  predictive_accuracy_plot = TRUE,
  save.pdf = FALSE,
  ...
)

`data`	A list of outputs from function `designSampleSizeClassification`. Each element represents the results under a specific sample size. The input should include at least two simulation results with different sample sizes.
`optimal_threshold`	The maximal cutoff for deciding the optimal sample size. Default is 0.001, as in the usage above. A larger cutoff can lead to a smaller optimal sample size, whereas a smaller cutoff produces a larger optimal sample size.
`num_important_proteins_show`	The number of proteins to show in the protein importance plot.
`protein_importance_plot`	TRUE (default) draws the protein importance plot.
`predictive_accuracy_plot`	TRUE (default) draws the predictive accuracy plot.
`save.pdf`	A logical value that determines whether the plots are saved as a pdf. The pdf is saved in the current working directory; the name of the created file is displayed on the console and logged for easier access.
`...`	Arguments that can be passed to ggplot2::theme functions to alter the visuals.
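
The effect of `optimal_threshold` on the reported sample size can be illustrated with a small base R sketch. The rule below is an assumption made for illustration only, not the package's actual implementation: it reports the first sample size after which the gain in mean accuracy per additional sample falls below the cutoff. The helper `pick_optimal_size` and the accuracy values are hypothetical.

```r
# Hypothetical rule (not taken from the package source): report the first
# sample size after which the accuracy gain per added sample is below the cutoff.
pick_optimal_size <- function(sizes, mean_accuracy, threshold) {
  stopifnot(length(sizes) == length(mean_accuracy), length(sizes) >= 2)
  # gain in mean accuracy per additional sample between consecutive sizes
  gain_per_sample <- diff(mean_accuracy) / diff(sizes)
  flat <- which(gain_per_sample < threshold)
  # if the curve never flattens enough, fall back to the largest size
  if (length(flat) == 0) return(sizes[length(sizes)])
  sizes[flat[1]]
}

sizes <- c(10, 25, 50, 100)
mean_accuracy <- c(0.70, 0.78, 0.81, 0.82)  # made-up values for illustration

pick_optimal_size(sizes, mean_accuracy, threshold = 0.004)   # 25
pick_optimal_size(sizes, mean_accuracy, threshold = 0.001)   # 50
pick_optimal_size(sizes, mean_accuracy, threshold = 0.0001)  # 100
```

Under this assumed rule, a larger cutoff stops earlier on the accuracy curve and yields a smaller sample size, which matches the behaviour described for `optimal_threshold`.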

This function visualizes the results of sample size calculation for classification. The mean predictive accuracy and mean protein importance under each sample size are taken from the input ‘data’, which is the output of the function designSampleSizeClassification.

To illustrate the mean predictive accuracy and protein importance under different sample sizes, it generates two types of plots (saved as pdf files when 'save.pdf' is TRUE): (1) The predictive accuracy plot. The X-axis represents the sample sizes and the Y-axis represents the mean predictive accuracy. The reported sample size per condition can be used to design future experiments.

(2) The protein importance plot, which includes multiple subplots. The number of subplots is equal to the number of sample sizes, i.e. the length of ‘list_samples_per_group’. Each subplot shows the top 'num_important_proteins_show' most important proteins under one sample size. The Y-axis of each subplot is the protein name and the X-axis is the mean protein importance under that sample size.
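
The selection behind each subplot of the protein importance plot can be sketched in base R. This is a minimal illustration on made-up data, not the package's code: the data frame layout, its column names and the helper `top_proteins` are all hypothetical.

```r
# Toy mean-importance table; protein names and values are made up.
importance <- data.frame(
  sample_size = rep(c(10, 25), each = 4),
  protein = rep(c("P1", "P2", "P3", "P4"), times = 2),
  mean_importance = c(0.9, 0.2, 0.5, 0.7, 0.3, 0.8, 0.6, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n most important proteins within each
# sample size, mirroring the role of 'num_important_proteins_show'.
top_proteins <- function(df, n) {
  per_size <- split(df, df$sample_size)
  ranked <- lapply(per_size, function(d) {
    d <- d[order(d$mean_importance, decreasing = TRUE), ]
    head(d, n)
  })
  do.call(rbind, ranked)
}

# One group of rows per sample size, i.e. one subplot per sample size
top2 <- top_proteins(importance, n = 2)
print(top2)
```

Each group of rows in `top2` corresponds to one subplot: protein names on the Y-axis and mean importance on the X-axis.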

The predictive accuracy plot shows the mean predictive accuracy under different sample sizes. The X-axis represents the sample sizes and the Y-axis represents the mean predictive accuracy.

The protein importance plot includes one subplot per sample size, each showing the top 'num_important_proteins_show' most important proteins. The Y-axis of each subplot is the protein name and the X-axis is the mean protein importance under that sample size.

A numeric value: the estimated optimal sample size per group for the input dataset for the classification problem.

Ting Huang, Meena Choi, Sumedh Sankhe, Olga Vitek.

data(OV_SRM_train)
data(OV_SRM_train_annotation)

# simulate different sample sizes
# 1) 10 biological replicates per group
# 2) 25 biological replicates per group
# 3) 50 biological replicates per group
# 4) 100 biological replicates per group
list_samples_per_group <- c(10, 25, 50, 100)

# save the simulation results under each sample size
multiple_sample_sizes <- list()
for(i in seq_along(list_samples_per_group)){
    # run simulation for each sample size
    simulated_datasets <- simulateDataset(data = OV_SRM_train,
                                          annotation = OV_SRM_train_annotation,
                                          log2Trans = FALSE,
                                          num_simulations = 10, # simulate 10 times
                                          samples_per_group = list_samples_per_group[i],
                                          protein_rank = "mean",
                                          protein_select = "high",
                                          protein_quantile_cutoff = 0.0,
                                          expected_FC = "data",
                                          list_diff_proteins =  NULL,
                                          simulate_valid = FALSE,
                                          valid_samples_per_group = 50)

    # run classification performance estimation for each sample size
    res <- designSampleSizeClassification(simulations = simulated_datasets,
                                          parallel = TRUE)

    # save results
    multiple_sample_sizes[[i]] <- res
}

## make the plots and save them to disk
designSampleSizeClassificationPlots(data = multiple_sample_sizes, save.pdf = TRUE)

## draw only the predictive accuracy plot in the Plots pane
designSampleSizeClassificationPlots(data = multiple_sample_sizes,
                                    protein_importance_plot = FALSE,
                                    predictive_accuracy_plot = TRUE)

Vitek-Lab/MSstatsSampleSize documentation built on Aug. 28, 2020, 10:39 a.m.
