View source: R/designSampleSizeHypothesisTestingPlot.R
Calculate the sample size for future experiments based on an intensity-based linear model.
data | Protein abundance data matrix. Rows are proteins and columns are biological replicates (samples).
annotation | Group information for the samples in ‘data’. ‘Run’ for MS run, ‘BioReplicate’ for biological subject ID and ‘Condition’ for group information are required. The ‘Run’ values must match the column names of ‘data’. Multiple ‘Run’ values may come from the same ‘BioReplicate’.
log2Trans | Default is FALSE. If TRUE, the input ‘data’ is log-transformed with base 2.
desired_FC | The range of the desired fold change. The first option (default) is "data", indicating that the range is estimated directly from the input ‘data’ as the minimal and maximal fold changes observed in it. The second option is a vector holding the lower and upper values of the desired fold change (for example, c(1.25, 1.75)).
protein_rank | The criterion used to rank the proteins in the input ‘data’. It can be 1) "mean" of protein abundances over all the samples, 2) "sd" (standard deviation) of protein abundances over all the samples, or 3) "combined", which uses both mean abundance and standard deviation. The proteins in the input ‘data’ are ranked based on ‘protein_rank’, and the user can select a subset of proteins for hypothesis testing and sample size calculation.
protein_select | Select proteins with "low" or "high" mean abundance, standard deviation (variance), or their combination for hypothesis testing and sample size calculation. The variance (and the range of the desired fold change if ‘desired_FC = "data"’) is estimated from the selected proteins. If ‘protein_rank = "mean"’ or ‘protein_rank = "sd"’, ‘protein_select’ should be "low" or "high". Default is "high", indicating that proteins with high abundance or high standard deviation are selected. If ‘protein_rank = "combined"’, ‘protein_select’ has two elements. The first element corresponds to the mean abundance. The second element corresponds to the standard deviation (variance). Default is c("high", "low") (select proteins with high abundance and low variance).
protein_quantile_cutoff | Quantile cutoff(s) for selecting proteins for hypothesis testing and sample size calculation. For example, when ‘protein_rank = "mean"’, ‘protein_select = "high"’ and ‘protein_quantile_cutoff = 0.1’, proteins are ranked based on their mean abundance across all the samples. Then, the top 10 [...] Default is 0.0, which means that all the proteins are used. If ‘protein_rank = "combined"’, ‘protein_quantile_cutoff’ has two cutoffs. The first element corresponds to the cutoff for mean abundance. The second element corresponds to the cutoff for the standard deviation (variance). Default is c(0.0, 1.0), which means that all the proteins are used.
FDR | A pre-specified false discovery rate (FDR) to control the overall rate of false positives. Default is 0.05.
power | A pre-specified statistical power, defined as the probability of detecting a true fold change. Provide the average power you expect. Default is 0.9.
height | Height of the saved pdf file. Default is 5.
width | Width of the saved pdf file. Default is 5.
address | The name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is created automatically with the default name ‘HypothesisTestingSampleSizePlot.pdf’. The ‘address’ argument specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in a window.
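The interaction between ‘protein_rank’, ‘protein_select’ and ‘protein_quantile_cutoff’ can be easier to follow with a small base R sketch. The helper `select_proteins` below is hypothetical and is not part of the package. It assumes that "high" keeps proteins at or above the quantile cutoff and "low" keeps proteins at or below it, which is consistent with the documented defaults (0.0 with "high", and c(0.0, 1.0) with c("high", "low"), both keeping every protein), but the package's actual selection code may differ.

```r
# Hypothetical helper illustrating the documented selection arguments.
# Assumption: "high" keeps scores >= the quantile, "low" keeps scores <= it.
select_proteins <- function(data, protein_rank = "mean",
                            protein_select = "high",
                            protein_quantile_cutoff = 0.0) {
  mu <- apply(data, 1, mean, na.rm = TRUE)   # per-protein mean abundance
  sigma <- apply(data, 1, sd, na.rm = TRUE)  # per-protein standard deviation

  keep_by <- function(score, direction, cutoff) {
    threshold <- quantile(score, probs = cutoff, na.rm = TRUE)
    if (direction == "high") score >= threshold else score <= threshold
  }

  keep <- switch(protein_rank,
    mean = keep_by(mu, protein_select, protein_quantile_cutoff),
    sd = keep_by(sigma, protein_select, protein_quantile_cutoff),
    combined = keep_by(mu, protein_select[1], protein_quantile_cutoff[1]) &
      keep_by(sigma, protein_select[2], protein_quantile_cutoff[2]),
    stop("protein_rank must be 'mean', 'sd' or 'combined'")
  )
  data[keep, , drop = FALSE]
}

# Toy matrix: 10 proteins (rows) by 4 samples (columns), made-up values.
set.seed(1)
toy <- matrix(rnorm(40, mean = 20, sd = 2), nrow = 10,
              dimnames = list(paste0("P", 1:10), paste0("S", 1:4)))

nrow(select_proteins(toy))  # default cutoff 0.0 keeps all 10 proteins
nrow(select_proteins(toy, "combined", c("high", "low"), c(0.0, 1.0)))
```

Under this reading, raising the cutoff with "high" narrows the selection toward the most abundant (or most variable) proteins, while lowering it with "low" narrows toward the least.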
The function fits an intensity-based linear model to the input ‘data’. It then uses the fitted models and the fold changes estimated from them to calculate the sample size for hypothesis testing through the ‘designSampleSize’ function from the MSstats package. It outputs the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.
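To give a feel for why smaller fold changes require more replicates, here is a minimal sketch using the textbook two-group normal approximation. This is not the calculation performed by ‘designSampleSize’: the function `n_per_group`, the variance value and the use of a per-test significance level `alpha` instead of an FDR are all assumptions made for illustration. The output column names simply mirror those of the data frame returned by this function.

```r
# Textbook approximation, NOT the MSstats formula:
#   n per group = 2 * sigma^2 * (z_{1 - alpha/2} + z_{power})^2 / delta^2
# where delta is the log2 fold change and sigma the standard deviation
# of log2 abundances within a condition (assumed value below).
n_per_group <- function(fold_change, sigma, alpha = 0.05, power = 0.9) {
  delta <- abs(log2(fold_change))
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta <- qnorm(power)
  ceiling(2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2)
}

fold_changes <- seq(1.25, 1.75, by = 0.125)
sample_sizes <- sapply(fold_changes, n_per_group, sigma = 0.5)

# Illustrative table; real results come from the package itself.
data.frame(desiredFC = fold_changes, numSample = sample_sizes)
```

The required number of replicates falls as the desired fold change grows, which is the trend the sample size plot displays.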
A sample size plot for hypothesis testing: the plot shows the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.
A data frame with columns desiredFC, numSample, FDR, power and CV.
Author(s): Ting Huang, Meena Choi, Olga Vitek
# the function and example data ship with the MSstatsSampleSize package
library(MSstatsSampleSize)

data(OV_SRM_train)
data(OV_SRM_train_annotation)

# sample size plot for hypothesis testing
HT_res <- designSampleSizeHypothesisTestingPlot(data = OV_SRM_train,
                                                annotation = OV_SRM_train_annotation,
                                                log2Trans = FALSE,
                                                desired_FC = "data",
                                                protein_rank = "mean",
                                                protein_select = "high",
                                                protein_quantile_cutoff = 0.0,
                                                FDR = 0.05,
                                                power = 0.9)

# data frame with columns desiredFC, numSample, FDR, power and CV
head(HT_res)