\tableofcontents
\vspace{.1in}
To install this package, start R (version > "4.1") and enter:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("POWSC")
We use one scRNA-seq template data about ES-MEF cell types.
suppressMessages(library(POWSC)) data("es_mef_sce") sce = es_mef_sce[, colData(es_mef_sce)$cellTypes == "fibro"] set.seed(12) rix = sample(1:nrow(sce), 500) sce = sce[rix, ] est_Paras = Est2Phase(sce) sim_size = c(100, 200) # A numeric vector pow_rslt = runPOWSC(sim_size = sim_size, est_Paras = est_Paras,per_DE=0.05, DE_Method = "MAST", Cell_Type = "PW")
Determining the sample size for adequate power to detect statistical significance is a crucial step at the design stage for high-throughput experiments. Even though a number of methods and tools are available for sample size calculation for microarray and RNA-seq under the context of differential expression, this topic in the field of single-cell RNA sequencing is understudied. Moreover, the unique data characteristics present in scRNA-seq including sparsity and heterogeneity gain the challenge.
POWSC
is an R package designed for power assessment and sample size estimation in scRNA-seq experiment. It contains three main functionalities:
(1). Parameter Estimation: adopted and modified from the core of SC2P.
(2). Data Simulation: consider two cases: paired-wise comparison and multiple comparisons.
(3). Power Evaluation: provide both stratified (detailed) and marginal powers
In the context of differential expression (DE) analysis, scientists are usually interested in two different scenarios: (1) within cell type: comparing the same cell types across biological conditions such as case vs. control, which reveals the expression change of a particular cell type under different contexts. (2) between cell types: comparing different cell types under the same condition, which identifies biomarkers to distinguish cell types. In either case, the experiment starts from a number of cells randomly picked from a tissue sample consisting of a mixture of different cell types. The only factor one can control is the total number of cells.
In the first scenario, the numbers of cells for a particular cell type in different biological conditions are often similar, barring significant changes in cell composition. It uses one cell type as the benchmark data and perturbs the genes with mixture proportion ($\pi$) and log fold changes ($lfc$) as the DE genes according two forms.
# Users can customize how many cells they need as to change the number of n. simData = Simulate2SCE(n=100, estParas1 = est_Paras, estParas2 = est_Paras) de = runDE(simData$sce, DE_Method = "MAST") estPower1 = Power_Disc(de, simData = simData) estPower2 = Power_Cont(de, simData = simData)
In the second scenario, the numbers of cells for distinct cell types can be very different, so the power for DE highly depends on the mixing proportions. Here, we use 1000*57 expression matrix sampled from the patient ID AB_S7 from the GSE67835 data. The original data GSE67835 can be download here: https://www.dropbox.com/s/55zdktqfqiwfs3l/GSE67835.RData?dl=0.
data("GSE67835_AB_S7_sce") sim_size = 200 # use a large sample is preferrable cell_per = c(0.2, 0.3, 0.5) col = colData(sce) exprs = assays(sce)$counts (tb = table(colData(sce)$Patients, colData(sce)$cellTypes)) # use AB_S7 patient as example and take three cell types: astrocytes hybrid and neurons estParas_set = NULL celltypes = c("oligodendrocytes", "hybrid", "neurons") for (cp in celltypes){ print(cp) ix = intersect(grep(cp, col$cellTypes), grep("AB_S7", col$Patients)) tmp_mat = exprs[, ix] tmp_paras = Est2Phase(tmp_mat) estParas_set[[cp]] = tmp_paras } ######### ######### Simulation part ######### sim = SimulateMultiSCEs(n = sim_size, estParas_set = estParas_set, multiProb = cell_per) ######### ######### DE analysis part ######### DE_rslt = NULL for (comp in names(sim)){ tmp = runDE(sim[[comp]]$sce, DE_Method = "MAST") DE_rslt[[comp]] = tmp } ######### ######### Summarize the power result ######### pow_rslt = pow1 = pow2 = pow1_marg = pow2_marg = NULL TD = CD = NULL for (comp in names(sim)){ tmp1 = Power_Disc(DE_rslt[[comp]], sim[[comp]]) tmp2 = Power_Cont(DE_rslt[[comp]], sim[[comp]]) TD = c(TD, tmp2$TD); CD = c(CD, tmp2$CD) pow1_marg = c(pow1_marg, tmp1$power.marginal) pow2_marg = c(pow2_marg, tmp2$power.marginal) pow_rslt[[comp]] = list(pow1 = tmp1, pow2 = tmp2) pow1 = rbind(pow1, tmp1$power) pow2 = rbind(pow2, tmp2$power) } ######### ######### Demonstrate the result by heatmap ######### library(RColorBrewer); library(pheatmap) breaksList = seq(0, 1, by = 0.01) colors = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(length(breaksList)) dimnames(pow1) = list(names(sim), names(tmp1$CD)) dimnames(pow2) = list(names(sim), names(tmp2$CD)) pheatmap(pow2, display_numbers = TRUE, color=colors, show_rownames = TRUE, cellwidth = 30, cellheight = 40, legend = TRUE, border_color = "grey96", na_col = "grey", cluster_row = FALSE, cluster_cols = FALSE, breaks = seq(0, 1, 0.01), main = "")
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.