Description Usage Arguments Details Value Author(s) References See Also Examples
View source: R/forwsimDiffExpr.r
Forward simulation allows to evaluate the expected utility for sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive or using a set of pilot data and a GaGa or normal-normal model fit. At each future time point, it computes a summary statistic that will be used to determine when to stop the experiment.
1 2 | forwsimDiffExpr(fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05, genelimit, v0thre = 1, B = 100,
Bsummary = 100, trace = TRUE, randomSeed, mc.cores=1)
|
fit |
Either GaGa or MiGaGa fit (object of type |
x |
|
groups |
If |
ngenes |
Number of genes to simulate data for. If |
maxBatch |
Maximum number of batches, i.e. the routine simulates
|
batchSize |
Batch size, i.e. number of observations per group to
simulate at each time point. Defaults to |
fdrmax |
Upper bound on FDR. |
genelimit |
Only the |
v0thre |
Only genes with posterior probability of being equally
expressed < |
B |
Number of forward simulations. |
Bsummary |
Number of simulations for estimating the summary statistic. |
trace |
For |
randomSeed |
Integer value used to set random number generator
seed. Defaults to |
mc.cores |
If |
To improve computational speed hyper-parameters are not re-estimated as new data is simulated.
A data.frame
with the following columns:
simid |
Simulation number. |
j |
Time (sample size). |
u |
Expected number of true positives if we were to stop experimentation at this time. |
fdr |
Expected FDR if we were to stop experimentation at this time. |
fnr |
Expected FNR if we were to stop experimentation at this time. |
power |
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time. |
summary |
Summary statistic: increase in expected true positives if we were to obtain one more data batch. |
David Rossell.
Rossell D., Mueller P. Sequential sample sizes for high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.
plotForwSim
to plot the simulated trajectories,
fitGG
for fitting a GaGa model,
fitNN
for fitting a normal-normal model,
seqBoundariesGrid
for finding the optimal design based
on the forwards simulation output.
powfindgenes
for fixed sample size calculations.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | #Simulate data and fit GaGa model
set.seed(1)
x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 <- fitGG(x,groups=1:2,method='EM')
gg1 <- parest(gg1,x=x,groups=1:2)
#Run forward simulation
fs1 <- forwsimDiffExpr(gg1, x=x, groups=1:2,
maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1)
#Expected number of true positives for each sample size
tapply(fs1$u,fs1$time,'mean')
#Expected utility for each sample size
samplingCost <- 0.01
tapply(fs1$u,fs1$time,'mean') - samplingCost*(0:2)
#Optimal sequential design
b0seq <- seq(0,20,length=200); b1seq <- seq(0,40,length=200)
bopt <-seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0)
bopt <- bopt$opt
plot(fs1$time,fs1$summary,xlab='Additional batches',ylab='E(newly discovered DE genes)')
abline(bopt['b0'],bopt['b1'])
text(.2,bopt['b0'],'Continue',pos=3)
text(.2,bopt['b0'],'Stop',pos=1)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.