View source: R/simMultSamples.R
Simulate true expression levels and observed data (casper expression estimates) for future samples within each group.
These simulations serve as the basis for sample size calculation:
if one were to sequence nsamples new RNA-seq samples, what data would we expect to see? The simulation is posterior predictive, i.e. based on the currently available data x.
simMultSamples(nsim, nsamples, nreads, readLength, fragLength, x,
    groups='group', distrs, genomeDB, model='LNNMV', verbose=TRUE, mc.cores=1)
nsim | Number of simulations to obtain
nsamples | Vector indicating number of future samples per group, e.g.
nreads | Desired number of paired-end reads per sample. The actual number of aligned reads for any given sample differs from this amount, see details.
readLength | Read length, i.e. in an experiment with paired reads at 100bp each,
fragLength | Desired average insert size (size of RNA molecules after fragmentation). If missing, insert sizes are as obtained from
x |
groups | Name of column in
distrs | Fragment start and length distributions. It can be either an object or a list of objects of class readDistrs. In the latter case, an element is chosen at random for each individual sample to consider uncertainty in these distributions. If not specified, it defaults to data(distrsGSE37704).
genomeDB | annotatedGenome object
model | Set to
verbose | Set to
mc.cores | Number of cores to use in function.
The posterior predictive simulation is based on four steps: (1) simulate true expression for each group (mean and SD), (2) simulate true expression for future samples, (3) simulate paired reads for each future sample, (4) estimate expression from the reads via casper. Below are some more details.
1. Simulate true mean expression in each group and residual variance for each gene. If model=='LNNMV' this is based on the log-normal normal with modified variance model in package EBarrays (Yuan & Kendziorski, 2006); if model=='GaGa' this is based on the GaGa model (Rossell, 2009). The model is adapted to take into account that the expression estimates in the pilot data x are noisy (which is why simMultSamples requires the SE / posterior SD associated to exprs(x)). The simulated values are returned in component "simTruth" of the simMultSamples output.
2. Simulate true isoform expression for each of the future samples. These are independent Normal draws with mean and variance generated in step 1. True gene expression is derived from the isoform expressions.
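As a rough illustration of step 2, the sketch below draws true isoform expression for future samples as independent Normal variables around group-level means. It is not casper's internal code: the objects groupMean and residSD are hypothetical stand-ins for the step 1 output, and the numbers are made up.

```r
set.seed(1)

# Hypothetical step 1 output (assumed values, not produced by casper):
# one mean per isoform and group, one residual SD per isoform
groupMean <- matrix(c(2.0, 2.5,
                      1.0, 1.0,
                      3.0, 2.0),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("iso1", "iso2", "iso3"), c("A", "B")))
residSD <- c(iso1 = 0.3, iso2 = 0.5, iso3 = 0.2)

# Number of future samples per group
nsamples <- c(A = 3, B = 2)

# Independent Normal draws: one column per future sample, one row per isoform.
# mean and sd are recycled down each column, so row i always uses isoform i's values.
simSamples <- do.call(cbind, lapply(names(nsamples), function(g) {
  n <- nsamples[[g]]
  draws <- matrix(rnorm(nrow(groupMean) * n,
                        mean = groupMean[, g], sd = residSD),
                  nrow = nrow(groupMean))
  dimnames(draws) <- list(rownames(groupMean), paste0(g, seq_len(n)))
  draws
}))
```

Here simSamples has one row per isoform and one column per future sample (A1, A2, A3, B1, B2).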
3. Determine the number of reads to be simulated for each gene based on its true expression (generated in step 2) and a Multinomial sampling model. For each sample:
- The number of reads yielded by the experiment is Unif(.8*nreads,1.2*nreads)
- A proportion of non-mappable reads is discarded using the power law in Li et al. (2014)
- Amongst remaining reads, we assume that a proportion Unif(0.6,0.9) were aligned (consistently with reports from the ENCODE project)
The final number of simulated reads is reported in component "simExpr" of the simMultSamples output.
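The read-count logic of step 3 can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: trueExpr is a hypothetical vector of relative gene expression, the Multinomial probabilities are assumed proportional to it, and the power-law discard of non-mappable reads is left out because its parameters are not given here.

```r
set.seed(1)
nreads <- 1e6

# Hypothetical true relative expression for four genes in one future sample
trueExpr <- c(geneA = 0.50, geneB = 0.30, geneC = 0.15, geneD = 0.05)

# Reads yielded by the experiment: Unif(0.8*nreads, 1.2*nreads)
yield <- round(runif(1, 0.8 * nreads, 1.2 * nreads))

# NOTE: the discard of non-mappable reads (power law in Li et al.) is omitted in this sketch

# Proportion of remaining reads that align: Unif(0.6, 0.9)
propAligned <- runif(1, 0.6, 0.9)
nAligned <- round(yield * propAligned)

# Allocate aligned reads to genes with a Multinomial draw
# (assumption: probabilities proportional to relative expression)
readsPerGene <- rmultinom(1, size = nAligned, prob = trueExpr)[, 1]
names(readsPerGene) <- names(trueExpr)
```

This makes explicit why the number of aligned reads in a simulated sample differs from nreads: both the yield and the aligned proportion are random.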
4. Obtain expression estimates from the path counts produced in step 3 via calcExp. These are reported in component "simExpr" of the simMultSamples output.
Object of class simulatedSamples, which extends a list of length nsim. See the class documentation for some helpful methods (e.g. coef, exprs, mergeBatches). Each element is itself a list containing an individual simulation.
simTruth |
|
simExpr |
|
Victor Pena, David Rossell
Rossell D. (2009) GaGa: a Parsimonious and Flexible Model for Differential Expression Analysis. Annals of Applied Statistics, 3, 1035-1051.
Stephan-Otto Attolini C., Pena V., Rossell D. (2015) Bayesian designs for personalized alternative splicing RNA-seq studies.
Yuan, M. and Kendziorski, C. (2006). A unified approach for simultaneous gene clustering and differential expression identification. Biometrics, 62, 1089-1098.
#Run casperDesign() to see full manual with examples