Description Usage Arguments Details Value References See Also Examples
Simulate several future RNA-seq data under various experimental settings (sequencing depth, read length, insert sizes), estimate isoform expression and assess the MAE incurred in the estimation process. The function is a wrapper combining functions simReads and calcExp.
1 |
nsim |
Number of RNA-seq datasets to generate (often as little as
|
islandid |
When specified this argument indicates to run the
simulations only for gene islands with identifiers in
|
nreads |
Vector indicating the target number of read pairs for each
experimental setting. The actual number of reads differs from
|
readLength |
Vector indicating the read length in each experimental setting |
fragLength |
Vector indicating the mean insert size in each experimental setting |
burnin |
Number of MCMC burn-in samples (passed on to |
pc |
Observed path counts in pilot data. When not specified, these
are simulated from |
distr |
Estimated read start and insert size distributions in pilot data |
readLength.pilot |
Read length in pilot data |
eset.pilot |
ExpressionSet with pilot data expression in
log2-RPKM, used to simulate |
usePilot |
By default |
retTxsError |
If |
genomeDB |
|
mc.cores |
Number of cores to use in the expression estimation step, passed on to |
mc.cores.int |
Number of cores to simulate RNA-seq datasets in parallel |
verbose |
Set |
writeBam |
Set to |
bamFile |
Name of the .bam file |
simMAE
simulates nsim
datasets under each experimental
setting defined by nreads
, readLength
,
fragLength
. For each dataset the following steps are performed:
1. The number of reads is nreads * readYield * pmapped, where readYield= runif(1,0.8,1.2) accounts for deviations in read yield and pmapped= runif(1,0.6,0.9)*pmappable is the proportion of mapped reads (60%-90% of the mappable reads according to the piecewise-linear power law of Li et al (2014))
2. True expression levels pi
are generated from their posterior
distribution given the pilot data.
3. Conditional on pi
, RNA-seq data are generated and expression
estimates pihat
are obtained using calcExp
4. The mean absolute estimation error sum(abs(pihat-pi))
across
all isoforms is computed
Ideally simMAE
should use pilot data from a relevant related experiment
to simulate what future data may look like for the current experiment of
interest.
The recommened way to do this is to download a .bam file from such a
related experiment and processing it in casper with function
wrapKnown
, as then both gene and isoform expression can be
estimated accurately.
The object output by wrapKnown
is a list with elements
named 'pc'
, 'distr'
which can be given as input to
simMAE
.
As an alternative to specifying pc
,
simMAE
allows setting eset.pilot
as
pilot data. Gene and isoform expression are then simulated as follows:
1. The number of reads per gene is generated from a Multinomial
distribution with success probabilities proportional to
2^exprs{eset.pilot}
.
2. Relative isoform expression within each gene are generated from a symmetric Dirichlet distribution with parameter 1/Ig, where Ig is the number of isoforms in gene g.
We emphasize that relative isoform expressions are not trained from the
pilot data, and that while the distribution of gene expression levels
resembles that in eset.pilot
, no attempt is made to match gene
identifiers and hence the results for individual genes should not be
trusted
(hence this option is only available when retTxsError==FALSE
.
If retTxsError==TRUE
, simMAE
returns posterior expected MAE
for each individual isoform.
Else the output is a data.frame
with overall MAE across all isoforms
Stephan-Otto Attolini C., Pena V., Rossell D. Bayesian designs for personalized alternative splicing RNA-seq studies (2014)
Li, W. and Freudenberg, J. and Miramontes, P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics, 15, 2 (2014)
wrapKnown,simReads,calcExp
1 | ## maybe str(simMAE) ; plot(simMAE) ...
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.