wrapDenovo: Run all necessary steps to get expression estimates from...
In casper: Characterization of Alternative Splicing based on Paired-End Reads

Description Usage Arguments Details Value Author(s) References See Also Examples

Function to analyze bam files to generate an ExpressionSet with expression estimates for all samples, read start and fragment length distributions, path counts and optinally processed reads.

wrapDenovo(bamFile, output_wrapKnown, knownGenomeDB, targetGenomeDB, readLength, 
  rpkm=TRUE, keep.multihits=TRUE, searchMethod="submodels", 
  exactMarginal=TRUE,  integrateMethod = "plugin", maxExons=40, 
  islandid, chroms=NULL, keep.pbam=FALSE,  keepPbamInMemory=FALSE,
  niter=10^3, priorq=3, priorqGeneExpr=2,
  mc.cores.int=1, mc.cores=1, verbose=TRUE, seed=1)

`bamFile`	Names of bam files with the sample to analyze. These must sorted and indexed, and the index must be in the same directory.
`output_wrapKnown`	Optional argument containing the output of an earlier call to `wrapKnown`. If provided, path counts, read start and insert size distributions are loaded from this output rather than being re-computed. Better leave this argument missing unless you know what you're doing.
`knownGenomeDB`	`annotatedGenome` object with known isoforms, e.g. from UCSC or GENCODE annotations. Used to set the prior probability that any given isoform is expressed. See help(calcDenovo) for details.
`targetGenomeDB`	`annotatedGenome` object with isoforms we wish to quantify. By default these are the same as in `knownGenomeDB`, but more typically `targetGenomeDB` is imported from a .gtf file produced by some isoform prediction software.
`readLength`	Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set `readLength=75`.
`rpkm`	Set to `TRUE` to return reads per kilobase per million (RPKM), `FALSE` for relative expression levels. Important, relative expression adds up to 1 within gene island, NOT within gene. To get relative expressions within gene run `relexprByGene` afterwards. See help(wrapKnown).
`keep.multihits`	Set to `FALSE` to discard reads aligned to multiple positions.
`searchMethod`	Method used to perform the model search. `"allmodels"` enumerates all possible models (warning: this is not feasible for genes with >5 exons). `"rwmcmc"` uses a random-walk MCMC scheme to focus on models with high posterior probability. `"submodels"` considers that some isoforms in `targetGenomeDB` may not be expressed, but does not search for new variants. `"auto"` uses `"allmodels"` for genes with up to 5 exons and `"rwmcmc"` for longer genes. See help("calcDenovo").
`exactMarginal`	Set to `FALSE` to estimate posterior model probabilities as the proportion of MCMC visits. Set to `TRUE` to use the integrated likelihoods (default). See details.
`integrateMethod`	Method to compute integrated likelihoods. The default (`'plugin'`) evaluates likelihood*prior at the posterior mode and is the faster option. Set `'Laplace'` for Laplace approximations and `'IS'` for Importance Sampling. The latter increases computation cost very substantially.
`maxExons`	Prior probabilities of isoform expression are estimated for genes with 1 up to `maxExons` exons separately, for genes with more than `maxExons` exons a combined estimate is used. See `help("modelPrior")`
`islandid`	Names of the gene island to be analyzed. If missing all gene islands are analyzed
`chroms`	Names of the chromosomes to be analyzed. If missing all chromosomes are analyzed.
`keep.pbam`	Set to `TRUE` to save processed bam object, as returned by `procBam`. This object can require substantial memory during execution and disk storage upon saving and is not needed for a default analysis.
`keepPbamInMemory`	Set to `TRUE` to keep processed bam objects in memory to speed up some computations.
`niter`	Number of MCMC iterations in the model search algorithm.
`priorq`	Parameter of the Dirichlet prior for the proportion of reads coming from each variant. We recommend `priorq=3` as this defines a non-local prior that penalizes falsely predicted isoforms.
`priorqGeneExpr`	Parameter of the Dirichlet prior distribution on overall gene expression. Defaults to 2 to ensure non-zero estimates.
`mc.cores`	Number of cores to use in expression estimation.
`mc.cores.int`	Number of cores to use when loading bam files. Be careful as this is a memory intensive step.
`verbose`	Set to `TRUE` to display progress information.
`seed`	Set seed of random number generator.

The function executes the functions procBam, getDistrs, pathCounts calcDenovo and denovoExpr and formats the output nicely. Running wrapDenovo is much more efficient in cpu speed and memory usage than running these functions separately.

When rpkm is false the function returns the estimated proportion of reads arising from each isoform within a gene island. See the details in help("wrapKnown") for more information on this.

denovoGenomeDB

annotatedGenome that contains the isoforms in targetGenomeDB plus any new isoforms predicted by casper

`exp`	Object of class `ExpressionSet` containing Bayesian model averaging expression estimates. See the fData for the posterior probability that each isoform is expressed.
`distr`	Object of class `readDistrs`
`pbam`	List of objects of class `procBam` with one element per chromosome

Miranda Stobbe, David Rossell

Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.

calcDenovo, wrapKnown, relexprByGene

## not run
## Known isoforms
##  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
##  hg19DB <- procGenome(TxDb.Hsapiens.UCSC.hg19.knownGene), genome='hg19')

## gtf with known & de novo predictions
##  mygtf <- import('hg19_denovo.gtf')
##  hg19denovoDB <- procGenome(mygtf, genome='hg19')

## bamFile="/path_to_bam/sorted.bam"
##  ans <- wrapDenovo(bamFile=bamFile, targetGenomeDB=hg19denovoDB, knownGenomeDB=hg19DB, readLength=101)

## Estimated expression via BMA
##  head(exprs(ans[['exp']]))

## Posterior probability that each isoform is expressed
##  head(fData(ans[['exp']]))