The ultimate goal of transcriptR is to identify continuous regions of
transcription. However, in some areas of the genome transcription cannot be
detected because of low-mappability regions and (high copy number) repeats.
Sequencing reads cannot be uniquely mapped to these positions, which leaves
gaps in otherwise continuous coverage profiles and splits transcribed regions
into multiple smaller fragments. The gap distance is the maximum allowed
distance between adjacent fragments for them to be merged into one transcript.
The optimal gap distance is the one at which the detected transcripts agree
most closely with the available reference annotations.
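
To make the role of the gap distance concrete, here is a minimal base R sketch of merging fragments on a single chromosome and strand. The helper `mergeFragments` and the toy coordinates are hypothetical and are not part of transcriptR; the sketch also assumes that the distance between two fragments is the number of uncovered bases between them, which may differ from the package's internal definition.

```r
# Hypothetical helper (not part of transcriptR): merge fragments on one
# chromosome/strand when the gap to the preceding fragments is <= gap.dist.
mergeFragments <- function(start, end, gap.dist) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  n <- length(start)
  # Running maximum of end coordinates, so nested fragments are handled
  reach <- cummax(end)
  # Number of uncovered bases between each fragment and everything before it
  gap <- start[-1] - reach[-n] - 1
  # A new transcript starts wherever the gap exceeds the allowed distance
  group <- cumsum(c(TRUE, gap > gap.dist))
  data.frame(start = as.vector(tapply(start, group, min)),
             end   = as.vector(tapply(end, group, max)))
}

# Toy fragments (made-up coordinates): gaps of 99 and 599 bases
frag.start <- c(100, 400, 1500)
frag.end   <- c(300, 900, 2000)

mergeFragments(frag.start, frag.end, gap.dist = 0)     # 3 transcripts
mergeFragments(frag.start, frag.end, gap.dist = 100)   # 2 transcripts
mergeFragments(frag.start, frag.end, gap.dist = 1000)  # 1 transcript
```

A larger gap distance bridges more gaps, so the same fragments collapse into fewer, longer transcripts.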
To accomplish this, the function builds on the methodology proposed by
Hah et al. (Cell, 2011).
In brief, two types of errors are defined:
dissected error - the ratio of annotations that are segmented into two or
more fragments.
merged error - the ratio of non-overlapping annotations that are merged by
mistake in the experimental data.
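
The two error definitions can be illustrated with a small base R sketch. It is only one possible reading of the definitions, not the package's implementation: an annotation is counted as dissected when two or more detected transcripts overlap it, and as merged when it shares a detected transcript with at least one other annotation. The data frames `annot` and `tx` and all coordinates are made up for illustration.

```r
# Toy coordinates on a single chromosome and strand; all values are made up.
annot <- data.frame(start = c(100, 3000), end = c(2000, 5000))
tx    <- data.frame(start = c(100, 1200, 2900), end = c(900, 2000, 5000))

# Overlap matrix: rows are annotations, columns are detected transcripts
ov <- outer(seq_len(nrow(annot)), seq_len(nrow(tx)),
            function(i, j) annot$start[i] <= tx$end[j] &
                           tx$start[j] <= annot$end[i])

# Dissected error: share of annotations covered by two or more transcripts
dissected <- mean(rowSums(ov) >= 2)

# Merged error: share of annotations that share a transcript with
# at least one other annotation
shared.tx <- colSums(ov) >= 2
merged <- mean(rowSums(ov[, shared.tx, drop = FALSE]) >= 1)

dissected  # 0.5: the first annotation is split into two transcripts
merged     # 0: no transcript spans both annotations
```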
The two types of errors are interdependent. Increasing the gap distance
decreases the dissected error, because fewer but longer transcripts are
detected, while the merged error increases, because more detected
transcripts span multiple annotations. The gap distance with the lowest
sum of the two error types is chosen as the optimal value.
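
The selection rule can be sketched in a few lines of base R. The error rates below are invented to show the trade-off, and the column names are hypothetical; they are not necessarily those of the data.frame stored by the package.

```r
# Hypothetical error rates for a few tested gap distances (made-up numbers)
err <- data.frame(
  gap.distance = c(0, 1000, 2000, 3000, 4000),
  dissected    = c(0.40, 0.25, 0.15, 0.10, 0.08),  # falls as the gap grows
  merged       = c(0.02, 0.05, 0.10, 0.22, 0.35)   # rises as the gap grows
)

# Sum of the two error types for each tested gap distance
err$sum <- err$dissected + err$merged

# The optimal gap distance minimizes the summed error
best <- err$gap.distance[which.min(err$sum)]
best  # 2000
```

With these toy values the summed error is lowest at a gap distance of 2000, where neither error type dominates.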
estimateGapDistance(object, annot, coverage.cutoff, filter.annot = TRUE,
  fpkm.quantile = 0.25, gap.dist.range = seq(from = 0, to = 10000, by = 100))

## S4 method for signature 'TranscriptionDataSet,GRanges'
estimateGapDistance(object, annot, coverage.cutoff, filter.annot = TRUE,
  fpkm.quantile = 0.25, gap.dist.range = seq(from = 0, to = 10000, by = 100))
object | A
annot |
coverage.cutoff |
filter.annot |
fpkm.quantile |
gap.dist.range | A numeric vector specifying a range of gap distances to test. By default, the range is from 0 to 10000 with a step of 100.
The slot gapDistanceTest of the provided TranscriptionDataSet object will be
updated with a data.frame containing the estimated error rates for each
tested gap distance (see getTestedGapDistances for details).
Armen R. Karapetyan
Hah N, Danko CG, Core L, Waterfall JJ, Siepel A, Lis JT, Kraus WL. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell. 2011.
constructTDS
plotErrorRate
getTestedGapDistances
### Load the package that provides the example data and the function
library(transcriptR)
### Load TranscriptionDataSet object
data(tds)
### Load reference annotations (knownGene from UCSC)
data(annot)
### Estimate gap distance minimizing error rate
### Define the range of gap distances to test
gdr <- seq(from = 0, to = 10000, by = 1000)
estimateGapDistance(object = tds, annot = annot, coverage.cutoff = 5,
                    filter.annot = FALSE, gap.dist.range = gdr)
### View estimated gap distance
tds