DENR quantifies nascent transcript abundance based on reference
annotations. These annotations can come from any source as long as they can be
represented as a GRanges object. Here we obtain our transcript annotations from
a TxDb object.
library(DENR)
# Load the example TxDb shipped with DENR
txdb_path_ds <- system.file("extdata", "test_double_strand.txdb", package = "DENR")
txdb_ds <- AnnotationDbi::loadDb(file = txdb_path_ds)
# Extract transcripts as a GRanges with transcript and gene identifiers
gr_ds <- GenomicFeatures::transcripts(txdb_ds, c("tx_name", "gene_id"))
gr_ds$gene_id <- as.character(gr_ds$gene_id)
print(gr_ds)
TxDb objects provide a convenient way to obtain such annotations for a variety
of species and assemblies through biomaRt. We also provide convenience
functions for retrieving and locally caching TxDb objects from the Ensembl
biomart via the download_txdb() function in the companion package nascentRNASim.
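As an alternative sketch (the exact arguments of download_txdb() are not shown here), a TxDb can also be built directly from the Ensembl biomart with GenomicFeatures and cached locally; this requires network access, and the dataset name below is only an illustration:
# Sketch only: build a TxDb from the Ensembl biomart and cache it on disk.
# The biomart/dataset values are illustrative; see ?makeTxDbFromBiomart for details.
txdb_hs <- GenomicFeatures::makeTxDbFromBiomart(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl"
)
AnnotationDbi::saveDb(txdb_hs, file = "hsapiens_ensembl.txdb")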
Once we have a list of transcripts we can construct a transcript_quantifier
object, which will hold all data for further analysis. To do this we must
specify a bin size, which defines the resolution of the transcript models.
Bin sizes that are too small are computationally expensive and may give a false
sense of precision, while bin sizes that are too large reduce the ability to
differentiate signal from distinct transcripts. As a reasonable middle ground we
select a bin size of 250 bp. We must also specify the metadata column
corresponding to a unique transcript id. Another column for a unique gene id may
be specified but is not required.
bsize <- 250
tq <- transcript_quantifier(gr_ds,
                            bin_size = bsize,
                            transcript_name_column = "tx_name",
                            gene_name_column = "gene_id",
                            mask_start_bins = c(1, 1),
                            mask_end_bins = c(1, 1))
To add a specific data set to the transcript_quantifier object we use the
add_data() function with the mean summary operation.
# The paths to the relevant bigwig files
bwp <- system.file("extdata", "test_double_strand_plus.bw", package = "DENR")
bwm <- system.file("extdata", "test_double_strand_minus.bw", package = "DENR")
# Add the data
tq <- add_data(tq = tq, bigwig_plus = bwp, bigwig_minus = bwm)
All we need to do now to estimate transcript abundances is call the fit()
function.
tq <- fit(tq)
To view the results we can call the transcript_abundance() or gene_abundance()
function.
ta <- transcript_abundance(tq)
head(ta)
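Gene-level estimates can be retrieved in the same way; here we assume gene_abundance() takes the fitted transcript_quantifier object just as transcript_abundance() does:
# Abundance estimates aggregated per gene
ga <- gene_abundance(tq)
head(ga)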
We can also create plots for specific transcripts or genomic regions.
plot_model(tq, chrom = 1, start = 1, end = 11000, ymax_bw = 300, ymax_abundance = 250)
Note that if UCSC chromosome names are not used (e.g., Ensembl style, 1 instead
of chr1), the user will first need to call library(Gviz) and then set
options(ucscChromosomeNames = FALSE).
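For example, with the Ensembl-style names used in this vignette's test data, run the following before calling plot_model():
# Required before plotting when chromosome names lack the "chr" prefix
library(Gviz)
options(ucscChromosomeNames = FALSE)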
Because of variable polymerase elongation rates along the transcript body, polymerase density systematically varies along the length of the transcript. To accommodate this, DENR can fit a "shape profile" that attempts to capture this variation empirically. The empirical shape profile is then used in place of the uniform one during the abundance estimation step. To estimate a shape profile, use transcript_shape_profile() as below (note that min_transcript_length and the head and tail lengths are altered here purely for the purposes of demonstration with a synthetic data set; using transcripts this short is generally not recommended).
tsp <- transcript_shape_profile(transcripts = gr_ds,
                                bigwig_plus = bwp,
                                bigwig_minus = bwm,
                                bin_size = bsize,
                                linear_head_length = 500,
                                linear_tail_length = 500,
                                min_transcript_length = 5e3)
The empirical profile can be viewed with the view() function. This example looks fairly odd because we are running on synthetic data and allowing very short transcripts.
view(tsp)
The shape profile is then incorporated into the transcript_quantifier object and fitting proceeds as normal.
tq_shape <- apply_shape_profile(tq, tsp)
tq_shape <- fit(tq_shape)
head(transcript_abundance(tq_shape))
Nascent RNA data tend to be fairly noisy and spill over annotated transcript boundaries. For this reason, pre-filtering the set of transcripts for which abundance will be quantified can be quite useful. Specific transcripts can be marked as inactive during the fitting step using the inactive_transcripts option. You may create the list of inactive transcripts in whatever manner you choose, e.g., based on empirical evidence from GRO-cap, PRO-cap, CAGE, RNA-seq, etc. (see the sketch after the code below).
# Inactive transcripts selected manually
inactive_tss <- c("t1.2", "t2.1")
tq <- fit(tq = tq, inactive_transcripts = inactive_tss)
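As an illustration of the evidence-based route, here is a minimal sketch that marks transcripts as inactive when their TSS has no nearby peak in a GRanges of capped-TSS calls. The grocap_peaks object and the 250 bp window are assumptions for illustration only and are not part of DENR:
library(GenomicRanges)
# Toy stand-in for capped-TSS evidence (e.g., GRO-cap/PRO-cap peak calls);
# in practice this would come from your own peak calling
grocap_peaks <- GRanges("1", IRanges(c(100, 5000), width = 200))
# 1 bp TSS positions of the annotated transcripts
tss <- promoters(gr_ds, upstream = 0, downstream = 1)
# Transcripts with no capped TSS within 250 bp are treated as inactive
has_cap <- countOverlaps(tss, grocap_peaks, maxgap = 250) > 0
inactive_tss <- gr_ds$tx_name[!has_cap]
The resulting inactive_tss vector is then passed to fit() exactly as in the manual example above.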
However, if you have a TensorFlow backend installed, you can instead use a pre-trained ML classifier to predict inactive TSSs and feed the result into the fitting step. For instructions on how to install TensorFlow, see the tensorflow R package documentation.
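One common way to set up a backend (an assumption here; DENR does not require this exact route) is via the tensorflow R package:
# Install the R interface and a TensorFlow backend
install.packages("tensorflow")
tensorflow::install_tensorflow()
With a backend available, the classifier can then be run as follows.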
itx <- predict_inactive_transcripts(tq, bwp, bwm)
print(itx)