BiocStyle::markdown(css.files = c('custom.css'))
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(c("EpiTxDb","EpiTxDb.Hs.hg38"))
The epitranscriptome includes all post-transcriptional modifications of the RNA and describes and additional layer of information encoded on RNA. Like the term epigenome it is not about a change in nucleotide sequences, but the addition of functional elements through modifications.
With the development of high throughput detection strategies for specific RNA modifications, such as miCLIP and Pseudo-Seq amongst other, a large number of modified positions have been identified and were summarized via the RMBase project [@Xuan.2017;@Sun.2015] project.
To make these information avaialble within the Bioconductor universe EpiTxDb
was developed, which facilitates the storage of epitranscriptomic information.
More specifically, it can keep track of modification identity, position, the
enzyme for introducing it on the RNA, a specifier which determines the position
on the RNA to be modified and the literature references each modification is
associated with.
library(EpiTxDb) library(EpiTxDb.Hs.hg38)
The class EpiTxDb
is the class for storing the epitranscriptomic data. It
inherits the inner workings of AnnotationDb
class from the AnnotationDbi
package.
As an example for the vignette the snoRNAdb data [@Lestrade.2006]
from the EpiTxDb.Hs.hg38
package will be used. The data is stored in the
AnnotationHub
and is downloaded and cached upon the first request.
etdb <- EpiTxDb.Hs.hg38.snoRNAdb() etdb
As expected for an AnnotationDb
class the general accessors are available.
keytypes(etdb) columns(etdb) head(keys(etdb, "MODID")) select(etdb, keys = "1", columns = c("MODNAME","MODTYPE","MODSTART","MODSTRAND","SNNAME", "RXGENENAME","SPECTYPE","SPECGENENAME"), keytype = "MODID")
The columns with the prefix RX
or SPEC
reference the reaction enzyme and the
location specifier. This can be the same information, but for ribosomal
modifications from the snoRNAdb it is of course fibrillarin and a snoRNA.
In addition the following accessor for metadata are available as well.
species(etdb) organism(etdb) seqlevels(etdb)
The specialized accessors are modifications()
and modificationsBy()
.
modifications()
allows for filtering results, whereas modificationsBy()
returns all the modifications in batches separated by certain information.
modifications(etdb, columns = c("mod_id","mod_type","mod_name", "rx_genename","spec_genename", "ref_type","ref"), filter = list(mod_id = 1:3))
# split by sequence name, usually a transcipt identifier modificationsBy(etdb, by = "seqnames") # split modification type modificationsBy(etdb, by = "modtype")
Since epitranscriptomic modifications by their nature can have different meaning for each of the individual transcript variants. This also introduces conflicts for saving epitranscriptomics coordinates. In the example above the coordinates are given per transcript, because of the source data.
However, not all sources report transcript coordinates. It might be of interest to shift the coordinates to transcript coordinates and at the same time taking care that with transcript variants multiple options exist for each of the transcript maturation process: From one genomic coordinate, multiple transcriptomic coordinates can be spawned.
Whether this is biologically relevant or whether biological evidence does exist for each modification on each transcript cannot be guaranteed or differentiated technically depending on the methods used. This might change with the arrival of new techniques allowing for detection of modified nucleotides per individual transcript variant.
suppressPackageStartupMessages({ library(TxDb.Hsapiens.UCSC.hg38.knownGene) library(BSgenome.Hsapiens.UCSC.hg38) })
library(TxDb.Hsapiens.UCSC.hg38.knownGene) library(BSgenome.Hsapiens.UCSC.hg38)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene seqlevels(txdb) <- "chr1" bs <- BSgenome.Hsapiens.UCSC.hg38 etdb <- EpiTxDb.Hs.hg38.RMBase() tx <- exonsBy(txdb) mod <- modifications(etdb, filter = list(sn_name = "chr1")) length(mod)
In the following example we will focus on shifting the coordinates to individual
mature transcripts. However, keep in mind, that premature transcript might be
of interest as well and this can be controlled via the tx
arguments of
shiftGenomicToTranscript()
mod_tx <- shiftGenomicToTranscript(mod, tx) length(mod_tx)
Due to multiple matches for each transcript variant the number of modifications has increased.
With the we can plot the relative positions of modifications by type on
chr1
transcripts.
mod_tx <- split(mod_tx,seqnames(mod_tx)) names <- Reduce(intersect,list(names(mod_tx),names(tx))) # Getting the corresponding 5'-UTR and 3'-UTR annotations fp <- fiveUTRsByTranscript(txdb) tp <- threeUTRsByTranscript(txdb) tx <- tx[names] mod_tx <- mod_tx[names] fp_m <- match(names,names(fp)) fp_m <- fp_m[!is.na(fp_m)] tp_m <- match(names,names(tp)) tp_m <- tp_m[!is.na(tp_m)] fp <- fp[fp_m] tp <- tp[tp_m] # Getting lengths of transcripts, 5'-UTR and 3'-UTR tx_lengths <- sum(width(tx)) fp_lengths <- rep(0L,length(tx)) names(fp_lengths) <- names fp_lengths[names(fp)] <- sum(width(fp)) tp_lengths <- rep(0L,length(tx)) names(tp_lengths) <- names tp_lengths[names(tp)] <- sum(width(tp)) # Rescale modifications # CDS start is at position 1L and cds end at position 1000L from <- IRanges(fp_lengths+1L, tx_lengths - tp_lengths) to <- IRanges(1L,1000L) mod_rescale <- rescale(mod_tx, to, from) # Construct result data.frame rel_pos <- data.frame(mod_type = unlist(mcols(mod_rescale,level="within")[,"mod_type"]), rel_pos = unlist(start(mod_rescale))) rel_pos <- rel_pos[rel_pos$rel_pos < 1500 & rel_pos$rel_pos > -500,]
library(ggplot2) ggplot(rel_pos[rel_pos$mod_type %in% c("m6A","m1A","Y"),], aes(x = rel_pos, colour = mod_type)) + geom_density()
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.