BiocStyle::markdown(css.files = c('custom.css'))
7-methyl guanosine (m7G), 3-methyl cytidine (m3C) and Dihydrouridine (D) are commonly found in rRNA and tRNA and can be detected classically by primer extension analysis. However, since the modifications do not interfere with Watson-Crick base pairing, a specific chemical treatment needs to be employed to cause strand breaks specifically at the modified positions. Initially, this involved a sodium borhydride treatment to create abasic sites and cleaving the RNA at abasic sites with aniline.
This classical protocol was converted to a high throughput sequencing method call AlkAnilineSeq and allows modified position be detected by an accumulation of 5'-ends at the N+1 position [@Marchand.2018]. It was found, that m3C is susceptible to this treatment, which allows m7G, m3C and D to be detected by the same method from the same data sets, since the identify of the unmodified nucleotide informs about the three modified nucleotides.
The ModAlkAnilineSeq
class uses the the NormEnd5SequenceData
class to store
and aggregate data along the transcripts. The calculated scores follow the
nomenclature of [@Marchand.2018] with the names scoreNC
(default) and scoreSR
.
suppressPackageStartupMessages({ library(rtracklayer) library(GenomicRanges) library(RNAmodR.AlkAnilineSeq) library(RNAmodR.Data) })
library(rtracklayer) library(GenomicRanges) library(RNAmodR.AlkAnilineSeq) library(RNAmodR.Data)
The example workflow is limited to 18S rRNA and some tRNA from S.cerevisiae.
As annotation data either a gff file or a TxDb
object and for sequence data
a fasta file or a BSgenome
object can be used. The data is provided as bam
files.
annotation <- GFF3File(RNAmodR.Data.example.AAS.gff3()) sequences <- RNAmodR.Data.example.AAS.fasta() files <- list("wt" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "Bud23del" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "Trm8del" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2()))
The analysis is triggered by the construction of a ModSetAlkAnilineSeq
object.
Internally parallelization is used via the BiocParallel
package, which would
allow optimization depending on number/size of input files (number of samples,
number of replicates, number of transcripts, etc).
msaas <- ModSetAlkAnilineSeq(files, annotation = annotation, sequences = sequences) msaas
As expected the m7G1575 is missing from the Bud23del samples.
mod <- modifications(msaas) lapply(mod,head, n = 2L)
As outlined in the RNAmodR
package we can compare the samples using the
plotCompareByCoord
to prepare a heatmap. For this we select some position
from the found modifications. In addition we prepare an alias table.
coord <- mod[[1L]] alias <- data.frame(tx_id = c(1L,3L,5L,6L,7L,8L,10L,11L), name = c("18S rRNA","tF(GAA)B","tG(GCC)B","tT(AGT)B", "tQ(TTG)B","tC(GCA)B","tS(CGA)C","tV(AAC)E1"), stringsAsFactors = FALSE)
plotCompareByCoord(msaas, coord, score = "scoreSR", alias = alias, normalize = TRUE)
plotCompareByCoord(msaas, coord[1L], score = "scoreSR", alias = alias)
In addition, the aggregate data along the transcript visualized as well.
plotData(msaas, "1", from = 1550L, to = 1600L)
This includes raw data as well.
plotData(msaas[1L:2L], "1", from = 1550L, to = 1600L, showSequenceData = TRUE)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.