\pagebreak
SPLINTER provides tools to analyze alternative splicing sites, interpret outcomes based on sequence information, select and design primers for site validiation and give visual representation of the event to guide downstream experiments.
To load the r Biocpkg("SPLINTER")
package:
library(SPLINTER)
In this example, we will be utilizing the mm9 genome for mouse. You will need to install the
appropriate package (eg. r Biocpkg("BSgenome.Mmusculus.UCSC.mm9")
) for the genome
that you will be using.
library(BSgenome.Mmusculus.UCSC.mm9) library(GenomicFeatures) bsgenome <- BSgenome.Mmusculus.UCSC.mm9
We begin with full set of available transcripts to screen from, and read it into a \Rclass{TxDb} object.
One source of this (best option to ensure compatibility) would be the GTF file that you have used for
alternative splicing analysis. For other sources of data, please refer to r Biocpkg("GenomicFeatures")
).
We then extract the coding sequences (CDS), and transcripts in general (coding and non-coding) (exons) from this object.
data_path<-system.file("extdata",package="SPLINTER") gtf_file<-paste(data_path,"/Mus_musculus.Ensembl.NCBIM37.65.partial.gtf",sep="") txdb <- makeTxDbFromGFF(file=gtf_file,chrominfo = Seqinfo(genome="mm9")) # txdb generation can take quite long, you can save the object and load it the next time # saveDb(txdb,file="txdb_hg19.sqlite") # txdb<-loadDb(file="txdb_hg19.sqlite") # extract CDS and exon information from TxDb object thecds<-cdsBy(txdb,by="tx",use.names=TRUE) theexons<-exonsBy(txdb,by="tx",use.names=TRUE)
The output file from MATS is used here, but essentially all that is needed are coordinates of the exons (target and flanking) involved in the splicing processt to be studied. For the case of exon skipping, this will include the upstream, target and downstream exons. More output types will be supported in the future.
\pagebreak
The following types of alternative splicing events are accepted:
Type of alternative splicing event | Definition ------------- | ------------- SE | Skipped exon RI | Retained intron MXE | Mutually exclusive exon A5SS | Alternative 5' splice site A3SS | Alternative 3' splice site
typeofAS="SE" mats_file<-paste(data_path,"/skipped_exons.txt",sep="") splice_data <-extractSpliceEvents(data=mats_file, filetype='mats', splicetype=typeofAS) splice_data$data[,c(1:10)]
r Biocpkg("SPLINTER")
assumes that the main identifier is ENSEMBL, however gene symbols can be added.
splice_data<-addEnsemblAnnotation(data=splice_data,species="mmusculus") # (Optional) Sorting the dataframe, if you have supporting statistical information splice_data$data<-splice_data$data[with(splice_data$data,order(FDR,-IncLevelDifference)),] head(splice_data$data[,c(1:10)])
\pagebreak
Once we have defined the events, we will pick 1 event to analyze.
single_id='Prmt5' pp<-which(grepl(single_id,splice_data$data$Symbol)) # Prmt5 has 1 record splice_data$data[pp,c(1:6)] # show all records single_record<-splice_data$data[pp[1],]
To reduce search complexity, we define the valid transcripts and coding sequences with regards to our gene of interest. We find that Prmt5 has 7 transcripts, 2 of which are coding sequences.
valid_tx <- findTX(id=single_record$ID,tx=theexons,db=txdb) valid_cds<- findTX(id=single_record$ID,tx=thecds,db=txdb)
The makeROI
function will create a list containing GRanges
objects for the splicing event.
This will help identify and construct relevant outputs later.
This list contains the following information:
\pagebreak
Type of alternative splicing | Type 1 representation | Type 2 representation (annotated only) ------------------- | ------------------------ | ------------------------- SE | isoform with event exon included | isoform with the exon skipped RI | isoform with normal exon boundaries | isoform with the intron retained MXE | isoform defined 1st (leftmost) in input | isoform defined 2nd in input A5SS | isoform with longer exon | isoform with shorter exon A3SS | isoform with longer exon | isoform with shorter exon
roi <- makeROI(single_record,type="SE") roi
At this juncture, we look for transcripts are compatible with the ROI. Compatibility is defined as having the exact cassette (matching upstream, target, downstream) exons. In the case of intron retention, this would just be the 2 exons flanking the intron.
We notice here that Prmt5 only has 1 compatible transcript involved in the event ROI, out of 7 transcripts (or 2 coding transcripts). There are no Type 2 transcripts, which means there are no annotated transcripts of Prmt5 containing the alternative event.
compatible_tx<- findCompatibleEvents(valid_tx,roi=roi,verbose=TRUE) compatible_cds<- findCompatibleEvents(valid_cds,valid_tx,roi=roi,verbose=TRUE)
\pagebreak
region_minus_exon <-removeRegion(compatible_cds$hits[[1]],roi)
# Not relevant for this Prmt5 skipped exon example region_plus_exon <-insertRegion(region_minus_exon,roi)
event<-eventOutcomeCompare(seq1=compatible_cds$hits[[1]],seq2=region_minus_exon, genome=bsgenome,direction=TRUE,fullseq=FALSE) event
\pagebreak
This function will return the DNA of the ROI, with exons separated by "[]" (Primer3 notation) and the junction marked by jstart
.
aa<-getRegionDNA(roi,bsgenome) aa
We have included a helper function to run Primer3 from within R
. You will need
to define the path to your Primer3 installation. Refer to ?callPrimer3
for more details.
primers<-callPrimer3(seq=aa$seq,sequence_target = aa$jstart,size_range='100-500')
primers[,c(1:4)]
Alternatively, primers can be entered manually with the appropriate headers.
primers <- data.frame(PRIMER_LEFT_SEQUENCE="ACTTTCGGACTCTGTGTGACT", PRIMER_RIGHT_SEQUENCE="TCATAGGCATTGGGTGGAGG", stringsAsFactors=FALSE)
As a confirmation, we can run the primers against the ROI to give the genomic location of the primer coverage.
cp<-checkPrimer(primers[1,],bsgenome,roi) cp
getPCRsizes
will give you the length of the PCR product produced by the set of primers.
pcr_result1<-getPCRsizes(cp,theexons) pcr_result1 tx_minus_exon <-removeRegion(compatible_tx$hits[[1]],roi) pcr_result2<-getPCRsizes(cp,tx_minus_exon) pcr_result2
While getPCRsizes
will return all possible PCR products for a given set of annotation,
splitPCRhit
will return PCR product sizes that are relevant to the splicing event in question.
relevant_pcr_bands<-splitPCRhit(pcr_result1,compatible_tx) relevant_pcr_bands
# reading in BAM files mt<-paste(data_path,"/mt_chr14.bam",sep="") wt<-paste(data_path,"/wt_chr14.bam",sep="") # Plotting genomic range, read density and splice changes eventPlot(transcripts=theexons,roi_plot=roi,bams=c(wt,mt),names=c('wt','mt'), annoLabel=single_id,rspan=2000) # Barplot of PSI values if provided psiPlot(single_record)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.