\pagebreak

Introduction

SPLINTER provides tools to analyze alternative splicing sites, interpret outcomes based on sequence information, select and design primers for site validiation and give visual representation of the event to guide downstream experiments.

Loading the package

To load the r Biocpkg("SPLINTER") package:

library(SPLINTER)

Initializing the genome for transcript selection

In this example, we will be utilizing the mm9 genome for mouse. You will need to install the appropriate package (eg. r Biocpkg("BSgenome.Mmusculus.UCSC.mm9")) for the genome that you will be using.

library(BSgenome.Mmusculus.UCSC.mm9)
library(GenomicFeatures)
bsgenome <- BSgenome.Mmusculus.UCSC.mm9

We begin with full set of available transcripts to screen from, and read it into a \Rclass{TxDb} object. One source of this (best option to ensure compatibility) would be the GTF file that you have used for alternative splicing analysis. For other sources of data, please refer to r Biocpkg("GenomicFeatures")).

We then extract the coding sequences (CDS), and transcripts in general (coding and non-coding) (exons) from this object.

data_path<-system.file("extdata",package="SPLINTER")
gtf_file<-paste(data_path,"/Mus_musculus.Ensembl.NCBIM37.65.partial.gtf",sep="")

library(txdbmaker)
txdb <- makeTxDbFromGFF(file=gtf_file,chrominfo = Seqinfo(genome="mm9"))

# txdb generation can take quite long, you can save the object and load it the next time
# saveDb(txdb,file="txdb_hg19.sqlite")
# txdb<-loadDb(file="txdb_hg19.sqlite")

# extract CDS and exon information from TxDb object
thecds<-cdsBy(txdb,by="tx",use.names=TRUE)
theexons<-exonsBy(txdb,by="tx",use.names=TRUE)

Reading in the splicing analysis file

The output file from MATS is used here, but essentially all that is needed are coordinates of the exons (target and flanking) involved in the splicing processt to be studied. For the case of exon skipping, this will include the upstream, target and downstream exons. More output types will be supported in the future.

\pagebreak

The following types of alternative splicing events are accepted:

Type of alternative splicing event | Definition ------------- | ------------- SE | Skipped exon RI | Retained intron MXE | Mutually exclusive exon A5SS | Alternative 5' splice site A3SS | Alternative 3' splice site

typeofAS="SE"
mats_file<-paste(data_path,"/skipped_exons.txt",sep="")
splice_data <-extractSpliceEvents(data=mats_file, filetype='mats', splicetype=typeofAS)
splice_data$data[,c(1:10)]

Additional annotation

r Biocpkg("SPLINTER") assumes that the main identifier is ENSEMBL, however gene symbols can be added.

splice_data<-addEnsemblAnnotation(data=splice_data,species="mmusculus")

# (Optional) Sorting the dataframe, if you have supporting statistical information
splice_data$data<-splice_data$data[with(splice_data$data,order(FDR,-IncLevelDifference)),]
head(splice_data$data[,c(1:10)])

\pagebreak

Analyzing a specific gene

Inspecting a single gene in more detail (single record)

Once we have defined the events, we will pick 1 event to analyze.

single_id='Prmt5'
pp<-which(grepl(single_id,splice_data$data$Symbol)) # Prmt5 has 1 record

splice_data$data[pp,c(1:6)] # show all records

single_record<-splice_data$data[pp[1],]

Finding relevant transcripts from the ENSEMBL database

To reduce search complexity, we define the valid transcripts and coding sequences with regards to our gene of interest. We find that Prmt5 has 7 transcripts, 2 of which are coding sequences.

valid_tx <- findTX(id=single_record$ID,tx=theexons,db=txdb)

valid_cds<- findTX(id=single_record$ID,tx=thecds,db=txdb)

Constructing the region of interest (ROI)

The makeROI function will create a list containing GRanges objects for the splicing event. This will help identify and construct relevant outputs later.

This list contains the following information:

\pagebreak

Type of alternative splicing | Type 1 representation | Type 2 representation (annotated only) ------------------- | ------------------------ | ------------------------- SE | isoform with event exon included | isoform with the exon skipped RI | isoform with normal exon boundaries | isoform with the intron retained MXE | isoform defined 1st (leftmost) in input | isoform defined 2nd in input A5SS | isoform with longer exon | isoform with shorter exon A3SS | isoform with longer exon | isoform with shorter exon

roi <- makeROI(single_record,type="SE")
roi

Finding transcripts that contain the ROI

At this juncture, we look for transcripts are compatible with the ROI. Compatibility is defined as having the exact cassette (matching upstream, target, downstream) exons. In the case of intron retention, this would just be the 2 exons flanking the intron.

We notice here that Prmt5 only has 1 compatible transcript involved in the event ROI, out of 7 transcripts (or 2 coding transcripts). There are no Type 2 transcripts, which means there are no annotated transcripts of Prmt5 containing the alternative event.

compatible_tx<- findCompatibleEvents(valid_tx,roi=roi,verbose=TRUE)

compatible_cds<- findCompatibleEvents(valid_cds,valid_tx,roi=roi,verbose=TRUE)

\pagebreak

Simulating alternatively spliced products

Simulating the outcome of exon skipping by removing an exonic region

region_minus_exon <-removeRegion(compatible_cds$hits[[1]],roi)

Simulating the outcome of intron retention by inserting an intronic region

# Not relevant for this Prmt5 skipped exon example
region_plus_exon <-insertRegion(region_minus_exon,roi)

Comparing sequences before and after removal/insertion of a region

event<-eventOutcomeCompare(seq1=compatible_cds$hits[[1]],seq2=region_minus_exon,
                    genome=bsgenome,direction=TRUE,fullseq=FALSE)

event

\pagebreak

Designing primers to inspect splicing regions

Getting the DNA of the region of interest

This function will return the DNA of the ROI, with exons separated by "[]" (Primer3 notation) and the junction marked by jstart.

aa<-getRegionDNA(roi,bsgenome) 
aa

Using Primer3 to design primers for alternative splicing identification

We have included a helper function to run Primer3 from within R. You will need to define the path to your Primer3 installation. Refer to ?callPrimer3 for more details.

primers<-callPrimer3(seq=aa$seq,sequence_target = aa$jstart,size_range='100-500')
primers[,c(1:4)]

Alternatively, primers can be entered manually with the appropriate headers.

primers <- data.frame(PRIMER_LEFT_SEQUENCE="ACTTTCGGACTCTGTGTGACT",
                      PRIMER_RIGHT_SEQUENCE="TCATAGGCATTGGGTGGAGG",
                      stringsAsFactors=FALSE)

Checking primers coverage

As a confirmation, we can run the primers against the ROI to give the genomic location of the primer coverage.

cp<-checkPrimer(primers[1,],bsgenome,roi)

cp

Predicting PCR results using the primers

getPCRsizes will give you the length of the PCR product produced by the set of primers.

pcr_result1<-getPCRsizes(cp,theexons)
pcr_result1

tx_minus_exon <-removeRegion(compatible_tx$hits[[1]],roi)
pcr_result2<-getPCRsizes(cp,tx_minus_exon)
pcr_result2

Selecting sizes relevant to splicing event (subset of getPCRsizes)

While getPCRsizes will return all possible PCR products for a given set of annotation, splitPCRhit will return PCR product sizes that are relevant to the splicing event in question.

relevant_pcr_bands<-splitPCRhit(pcr_result1,compatible_tx)

relevant_pcr_bands

Plotting results

# reading in BAM files
mt<-paste(data_path,"/mt_chr14.bam",sep="")
wt<-paste(data_path,"/wt_chr14.bam",sep="")

# Plotting genomic range, read density and splice changes
eventPlot(transcripts=theexons,roi_plot=roi,bams=c(wt,mt),names=c('wt','mt'),
          annoLabel=single_id,rspan=2000)

# Barplot of PSI values if provided
psiPlot(single_record)

Session info

sessionInfo()


dianalow/SPLINTER documentation built on March 28, 2024, 2:07 p.m.