TSRchitect Introduction

R. Taylor Raborn and Volker P. Brendel

Department of Biology, Indiana University

July 10, 2018; updated April 7, 2019; Sept. 17, 2019.

TSRchitect is an R package for analyzing diverse types of high-throughput transcription start site (TSS) profiling datasets. TSRchitect can handle TSS profiling experiments that contain either single-end or paired-end sequence reads, such as CAGE, RAMPAGE, PEAT, STRIPE-seq and others. TSRchitect is an open-source bioinformatics package that is intended to support large-scale, reproducible analysis of TSS profiling data in a broad array of eukaryotic model systems.

Before we can begin, we must first load TSRchitect in our working environment.

library("TSRchitect")

The TSRchitect User's Guide is available; it goes through several well-documented examples of TSRchitect using different datasets.

To open the TSRchitect User's Guide on your machine, enter the following:

TSRchitectUsersGuide()

Getting started

Now that you have loaded TSRchitect, we can proceed with a small example.

First, we will create a tssObject using loadTSSobj and import our sample .bam files (which are found in extdata/). In doing this, we provide sample names and identify our replicate names using the argument sampleNames. We do this in the following manner:

extdata.dir <- system.file("extdata/bamFiles", package="TSRchitect") 

tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
inputDir=extdata.dir, n.cores=1, isPairedBAM=TRUE,
sampleNames=c("sample1-rep1", "sample1-rep2","sample2-rep1",
"sample2-rep2"), replicateIDs=c(1,1,2,2)) #datasets 1-2 and 3-4 are replicates

Please note that TSRchitect allows input also from bed-formatted files. You could replace the above lines with following equivalents:

extdata.dir <- system.file("extdata/bedFiles", package="TSRchitect") 

tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
inputDir=extdata.dir, n.cores=1, isPairedBED=TRUE,
sampleNames=c("sample1-rep1", "sample1-rep2","sample2-rep1",
"sample2-rep2"), replicateIDs=c(1,1,2,2)) #datasets 1-2 and 3-4 are replicates

For convenience, loadTSSobj() may also be called with argument sampleSheet="ssfile", where ssfile is either a tab-delimited text file or an EXCEL spreadsheet (with extension .xls or .xlsx). In either case, the first row must have the header SAMPLE ReplicateID FILE and the same information described above in the respective columns. If necessary, you can provide input to loadTSSobj() consisting of both .bam and .bed files, but be aware that loading of the data into the tssObject will always add the .bam files before the .bed files and the internal numbering of the TSS sets will be accordingly.

If we wish to see our new tssObject, we simply type its name on the console and hit return, as follows:

tssObjectExample

Now that the .bam files have been imported, we need to retrieve TSSs from the BAM records and calculate the abundance of each tag at a given TSS. We opt to not run these in parallel and specify this by setting n.cores = 1.

tssObjectExample <- inputToTSS(experimentName=tssObjectExample)

tssObjectExample <- processTSS(experimentName=tssObjectExample, n.cores=1,
tssSet="all", writeTable=FALSE)

Now that this is complete we can we proceed with identifing TSRs from TSSs. We do this for each of the 4 datasets we imported at once by specifying tssSet="all".

tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
tssSetType="replicates", tssSet="all", tagCountThreshold=25, clustDist=20,
writeTable=FALSE)

Next we merge our replicate data (according to the information we provided in loadTSSobj) and identify TSRs on these merged samples. We then use addTagCountsToTSR to quantify the number of tags found in each of our 4 datasets.

tssObjectExample <- mergeSampleData(experimentName=tssObjectExample, n.cores=1,
tagCountThreshold=1)

tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
tssSetType="merged", tssSet="all", tagCountThreshold=25, clustDist=20,
writeTable=FALSE)

tssObjectExample <- addTagCountsToTSR(experimentName=tssObjectExample,
tsrSetType="merged", tsrSet=1, tagCountThreshold=25, writeTable=FALSE)

Now that we have identified TSRs using determineTSR from all replicate and merged datasets, we can select individual results directly. To do this, we use getTSRdata, one of the tssObject accessor methods.

sample_1_1_tsrs <- getTSRdata(experimentName=tssObjectExample,
slotType="replicates", slot=1)

print(sample_1_1_tsrs)

We see that two distinct TSRs were identified from this small example dataset.

Finally, we import an annotation file (containing Gencode annotated transcripts) and then assocate this annotation with our small set of identified TSRs.

gff3data.dir <- system.file("extdata", package="TSRchitect") 
tssObjectExample <- importAnnotationExternal(experimentName=tssObjectExample,
fileType="gff3",
annotFile=paste(gff3data.dir,"gencode.v19.chr22.transcript.gff3",sep="/"))

tssObjectExample <- addAnnotationToTSR(experimentName=tssObjectExample,
tsrSetType="merged", tsrSet=1, upstreamDist=2000, downstreamDist=500,
feature="transcript", featureColumnID="ID", writeTable=FALSE)

Before we end, we choose to save our tssObject in an .RData file to return to at a later time:

save(tssObjectExample, file="tssObjectExample.RData")

This ends our vignette. Please see the TSRchitect User's Guide for more extensively documented examples.



Try the TSRchitect package in your browser

Any scripts or data that you put into this service are public.

TSRchitect documentation built on Nov. 8, 2020, 8:11 p.m.