require(gread)
knitr::opts_chunk$set(
  comment = "#",
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse=TRUE)
# options(datatable.auto.index=FALSE)

We've developed three packages for performing differential analysis of NGS data, namely gread, gcount and ganalyse. In short,

In this vignette, we'll discuss the gread package.


gread package

gread is an R-package that allows to read and process gene annotations and NGS data. Currently supported formats include:

You can use the function supported_formats() to check all available formats.

supported_formats()

Reading files quickly

There are four functions, one for each type of format.

read_gtf

(gtf = read_gtf("sample.gtf"))
class(gtf)

read_gff

(gff = read_gff("sample.gff"))
class(gff)

read_bed

(bed = read_bed("sample.bed")) ## bed9, bed12 and bed15 formats
class(bed)

read_bam

(bam = read_bam("sample.bam"))
class(bam)

{.bs-callout .bs-callout-info}

read_format

While these functions are quite easy and straightforward to use, there's another function read_format() that makes things even simpler and is sufficient in most cases.

# gtf file
(gtf = read_format("sample.gtf"))

# gff file
(gff = read_format("sample.gff"))

# bam file
(bam = read_format("sample.bam"))

# bed file
(bed = read_format("sample.bed"))

{.bs-callout .bs-callout-info}

Extracting various features

The gread() package provides an useful function extract() to extract useful features from gtf and gff objects. This vignette provides a short description of the function. See ?extract for more.

In a typical RNA-seq experiment, we would want to extract the genes start-end coordinates and also just the exons. This is because, the default methodology is to extract the reads that overlap exons and count the total number of reads that overlap only those exonic regions for that gene. Extracting genes and exons into separate data.tables makes things easier to obtain counts then using the gcount package.

gtf = read_format("sample.gtf")
## Extract a unified gene model by combining all overlapping exons
(uni_genes = extract(gtf, feature="gene_exon", type="union", 
                 transcript_id="transcript_id", gene_id="gene_id"))
## extract genes as such
(genes = extract(gtf, feature="gene", type="default"))

Other useful functionalities

Construct intron coordinates

Sometimes the gene annotation formats do not have intronic coordintes annotated but might be essential. In that case, we can use the construct_introns function on a gtf or gff object a follows:

gtf = read_format("sample.gtf")
(gtf = construct_introns(gtf, update=TRUE))
(introns = construct_introns(gtf, update=FALSE))

class(gtf)
class(introns)

Note that: {.bs-callout .bs-callout-info}




openanalytics/gread documentation built on May 24, 2019, 2:29 p.m.