require(ganalyse) knitr::opts_chunk$set( comment = "#", error = FALSE, tidy = FALSE, cache = FALSE, collapse=TRUE) # options(datatable.auto.index=FALSE)
We've developed three packages for performing differential analysis of NGS
data, namely gread
, gcount
and ganalyse
. In short,
gread enables loading or reading in the required data quickly from many different formats in which NGS data and gene annotations are available.
gcount counts the reads depending on user configuration on raw counts.
ganalyse then allows to perform differential gene expression analysis
using many methods such as limma, voomlimma (for FPKM
), edger on the
read counts.
In this vignette, we'll discuss the gread
package.
ganalyse
is an R package that aims to make analysis on genomics data easy.
It is implemented only for RNA-Seq data at the moment.
The current organisation is as follows:
Provide experiment
, sample_info
and format
to the function rnaseq
to
generate an object of class fpkm
or raw
.
Collect counts using gather_counts
to load the corresponding count data
for the samples. It returns an object of class fpkm_counts
or raw_counts
respectively.
Construct design matrix
and contrasts
. See construct_design
and
construct_contrasts
.
Perform different types on analyses (only Differential Gene Expression (DGE)
analysis is supported currently) on the fpkm_counts
or raw_counts
object.
For fpkm_counts
object, use limma_dge
to perform DGE analysis. Similarly,
for raw_counts
object, use limma_dge
or edger_dge
methods.
The function rnaseq()
can be used to set up the experiment. It returns an
object of class raw
or fpkm
depending on the value provided to the
argument format
.
# ----- fpkm ----- # (fpkm_obj = rnaseq(sample_info="example/fpkm/annotation.txt", format="fpkm", experiment="test")) class(fpkm_obj) # [1] "fpkm" "data.table" "data.frame" # ----- raw ----- # (raw_obj = rnaseq(sample_info="example/raw/annotation.txt", format="raw", experiment="test")) class(raw_obj) # [1] "raw" "data.table" "data.frame"
The experiment
argument is optional with a default value is "example".
The argument sample_info
accepts a path to file containing sample info.
The columns sample
and path
should be present.
It usually also contains the treatments and groups each sample belong to. It is essential to perform DE analysis later on, although those details can be added later on.
format
can take values either fpkm
(default) or raw
.
This can be done using gather_counts
function which has methods for objects
of class fpkm
and raw
. It returns an object of class fpkm_counts
and
raw_counts
respectively.
# ----- fpkm ----- # (fpkm_counts = gather_counts(fpkm_obj, by="gene-id", log_base=2L)) class(fpkm_counts) # [1] "fpkm_counts" "fpkm" "data.table" "data.frame" # ----- raw ----- # (raw_counts = gather_counts(raw_obj, by="gene-id", threshold=1L)) class(raw_counts) # [1] "raw_counts" "raw" "data.table" "data.frame"
Other possible values for by
argument are gene-name
and transcript-id
.
It is used to identify the column by which to aggregate by to get effective
total count. For example, in case of fpkm
counts, transcript level fpkm
values are available. But if gene level DE analysis is desired, by='gene-id'
can be used in which case the total expression for each gene id would be
obtained while gathering counts.
Note that transcript-id
is only valid for fpkm
objects.
threshold
is set to the default value of 1
for raw counts. It
indicates the RPKM
value that the average RPKM value of all the samples for
each gene should be greater than for it to be retained for further analyses.
The higher this value, the more stringent the filtering.
The default threshold
value for fpkm values is 0.1
.
log_base
allows to obtain log of the raw counts / fpkm values. It is
generally recommended to use log transformed fpkm
values while using
packages that are designed to work with counts (e.g., limma
).
To inspect the counts, show_counts
function can be used. It returns a
data.table
object.
# ----- fpkm ----- # head(show_counts(fpkm_counts)) # ----- raw ----- # head(show_counts(raw_counts))
We can also generate density plots of counts facetted by the groups each
sample belongs to using density_plot()
:
# ----- ggplot2 ----- # density_plot(raw_counts, title="Density plot of raw counts", groups="condition", facet_cols=1L)
Interactive plots can be genearted using interactive=TRUE
.
# ----- plotly plot ----- # pl = density_plot(raw_counts, title="Density plot of raw counts", groups="condition", facet_cols=1L) ll = htmltools::tagList() ll[[1L]] = plotly::as.widget(pl) ll
We can perform DGE analysis using limma
or edgeR
bioconductor packages.
The corresponding functions are limma_dge
and edger_dge
respectively.
For raw counts, both these methods exist. For fpkm, only limma
method is
implemented.
To perform differential expression analysis, we need two have two more
things -- a) design matrix, and b) contrasts. It can be constructed using
construct_design()
and construct_contrasts()
functions.
These are simple wrappers to stats::model.matrix
and limma::makeContrasts
functions.
design = construct_design(raw_counts, formula = ~ 0 + condition) contrasts = construct_contrasts(design, A.vs.control = conditiontreatA-conditioncontrol, B.vs.control = conditiontreatB-conditioncontrol) # ----- fpkm ----- # # A vs control (fpkm_dge = limma_dge(fpkm_counts, design, contrast=contrasts[, 1])) # ----- raw ----- # # B vs control (raw_dge = limma_dge(raw_counts, design, contrast=contrasts[, "B.vs.control"]))
The usage for edger_dge
is identical. Both these functions returns a dge
object which can be directly passed to other function for generating plots
as shown below.
Volcano plot can be generated using volcano_plot()
function.
# ----- ggplot2 plot ----- # volcano_plot(fpkm_dge, title="treatA vs Control")
Interactive plots can be generated using plotly
package with the help of
the argument interactive
.
# ----- plotly plot ----- # pl = volcano_plot(raw_dge, title="treatB vs Control", interactive=TRUE) ll = htmltools::tagList() ll[[1L]] = plotly::as.widget(pl) ll
By providing a file name, the plot is saved to the file provided, and the plot object is returned.
# save to file and return the object, *not* run volcano_plot(fpkm_dge, filename="~/treatA_vs_control.png")
Have a look at ?volcano_plot
for more.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.