In roryk/bcbioRnaseq: RNA-Seq Utilities

# Last modified 2018-03-17
bcbioRNASeq::prepareRNASeqTemplate()
source("_setup.R")

# Directory paths ==============================================================
invisible(mapply(
    FUN = dir.create,
    path = c(params$data_dir, params$results_dir),
    MoreArgs = list(showWarnings = FALSE, recursive = TRUE)
))

# Load object ==================================================================
bcb_name <- load(params$bcb_file)
bcb <- get(bcb_name, inherits = FALSE)
stopifnot(is(bcb, "bcbioRNASeq"))
invisible(validObject(bcb))

dds <- as(bcb, "DESeqDataSet")
design(dds) <- params$design
dds <- DESeq(dds)
rld <- rlog(dds)

Alpha level (FDR) cutoffs

Let's take a look at the number of genes we get with different false discovery rate (FDR) cutoffs. These tests subset P values that have been multiple test corrected using the Benjamini Hochberg (BH) method [@Benjamini:1995ws].

alphaSummary(dds)

Results

# help("results", "DESeq2")
# For contrast argument as character vector:
#   1. Design matrix factor of interest.
#   2. Numerator for LFC (expt).
#   3. Denominator for LFC (control).
res_unshrunken <- results(
    dds,
    contrast = params$contrast,
    alpha = params$alpha
)

# DESeqResults with shrunken log2 fold changes (LFC)
# help("lfcShrink", "DESeq2")
# Use the correct `coef` number to modify from `resultsNames(dds)`
res_shrunken <- lfcShrink(
    dds = dds,
    coef = 2,
    res = res_unshrunken
)

# Use shrunken LFC values by default
res <- res_shrunken
saveData(res, res_shrunken, res_unshrunken, dir = params$data_dir)

We performed the analysis using a BH adjusted P value cutoff of r params$alpha and a log fold-change (LFC) ratio cutoff of r params$lfc.

Plots

Mean average (MA)

An MA plot compares transformed counts on M (log ratio) and A (mean average) scales [@Yang:2002ty].

plotMeanAverage(res)

# Alternate plot
# DESeq2::plotMA(res)

Volcano

A volcano plot compares significance (BH-adjusted P value) against fold change (log2) [@Cui:2003kh; @Li:2014fv]. Genes in the green box with text labels have an adjusted P value are likely to be the top candidate genes of interest.

plotVolcano(res, lfc = params$lfc)

Heatmap

This plot shows only differentially expressed genes on a per-sample basis. We have scaled the data by row and used the ward.D2 method for clustering [@WardJr:1963eu].

# help("pheatmap", "pheatmap")
plotDEGHeatmap(
    res,
    counts = rld,
    clusteringMethod = "ward.D2",
    scale = "row"
)

Results tables

res_tbl <- resultsTables(
    results = res,
    counts = dds,
    lfcThreshold = params$lfc,
    write = TRUE,
    summary = TRUE,
    headerLevel = 2,
    dir = params$results_dir,
    dropboxDir = params$dropbox_dir
)
saveData(res_tbl, dir = params$data_dir)

Differentially expressed gene (DEG) tables are sorted by BH-adjusted P value, and contain the following columns:

baseMean: Mean of the normalized counts per gene for all samples.
log2FoldChange: log2 fold change.
lfcSE: log2 standard error.
stat: Wald statistic.
pvalue: Walt test P value.
padj: BH adjusted Wald test P value (corrected for multiple comparisons; aka FDR).