# Last modified 2018-03-17 bcbioRNASeq::prepareRNASeqTemplate() source("_setup.R") # Directory paths ============================================================== invisible(mapply( FUN = dir.create, path = c(params$data_dir, params$results_dir), MoreArgs = list(showWarnings = FALSE, recursive = TRUE) )) # Load object ================================================================== bcb_name <- load(params$bcb_file) bcb <- get(bcb_name, inherits = FALSE) stopifnot(is(bcb, "bcbioRNASeq")) invisible(validObject(bcb))
dds <- as(bcb, "DESeqDataSet") design(dds) <- params$design dds <- DESeq(dds) rld <- rlog(dds)
Let's take a look at the number of genes we get with different false discovery rate (FDR) cutoffs. These tests subset P values that have been multiple test corrected using the Benjamini Hochberg (BH) method [@Benjamini:1995ws].
alphaSummary(dds)
# help("results", "DESeq2") # For contrast argument as character vector: # 1. Design matrix factor of interest. # 2. Numerator for LFC (expt). # 3. Denominator for LFC (control). res_unshrunken <- results( dds, contrast = params$contrast, alpha = params$alpha ) # DESeqResults with shrunken log2 fold changes (LFC) # help("lfcShrink", "DESeq2") # Use the correct `coef` number to modify from `resultsNames(dds)` res_shrunken <- lfcShrink( dds = dds, coef = 2, res = res_unshrunken ) # Use shrunken LFC values by default res <- res_shrunken saveData(res, res_shrunken, res_unshrunken, dir = params$data_dir)
We performed the analysis using a BH adjusted P value cutoff of r params$alpha
and a log fold-change (LFC) ratio cutoff of r params$lfc
.
An MA plot compares transformed counts on M
(log ratio) and A
(mean average) scales [@Yang:2002ty].
plotMeanAverage(res) # Alternate plot # DESeq2::plotMA(res)
A volcano plot compares significance (BH-adjusted P value) against fold change (log2) [@Cui:2003kh; @Li:2014fv]. Genes in the green box with text labels have an adjusted P value are likely to be the top candidate genes of interest.
plotVolcano(res, lfc = params$lfc)
This plot shows only differentially expressed genes on a per-sample basis. We have scaled the data by row and used the ward.D2
method for clustering [@WardJr:1963eu].
# help("pheatmap", "pheatmap") plotDEGHeatmap( res, counts = rld, clusteringMethod = "ward.D2", scale = "row" )
res_tbl <- resultsTables( results = res, counts = dds, lfcThreshold = params$lfc, write = TRUE, summary = TRUE, headerLevel = 2, dir = params$results_dir, dropboxDir = params$dropbox_dir ) saveData(res_tbl, dir = params$data_dir)
Differentially expressed gene (DEG) tables are sorted by BH-adjusted P value, and contain the following columns:
baseMean
: Mean of the normalized counts per gene for all samples.log2FoldChange
: log2 fold change.lfcSE
: log2 standard error.stat
: Wald statistic.pvalue
: Walt test P value.padj
: BH adjusted Wald test P value (corrected for multiple comparisons; aka FDR).Only the top up- and down-regulated genes (arranged by log2 fold change) are shown.
topTables(res_tbl)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.