## nolint start suppressPackageStartupMessages({ library(goalie) library(basejump) library(bcbioRNASeq) library(DESeq2) library(DESeqAnalysis) }) prepareTemplate() source("_setup.R") ## nolint end invisible(lapply( X = c( params[["data_dir"]], params[["results_dir"]] ), FUN = initDir ))
bcbioRNASeq
objectobject <- import(params[["bcb_file"]]) assert( is(object, "bcbioRNASeq"), validObject(object) ) print(object)
DESeqDataSet
Here we are using an S4 coercion method to convert our bcbioRNASeq
object to
a DESeqDataSet
. This prepares a gene-level RangedSummarizedExperiment
with
raw integer counts defined in the assay()
slot. Internally this uses the
DESeqDataSet()
constructor function and sets an empty design formula. The
desired design formula can be set with the design()
function.
dds <- as.DESeqDataSet(object) assert( is(dds, "DESeqDataSet"), validObject(dds) ) print(dds)
The design formula, specified with the design()
function, must contain factor
columns in colData()
/ sampleData()
.
Ensure that all relevant factor columns in sampleData()
contain valid names
(see make.names()
for details) prior to setting the design. We recommend
using the snakeCase()
function to automatically sanitize all factors into
snake case.
## > colnames(colData(dds)) design(dds) <- params[["design"]]
While it is not necessary to pre-filter low count genes before running the
DESeq2 functions, there are two reasons which make pre-filtering useful: by
removing rows in which there are very few reads, we reduce the memory size of
the dds data object, and we increase the speed of the transformation and
testing functions within DESeq2. Here we perform a minimal pre-filtering to
keep only rows that have at least 10 reads total. Note that more strict
filtering to increase power is automatically applied via independent filtering
on the mean of normalized counts within the results()
function.
## Note that this criteria can be made more stringent. ## Refer to DESeq2 vignette for examples. keep <- rowSums(counts(dds)) >= 10L dds <- dds[keep, ] print(dds)
By default, [R][] will choose a reference level for factors based on
alphabetical order. Then, if you never tell the [DESeq2][] functions which
level you want to compare against (e.g. which level represents the control
group), the comparisons will be based on the alphabetical order of the levels.
There are two solutions: you can either explicitly tell results which
comparison to make using the contrast argument (this will be shown later), or
you can explicitly set the factors levels. You should only change the factor
levels of variables in the design before running the [DESeq2][] analysis, not
during or afterward. Setting the factor levels can be done in two ways, using
either the factor()
or relevel()
functions. Generally we recommend using
relevel()
here.
## Including this code here for reference only. ## Specify the reference level (preferred). ## > colData(dds)[["group"]] <- ## > relevel(colData(dds)[["group"]], ref = "control") ## Alternatively, can explicitly define, using `factor()`. ## When using this approach, put the reference level (control) first. ## > colData(dds)[["group"]] <- ## > factor(colData(dds)[["group"]], levels = c("treatment", "control")) ## If samples have been subset, ensure that the levels match. ## > colData(dds)[["treatment"]] <- droplevels(colData(dds)[["group"]])
Now that our DESeqDataSet
is properly set up, we can move on to performing
the differential expression. The standard differential expression analysis
steps for [DESeq2][] are wrapped into a single function, DESeq()
. Results
tables are generated using the function results()
, which extracts a results
table with log2 fold changes, P values and adjusted P values. With no
additional arguments to results()
, the log2 fold change and Wald test P
value will be for the last variable in the design formula (design()
), and if
this is a factor, the comparison will be the last level of this variable over
the reference level (see previous note on factor levels). However, the order of
the variables of the design do not matter so long as the user specifies the
comparison to build a results table for, using the name or contrast arguments
of results.
dds <- DESeq(dds)
After we perform the differential expression, we need to calculate
variance-stabilized counts, which are stored in a DESeqTransform
object.
These transformed counts are useful for visualization. We currently recommend
using varianceStabilizingTransformation()
but rlog()
is a good alternate.
## Alternatively, can use `rlog()` here instead, but it is slower. dt <- varianceStabilizingTransformation(dds) assert(is(dt, "DESeqTransform")) interestingGroups(dt) <- interestingGroups(object)
For contrast argument as character vector:
## factor; numerator; denominator ## > levels(colData(dds[["group"]])) ## > help(topic = "results", package = "DESeq2") print(params[["contrasts"]])
resListUnshrunken <- Map( f = DESeq2::results, contrast = params[["contrasts"]], MoreArgs = list( "object" = dds, "alpha" = params[["alpha_threshold"]], "lfcThreshold" = params[["lfc_threshold"]] ) )
Now let's calculate shrunken log2 fold change values with
DESeq2::lfcShrink()
. We're using the "normal" shrinkage estimator (default in
DESeq2); the "apeglm" shrinkage estimator is also promising but doens't work
well with complex contrast designs.
## Refer to DESeqAnalysis package if you want to use newer apeglm instead. ## See `help(topic = "apeglmResults", package = "DESeqAnalysis")` for details. resListShrunken <- Map( f = DESeq2::lfcShrink, res = resListUnshrunken, contrast = params[["contrasts"]], MoreArgs = list( "dds" = dds, "type" = "normal", "alpha" = params[["alpha_threshold"]], "lfcThreshold" = params[["lfc_threshold"]] ) )
We performed the analysis using a BH adjusted P value cutoff of
r params[["alpha_threshold"]]
and a log fold-change (LFC) ratio cutoff of
r params[["lfc_threshold"]]
.
DESeqAnalysis
objectdeseq <- DESeqAnalysis( data = dds, transform = dt, results = resListUnshrunken, lfcShrink = resListShrunken ) saveData(deseq, dir = params[["data_dir"]])
An MA plot compares transformed counts on M
(log ratio) and A
(mean
average) scales [@Yang2002-sx].
invisible(Map( f = function(object, i) { markdownHeader(text = i, level = 4L, asis = TRUE) print(plotMa(object = object, i = i)) }, i = resultsNames(deseq), MoreArgs = list("object" = deseq) ))
A volcano plot compares significance (BH-adjusted P value) against fold change (log2) [@Cui2003-rn; @Li2014-ll]. Genes in the green box with text labels have an adjusted P value are likely to be the top candidate genes of interest.
invisible(Map( f = function(object, i) { markdownHeader(text = i, level = 4L, asis = TRUE) print(plotVolcano(object = object, i = i)) }, i = resultsNames(deseq), MoreArgs = list("object" = deseq) ))
invisible(Map( f = function(object, i) { markdownHeader(text = i, level = 4L, asis = TRUE) print(plotDegPca(object = object, i = i)) }, i = resultsNames(deseq), MoreArgs = list("object" = deseq) ))
This plot shows only differentially expressed genes on a per-sample basis. We
have scaled the data by row and used the ward.D2
method for clustering
[@Ward1963-xf].
invisible(Map( f = function(object, i, ...) { markdownHeader(text = i, level = 4L) plotDegHeatmap(object = object, i = i, ...) }, i = resultsNames(deseq), MoreArgs = list( "object" = deseq, "clusteringMethod" = "ward.D2", "scale" = "row" ) ))
Only the top up- and down-regulated genes are shown.
invisible(Map( f = markdownTables, i = resultsNames(deseq), MoreArgs = list( "object" = deseq, "n" = 20L ) ))
Subset the results into separate tables, containing all genes, differentially expressed genes in both directions, and directional tables.
export( object = deseq, con = params[["results_dir"]] )
Differentially expressed gene (DEG) tables are sorted by BH-adjusted P value, and contain the following columns:
baseMean
: Mean of the normalized counts per gene for all samples.log2FoldChange
: log2 fold change.lfcSE
: log2 standard error.stat
: Wald statistic.pvalue
: Walt test P value.padj
: BH adjusted Wald test P value, corrected for multiple comparisons.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.