knitr::opts_chunk$set( echo = TRUE )
library("DeeDee")

data(DE_results_IFNg_naive, package = "DeeDee")
IFNg_naive <- deedee_prepare(IFNg_naive, "DESeq2")

data(DE_results_IFNg_both, package = "DeeDee")
IFNg_both <- deedee_prepare(IFNg_both, "DESeq2")

data(DE_results_Salm_naive, package = "DeeDee")
Salm_naive <- deedee_prepare(Salm_naive, "DESeq2")

data(DE_results_Salm_both, package = "DeeDee")
Salm_both <- deedee_prepare(Salm_both, "DESeq2")

DeeDee_obj <- list(IFNg_naive = IFNg_naive, IFNg_both = IFNg_both,
                   Salm_naive = Salm_naive, Salm_both = Salm_both)
When you want to compare results from multiple Differential Expression Analyses (DEAs), the DeeDee package is your friend. It contains various functions, and a Shiny App combining them all, that help shed light on the similarities and differences between the experiments in question. DeeDee is designed to be used after each individual DEA has been run with a DE analysis program (such as DESeq2, edgeR or limma).
You can install the current development version from GitHub:
library("remotes")
remotes::install_github("lea-rothoerl/DeeDee", dependencies = TRUE, build_vignettes = TRUE)
The examples in the following chapters use the data from the Bioconductor package macrophage (Human macrophage immune response). The macrophage dataset includes data from 24 RNA-seq samples of human macrophages exposed to one of four conditions: naive, IFNg (an interferon), SL1344 (a strain of Salmonella), or both. The data were preprocessed with DESeq2, yielding the following four DEAs: naive vs. IFNg, IFNg vs. both, naive vs. Salmonella, and Salmonella vs. both.
The main DeeDee functions work on tables of logFC values ("logFC") and p-values ("pval") from DEAs, with the gene identifiers as row names. This table can be built by the user manually or by using deedee_prepare. The deedee_prepare function accepts results from three of the most used R DE analysis packages (DESeq2, edgeR, limma) and converts them to digestible tables. Which of these packages was used to analyze the raw data needs to be specified using the parameter input_type. If, for example, you have a DESeq2 result called DESeq2_res and want to convert it to a DeeDee table, you can use the following code chunk to do so.
inp <- deedee_prepare(data = DESeq2_res, input_type = "DESeq2")
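For the manual route, a minimal base R sketch might look as follows. The gene identifiers and values are invented for illustration; only the column names logFC and pval and the use of gene identifiers as row names are taken from the description above. That a plain data.frame is an acceptable container is an assumption.

```r
# Hypothetical DE results for three genes (all values invented for illustration)
manual_tab <- data.frame(
  logFC = c(1.8, -0.4, -2.3),
  pval  = c(0.001, 0.470, 0.020),
  row.names = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003")
)

# Basic sanity checks on the layout described above
stopifnot(
  all(c("logFC", "pval") %in% colnames(manual_tab)),  # required columns
  !anyDuplicated(rownames(manual_tab)),               # unique gene identifiers
  all(manual_tab$pval >= 0 & manual_tab$pval <= 1)    # valid p-values
)
```

Running checks like these before building the input list can catch a swapped column or duplicated identifier early.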
Every DeeDee main function needs to be fed with a (named) list of at least two of these tables as input (data). A threshold for p-values can be specified with the parameter pthresh (default = 0.05). Besides these, most functions have additional parameters that will be explained in the respective sections below. All functions produce colorblind-friendly output.
For our example, the deedee_prepare results are inp1 to inp4 (naive vs. IFNg, IFNg vs. both, naive vs. Salmonella, and Salmonella vs. both from macrophage). The named input list is created as follows.
DeeDee_obj <- list(naive_IFNg = inp1, IFNg_both = inp2, naive_Salm = inp3, Salm_both = inp4)
When DESeq2 is applied to analyze the raw data, deedee_prepare accepts the output from the results() function.
When edgeR is applied to analyze the raw data, deedee_prepare accepts the output from the exactTest() function.
When limma is applied to analyze the raw data, deedee_prepare accepts the output from the topTable() function.
\pagebreak
The function deedee_scatter() creates a scatterplot of logFC values of the genes in two input datasets. If the input DeeDee list contains more than two datasets (like our example DeeDee_obj does), the select1 (default = 1) and select2 (default = 2) parameters specify which ones will be plotted (selected by list index). deedee_scatter() includes a color_by parameter that can be set to "pval1" (default) or "pval2". The points will be colored according to the color scheme given on the right side of the output.
deedee_scatter(data = DeeDee_obj, select1 = 2, select2 = 3, color_by = "pval1", pthresh = 0.05)
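To make the idea behind the plot concrete, here is a rough base R sketch of the data it is built from. This is not the implementation of deedee_scatter(); the toy tables are invented, and the binary significant/non-significant coloring is an assumption, since the text above does not specify how p-values are mapped to colors.

```r
# Two toy DeeDee-style tables (invented values)
tab1 <- data.frame(logFC = c(2.0, -1.0, 0.5, 1.2),
                   pval  = c(0.001, 0.03, 0.60, 0.04),
                   row.names = c("g1", "g2", "g3", "g4"))
tab2 <- data.frame(logFC = c(1.5, -0.8, 2.2),
                   pval  = c(0.010, 0.20, 0.002),
                   row.names = c("g1", "g2", "g5"))
pthresh <- 0.05

# Only genes present in both tables can be placed in the scatterplot
common <- intersect(rownames(tab1), rownames(tab2))
scatter_df <- data.frame(
  logFC1 = tab1[common, "logFC"],
  logFC2 = tab2[common, "logFC"],
  sig1   = tab1[common, "pval"] < pthresh,  # loosely analogous to color_by = "pval1"
  row.names = common
)

# One point per shared gene, colored by significance in the first dataset
plot(scatter_df$logFC1, scatter_df$logFC2,
     col = ifelse(scatter_df$sig1, "darkorange", "grey40"), pch = 19,
     xlab = "logFC (dataset 1)", ylab = "logFC (dataset 2)")
abline(0, 1, lty = 2)  # identity line: equal logFC in both datasets
```

Points near the dashed identity line correspond to genes with similar effect sizes in both contrasts.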
\pagebreak
The function deedee_heatmap() creates a heatmap of the logFC values for all common genes in every input dataset. The color key is given on the right side of the plot. In addition to the standard parameters explained in Input, the function takes a numeric show_first value (default = 25) specifying the number of genes depicted (if the total number of genes is smaller than show_first, all genes are shown). It also takes a logical show_gene_names value (default = FALSE) that determines whether the gene identifiers (row names in deedee_prepare results) are displayed in the heatmap, and a logical show_na that defines whether genes with NAs (in fewer than half of the contrasts) are included. The distance measure (dist, values: euclidean (default), manhattan, pearson, spearman) and the clustering method (clust, values: single, complete, average (default), centroid) can be chosen as well.
deedee_heatmap(data = DeeDee_obj, show_first = 25, show_gene_names = FALSE, dist = "manhattan", clust = "centroid", show_na = FALSE, pthresh = 0.05)
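The distance and clustering choices can be illustrated with base R alone. The following sketch is not how deedee_heatmap() is implemented; it only shows, on invented toy tables, how a logFC matrix of the common genes could be assembled and clustered with the stats functions dist() and hclust().

```r
# Three toy DeeDee-style tables (invented values; only logFC is needed here)
tabs <- list(
  A = data.frame(logFC = c(2.0, -1.0, 0.5),
                 row.names = c("g1", "g2", "g3")),
  B = data.frame(logFC = c(1.5, -0.8, 0.1),
                 row.names = c("g1", "g2", "g3")),
  C = data.frame(logFC = c(-0.3, -1.2, 2.4, 0.9),
                 row.names = c("g1", "g2", "g3", "g4"))
)

# Genes present in every table
common <- Reduce(intersect, lapply(tabs, rownames))

# Matrix with one row per common gene and one column per contrast
logfc_mat <- vapply(tabs, function(tab) tab[common, "logFC"],
                    numeric(length(common)))
rownames(logfc_mat) <- common

# Distance between genes and hierarchical clustering,
# mirroring dist = "manhattan" and clust = "centroid" from the call above
gene_dist  <- dist(logfc_mat, method = "manhattan")
gene_clust <- hclust(gene_dist, method = "centroid")
```

Note that stats::dist() has no pearson or spearman method, so correlation-based distances would have to be computed differently, for example from a correlation matrix.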
Because the heatmap is produced with the InteractiveComplexHeatmap package, a Shiny window showing the heatmap with the associated interactive features can be opened by calling InteractiveComplexHeatmap::ht_shiny(res) on a result res of deedee_heatmap(). For more information, please refer to the InteractiveComplexHeatmap Vignette.
\pagebreak
The function deedee_venn() creates a Venn diagram depicting the overlaps of differentially expressed genes in the input datasets. To keep the Venn diagram easy on the eye, the data list may contain no more than four datasets. To compare more DEAs in a similar, set-based manner, please use deedee_upset(). The parameter mode can be set to up, down, or both (default), specifying whether only up-regulated, only down-regulated, or both kinds of DE genes are counted.
# deedee_venn(data = DeeDee_obj, mode = "both", pthresh = 0.05)
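The sets behind such a diagram can be sketched in base R. The helper de_genes() below is hypothetical and not part of DeeDee, the toy tables are invented, and the strict comparison pval < pthresh is an assumption about how the threshold is applied.

```r
# Two toy DeeDee-style tables (invented values)
tabs <- list(
  A = data.frame(logFC = c(2.0, -1.0, 0.5), pval = c(0.001, 0.03, 0.60),
                 row.names = c("g1", "g2", "g3")),
  B = data.frame(logFC = c(1.5, -0.8, 2.2), pval = c(0.010, 0.20, 0.002),
                 row.names = c("g1", "g2", "g5"))
)

# Hypothetical helper: identifiers of the DE genes of one table for a given mode
de_genes <- function(tab, mode = c("both", "up", "down"), pthresh = 0.05) {
  mode <- match.arg(mode)
  keep <- !is.na(tab$pval) & tab$pval < pthresh
  if (mode == "up")   keep <- keep & tab$logFC > 0
  if (mode == "down") keep <- keep & tab$logFC < 0
  rownames(tab)[keep]
}

de_sets <- lapply(tabs, de_genes, mode = "both")
shared  <- Reduce(intersect, de_sets)      # DE in every dataset
only_A  <- setdiff(de_sets$A, de_sets$B)   # DE in A but not in B
```

The sizes of sets like shared and only_A are what the regions of a Venn diagram, or the bars of an UpSet plot, represent.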
\pagebreak
The function deedee_upset() creates an UpSet plot depicting the overlaps of differentially expressed genes in the input datasets. In contrast to deedee_venn(), the UpSet plot can compare multiple (and even more than four) datasets in a visually pleasing way. The parameter mode can be set to up, down, both, or both_colored (default), specifying whether only up-regulated, only down-regulated, or both kinds of DE genes are depicted. If both_colored is chosen, the result is the same UpSet plot as for both, but the shares of the intersections where all samples have positive/negative logFC values are colored according to the color key. The minimum size for an intersection to be included in the plot can be defined via the parameter min_setsize (default = 10).
deedee_upset(data = DeeDee_obj, mode = "both_colored", min_setsize = 15, pthresh = 0.05)
\pagebreak
The function deedee_qq() compares the statistical distributions of two input datasets. If the input data list contains more than two datasets (like our example DeeDee_obj does), the select1 (default = 1) and select2 (default = 2) parameters specify which ones are to be used. If the resulting curve resembles a straight line with a slope of 1, the distributions are similar (perfect straight line = identical distributions). deedee_qq() includes a color_by parameter that can be set to pval1 (default) or pval2. The points generating the curve will be colored according to the color key displayed on the right of the output.
deedee_qq(data = DeeDee_obj, select1 = 1, select2 = 3, color_by = "pval2", pthresh = 0.05)
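How a Q-Q curve reflects differences between two distributions can be tried out in base R. This sketch uses simulated values and stats::qqplot(); it assumes that the compared quantity is the logFC and does not reproduce the internals of deedee_qq().

```r
set.seed(1)

# Simulated logFC values for two hypothetical contrasts;
# the second one has roughly twice the spread of the first
logFC1 <- rnorm(200)
logFC2 <- rnorm(200, sd = 2)

# Matching quantiles of both samples, without drawing the plot
qq <- qqplot(logFC1, logFC2, plot.it = FALSE)

# Slope of a straight line fitted through the Q-Q points
slope <- unname(coef(lm(qq$y ~ qq$x))[2])
slope
```

A slope close to 1 would indicate similar spread in both samples. Here the slope is expected to be clearly larger than 1, because the second sample was simulated with a larger standard deviation.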
The function deedee_qqmult() makes the same calculations as deedee_qq(), but it can depict multiple Q-Q lines of different contrasts against the same reference in one plot. The curves are colored by contrast; coloring by p-value, as deedee_qq() allows, is not possible with deedee_qqmult().
deedee_qqmult(data = DeeDee_obj, ref = 1, pthresh = 0.05)
\pagebreak
The function deedee_cat() creates a plot depicting Concordance At the Top curves for a reference contrast (parameter ref, select contrast by list index). For each contrast except the reference, a curve is displayed that indicates the concordance of the top genes in the contrast's logFC-sorted gene list with those of the reference. The argument mode specifies the way of sorting the genes: by highest (up, default), lowest (down) or highest absolute (both) logFC value. The highest rank for which the concordance is calculated is given by the maxrank (default = 1000) parameter.
deedee_cat(data = DeeDee_obj, ref = 2, maxrank = 800, pthresh = 0.05)
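The concordance itself is simple to compute: for each rank k, it is the fraction of genes shared between the top k genes of the two sorted lists. The function cat_curve() below is a hypothetical base R sketch of that definition on invented toy data. Details such as p-value filtering or the handling of genes missing from one list may differ in deedee_cat().

```r
# Hypothetical helper: concordance at the top for ranks 1..maxrank
cat_curve <- function(ref_tab, other_tab, maxrank = 1000,
                      mode = c("up", "down", "both")) {
  mode <- match.arg(mode)
  # Gene identifiers sorted according to the chosen mode
  rank_genes <- function(tab) {
    score <- switch(mode,
                    up   = -tab$logFC,        # highest logFC first
                    down = tab$logFC,         # lowest logFC first
                    both = -abs(tab$logFC))   # highest absolute logFC first
    rownames(tab)[order(score)]
  }
  ref_rank   <- rank_genes(ref_tab)
  other_rank <- rank_genes(other_tab)
  maxrank <- min(maxrank, length(ref_rank), length(other_rank))
  vapply(seq_len(maxrank), function(k) {
    length(intersect(ref_rank[seq_len(k)], other_rank[seq_len(k)])) / k
  }, numeric(1))
}

# Toy tables (invented values)
ref_tab   <- data.frame(logFC = c(3.0, 2.0, 1.0, -1.0),
                        row.names = c("g1", "g2", "g3", "g4"))
other_tab <- data.frame(logFC = c(2.5, 0.5, 2.0, -2.0),
                        row.names = c("g1", "g2", "g3", "g4"))

concordance <- cat_curve(ref_tab, other_tab, maxrank = 4, mode = "up")
concordance  # 1.0 0.5 1.0 1.0
```

In this toy example the two lists agree on the top gene, disagree on the second rank, and agree again once the top three genes are considered.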
\pagebreak
deedee_summary() combines the results from the other DeeDee functions into a report-like HTML document. The function provides arguments to set the parameters for all included functions (for more information see the manual). The summary will be saved to a path specified with the argument output_path (default = "DeeDee_Summary.html" in the working directory). The parameter overwrite decides whether an existing document at the designated location will be overwritten (default = TRUE). The parameter silent can suppress messages (default = FALSE), and open_file decides whether the resulting document will be opened (default = TRUE).
deedee_summary(deedee_list = DeeDee_obj,
               output_path = "DeeDee_Summary.html",
               overwrite = FALSE,
               pthresh = 0.05,
               scatter_select1 = 1,
               scatter_select2 = 2,
               scatter_color_by = "pval1",
               heatmap_show_first = 25,
               heatmap_show_gene_names = FALSE,
               heatmap_dist = "euclidean",
               heatmap_clust = "average",
               heatmap_show_na = FALSE,
               venn_mode = "both",
               upset_mode = "both_colored",
               upset_min_setsize = 10,
               qqmult_ref = 1,
               cat_ref = 1,
               cat_maxrank = 1000,
               cat_mode = "up",
               silent = FALSE,
               open_file = TRUE)
\pagebreak
Besides the standalone functions, the DeeDee package also contains an interactive Shiny web application. To open it, run the following command:
deedee_app()
It can work on a diverse range of input files (.txt, .xlsx, .RDS) with different contents (single DeeDee tables, lists of DeeDee tables, raw DEA result objects), as described in the INFO panel in the input tab.
The App provides one tab for each of the functions presented above. The respective parameters can be set via self-explanatory input boxes in each tab. Additional interactive functionality is implemented as well, namely the interactive heatmap, brushing (area selection) of points in the Q-Q and scatter plots, and the possibility to run an over-representation analysis on the brushed points in the scatter plot.
More information on the usage can be found in the INFO panel in each respective tab.
Optionally, you can pass a list of DeeDee tables to the app via the data argument, as in deedee_app(data = DeeDee_obj). This input can be combined with uploaded files and is used in the same way.
sessionInfo()