This is an R Markdown document for the NOISeq analysis module of IDEA: Interactive Differential Expression Analyzer. Plots in the NOISeq analysis module, drawn in R [1] with pheatmap [2] (heat map) and ggplot2 [3] (probability distribution plot with q-value), are presented in an HTML file via rmarkdown [4]. For higher-resolution figures, please download them directly from the website.
Citation: This work is in the process of being published; citation details will be posted here as soon as possible. Check out the IDEA website above.
```{r}
knitr::opts_chunk$set(fig.width = 9, fig.height = 9, dpi = 72)
```

```{r}
# setwd(tempdir())
load("NOIseqAnalysis.RData")

# plist[[1]]: basic information
#   experimental design:         plist[[1]][1]
#   selected pair of conditions: plist[[1]][[2]]
#   NOISeq normalization method: plist[[1]][[3]]
#   q-value threshold:           plist[[1]][[4]]
#   heat map top number:         plist[[1]][[5]]

library(ggplot2)
library(gplots)
library(RColorBrewer)
library(scales)
library(pheatmap)
library(plyr)
library(labeling)
library(stringr)
# library(NOISeq)
library(rmarkdown)
# library(S4Vectors)
```
Count data generated by high-throughput sequencing methods such as RNA-Seq [5, 6], Tag-Seq [7, 8] and ChIP-Seq [9] are increasingly used to represent the abundance of genes/features at the RNA/DNA level, since read count and abundance are linearly related [6]. In RNA-Seq, variation between replicates is also low, which makes RNA-Seq count data advantageous for discovering differentially expressed genes [10]. Differential expression (DE) analysis typically involves the following choices: the normalization and noise control method [6, 8]; the data distribution, given the number of replicates [8]; and the method for assessing the statistical significance of DE detection [11].
NOISeq [12] is an R/Bioconductor package for differential expression analysis of count data. It models the count data distribution non-parametrically, an approach that typically performs better when a relatively large data set is available. NOISeq can handle data with technical replicates (NOISeq-real), biological replicates (NOISeqBIO) or no replicates (NOISeq-sim), though the last option is not recommended. Several normalization methods are available in NOISeq, including reads per kilobase per million reads (RPKM) [6], the Trimmed Mean of M values (TMM) [13] and the Upper Quartile (UQ) [14], with RPKM as the default. In NOISeqBIO, counts per million reads (CPM) is used to filter out features with low counts. For each feature, the probability of being differentially expressed is calculated by comparing the log2-ratio (M) and the absolute difference (D) of expression between the two conditions against the noise distribution. The feature is considered differentially expressed when this probability is above a defined threshold (q-value).
In IDEA, NOISeq version 2.8.0 is employed for DE analysis. For more information on NOISeq, please refer to reference [12] and the package manual.
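The probability calculation described above can be illustrated with a toy sketch in base R. This is a simplified illustration of the idea, not NOISeq's actual code: the expression values, the constant `k` and the helper `m_d()` are all hypothetical. For each feature, a signal pair (M, D) is computed between conditions, a noise set of (M, D) pairs is computed between replicates within each condition, and the probability is taken as the fraction of noise points that the signal exceeds in both |M| and D.

```r
# Toy illustration of comparing (M, D) against a noise distribution.
# All values and names are hypothetical; this is not NOISeq's own code.

# Normalized expression: 6 features x 2 replicates per condition.
cond1 <- matrix(c(10, 12,  50,  48, 200, 210, 5, 6, 80, 82, 30, 29),
                ncol = 2, byrow = TRUE)
cond2 <- matrix(c(11, 10, 150, 160, 205, 198, 5, 7, 20, 22, 31, 30),
                ncol = 2, byrow = TRUE)
rownames(cond1) <- rownames(cond2) <- paste0("gene", 1:6)

k <- 0.5  # assumed small constant that avoids taking the log of zero

# M = log2-ratio, D = absolute difference between two expression vectors.
m_d <- function(a, b) {
  cbind(M = log2((a + k) / (b + k)), D = abs(a - b))
}

# Signal: condition 1 versus condition 2 (replicates averaged).
signal <- m_d(rowMeans(cond1), rowMeans(cond2))

# Noise: replicate versus replicate within the same condition.
noise <- rbind(m_d(cond1[, 1], cond1[, 2]),
               m_d(cond2[, 1], cond2[, 2]))

# Probability: share of noise points with smaller |M| and smaller D.
prob <- apply(signal, 1, function(s) {
  mean(abs(noise[, "M"]) < abs(s["M"]) & noise[, "D"] < s["D"])
})
print(round(prob, 2))
```

In this toy data, features whose between-condition change is larger than every replicate-to-replicate change receive a probability of 1, while features that change less than the replicates do receive a probability near 0.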
IDEA takes a raw count table and an experimental design table as input. The experimental design can be one of Standard Comparison, Multi-factors Design and Without Replicates (not recommended). A pair of conditions is then selected for DE analysis.
Specifically, PoissonSeq is applicable only for Standard Comparison and Without Replicates.
In this case, the experimental design was stated as `r plist[[1]][1]`. Condition `r as.character(plist[[1]][[2]])[1]` and condition `r as.character(plist[[1]][[2]])[2]` were selected for differential expression analysis.
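As a rough sketch of the two inputs and the pair selection, the base R snippet below builds a small count table and a matching design table, then keeps only the samples from the chosen pair of conditions. The gene names, sample names, condition labels and counts are invented for illustration, and the layout (features in rows, samples in columns, one design row per sample) is an assumption about a typical input rather than IDEA's exact file format.

```r
# Hypothetical raw count table: features in rows, samples in columns.
counts <- matrix(c(120, 135,  40,  38,  90,  95,
                     0,   2,  15,  20,   7,   9,
                   560, 610, 580, 575, 600, 590),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("s", 1:6)))

# Hypothetical experimental design table: one row per sample.
design <- data.frame(
  condition = rep(c("control", "treated", "recovery"), each = 2),
  row.names = colnames(counts)
)

# The design must describe exactly the samples in the count table.
stopifnot(identical(rownames(design), colnames(counts)))

# Select the pair of conditions to compare and subset both tables.
pair <- c("control", "treated")
keep <- design$condition %in% pair
sub_counts <- counts[, keep, drop = FALSE]
sub_design <- design[keep, , drop = FALSE]

# With NOISeq installed, the two tables could then be combined with
# NOISeq::readData(data = sub_counts, factors = sub_design).
print(sub_design)
```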
NOISeq provides three methods for normalization: RPKM, TMM and UQ, whose basic information is listed in Table 1. In this case, the normalization method was set as `r plist[[1]][[3]]`.
```{r}
htmltools::HTML('
<div align="center">
Table 1 Normalization methods in NOISeq<br/>
<table cellpadding="5" cellspacing="0" border="1" frame=hsides rules=all style="border-color: #000000">
<tr>
<td style="border-width: medium thin medium 0"> Method</td>
<td style="border-width: medium thin medium 0"> Abbreviation</td>
<td style="border-width: medium thin medium 0"> Summary</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> The Reads per Kilobase per Million Reads (default) [<a href="#ref6">6</a>]</td>
<td style="border-width: 0 thin thin 0"> RPKM</td>
<td style="border-width: 0 thin thin 0"> Counts per kilobase of feature length per million mapped reads (or total number of reads in the library)</td>
</tr>
<tr>
<td style="border-width: 0 thin thin 0"> The Trimmed Mean of M values [<a href="#ref13">13</a>]</td>
<td style="border-width: 0 thin thin 0"> TMM</td>
<td style="border-width: 0 thin thin 0"> Weights are derived from the delta method on binomial data, then a trimmed weighted mean is calculated</td>
</tr>
<tr>
<td style="border-width: 0 thin medium 0"> The Upper Quartile [<a href="#ref14">14</a>]</td>
<td style="border-width: 0 thin medium 0"> UQ</td>
<td style="border-width: 0 thin medium 0"> Features with zero counts in all libraries are removed; a scale factor is calculated from the upper quartile of counts for each library</td>
</tr>
</table>
</div>
')
```
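To make the summaries in Table 1 more concrete, here is a minimal base R sketch of CPM, RPKM and an upper-quartile scaling on invented counts. The feature lengths and counts are hypothetical, and the formulas follow the textbook definitions rather than NOISeq's internal functions, which may differ in details such as added constants or rescaling. TMM is omitted because it does not reduce to a few lines.

```r
# Hypothetical counts: sample s2 was sequenced twice as deeply as s1.
counts <- matrix(c(100,  200,
                   300,  600,
                     0,    0,
                   600, 1200),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

# Hypothetical feature lengths in base pairs.
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 500, gene4 = 3000)

lib_size <- colSums(counts)

# CPM: counts per million reads in each library.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# RPKM: CPM further divided by feature length in kilobases.
rpkm <- sweep(cpm, 1, gene_length / 1000, "/")

# Upper quartile: drop features that are zero in all libraries, then
# scale each library by its 75th percentile (relative to the mean).
expressed <- counts[rowSums(counts) > 0, , drop = FALSE]
uq <- apply(expressed, 2, quantile, probs = 0.75)
uq_norm <- sweep(counts, 2, uq / mean(uq), "/")

print(rpkm)
print(uq_norm)
```

Because the second library is simply the first one sequenced at double depth, all three normalizations give identical values for both samples, which is the behavior a depth correction should show.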
After analysis, a table containing information on all differentially expressed genes is presented with interactive options. The meaning of each column header is explained in Table 2.
Note that the same header can have different meanings in different packages. For example, p-values in DESeq are obtained by the Wald test, whereas p-values in edgeR are obtained by an exact test analogous to Fisher's exact test.
| Headers | Interpretation |
|---|---|
| FeatureID | Feature identifier |
| Mean | Mean of condition, available for multiple columns |
| Theta | Differential expression statistics |
| Prob | Probability of differential expression |
| Log2FC | Logarithm (base 2) of the fold change; fold change is defined as counts of Condition1 divided by counts of Condition2 |
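As a small worked example of the Log2FC definition, the snippet below computes the base-2 logarithm of Condition1 over Condition2 from per-condition means. The feature names and mean values are invented for illustration.

```r
# Hypothetical per-condition mean expression for three features.
cond1_mean <- c(geneA = 80, geneB = 10, geneC = 25)
cond2_mean <- c(geneA = 20, geneB = 40, geneC = 25)

# Fold change is Condition1 divided by Condition2; Log2FC is its log2.
log2fc <- log2(cond1_mean / cond2_mean)
print(log2fc)
# geneA: 2 (four-fold higher in Condition1)
# geneB: -2 (four-fold lower in Condition1)
# geneC: 0 (unchanged)
```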
A heat map graphically displays the differential expression table: each square (pixel) represents the value of a feature in a sample and is colored accordingly. Here, the heat map of differentially expressed features is plotted with the R package pheatmap. Samples are arranged in columns and features in rows, as in the original data matrix. Up-regulated features are colored red, and down-regulated features green. Hierarchical clustering results for features and samples are shown as dendrograms on the left and upper sides of the heat map, respectively.
The number of features displayed as rows, the display of the dendrograms on the left and upper sides, and the display of the color key can all be changed interactively. The data scaling of the heat map can be one of "none", "row" and "column", as chosen by the user. The color is scaled by $\log_{10}(\text{Normalized Read Count} + 1)$.
In this case, data is centered and scaled in the `r as.character(plist[[2]][[3]])` direction. For more information on parameter settings, please refer to the manual of the pheatmap package (reference [2]).
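The log transformation and the "row" and "column" scaling options can be sketched in base R as follows. The matrix values are invented, and the snippet only reproduces the arithmetic (centering to mean 0 and scaling to standard deviation 1); it does not draw a heat map. With pheatmap installed, a call such as `pheatmap::pheatmap(log_mat, scale = "row")` would apply the row scaling internally.

```r
# Hypothetical normalized read counts: features in rows, samples in columns.
norm_counts <- matrix(c( 9,  99,  999,
                         0,   9,   99,
                        99, 999, 9999),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(paste0("gene", 1:3),
                                      paste0("s", 1:3)))

# Color scale: log10(normalized read count + 1); the +1 keeps zeros finite.
log_mat <- log10(norm_counts + 1)

# scale = "row": center and scale each feature across samples.
row_scaled <- t(scale(t(log_mat)))

# scale = "column": center and scale each sample across features.
col_scaled <- scale(log_mat)

print(log_mat)
print(row_scaled)
```

Row scaling removes differences in overall expression level between features, so the three features here, which share the same pattern across samples at different magnitudes, end up with identical scaled rows.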
Note that in NOISeq, probability is not equivalent to p-value.
According to the probability calculation summarized above in the introduction to NOISeq, the higher the probability, the more likely it is that the feature is differentially expressed due to the change in experimental condition. By default, the q-value used as the threshold for selecting DE features is set to 0.8. For more details, please refer to the NOISeq reference [12] and manual.
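Selecting DE features with the threshold can be sketched as a simple filter on the result table. The table below is invented, uses the headers from Table 2, and assumes that "above the threshold" means strictly greater than q. With NOISeq installed, the package's own `degenes()` function performs this selection on a NOISeq output object through its `q` argument.

```r
# Hypothetical result table using the headers explained in Table 2.
results <- data.frame(
  FeatureID = paste0("gene", 1:5),
  Prob      = c(0.95, 0.62, 0.81, 0.80, 0.99),
  Log2FC    = c(2.1, 0.3, -1.4, 0.9, -3.0)
)

q <- 0.8  # threshold on the probability of differential expression

# Keep features whose probability is strictly above the threshold.
de <- results[results$Prob > q, ]

# Label the direction of change from the sign of Log2FC.
de$Direction <- ifelse(de$Log2FC > 0, "up", "down")
print(de)
```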