DiffCircaPipeline is a workflow for a systematic detection of multifaceted differential circadian characteristics ($\Delta A$, $\Delta \phi$, $\Delta M$, $\Delta R^2$) with accurate false positive control. The pipeline allows interactive exploration of the data and the analysis outputs are accompanied by informative visualization. This tutorial demonstrates the pipeline using an public data set from Gene Expression Omnibus (GEO) with the accession number GSE54650.
knitr::opts_chunk$set(echo = TRUE) library(dplyr)
GSE54650 contains data from 12 tissues. In this tutorial we will be comparing rhythmicity between two tissues: hypothalamus and skeletal muscle.
GSE.ID = "GSE54650" if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") if(!require("GEOquery", quietly = TRUE)) BiocManager::install("GEOquery") if(!require("Biobase", quietly = TRUE)) BiocManager::install("Biobase") Sys.setenv(VROOM_CONNECTION_SIZE=131072*5) gset <- GEOquery::getGEO(GSE.ID) data.pdata = Biobase::pData(gset[[1]]) data.fdata = Biobase::fData(gset[[1]]) data.exprs = Biobase::exprs(gset[[1]]) table(data.pdata$source_name_ch1)
gI = "hypothalamus"; gII = "skeletal muscle" data.pdata.sub = data.pdata %>% filter(source_name_ch1 %in% c(gI, gII)) data.exprs.sub = data.exprs[, data.pdata.sub$geo_accession] #check if need log2 transformation ex = data.exprs.sub qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=TRUE)) LogC <- (qx[5] > 100) || (qx[6]-qx[1] > 50 && qx[2] > 0) if (LogC) { ex[ex == 0] <- NaN ex <- log2(ex) } # box-and-whisker plot par(mar=c(7,4,2,1)) title <- paste0(GSE.ID, ": box-and-whisker plot") boxplot(ex, boxwex=0.7, notch=TRUE, main=title, outline=FALSE, las=2,)#distribution looks good # expression value distribution plot par(mar=c(4,4,2,1)) title <- paste0(GSE.ID, ": density distribution") limma::plotDensities(ex, main=title, legend=FALSE)
There is an apparant distribution difference between the two tissues, so we perform quantile normalization to the data.
ex.norm = limma::normalizeQuantiles(ex) limma::plotDensities(ex.norm, main=title, legend=FALSE)
This section performs the differential rhythmicity (DR) tests using DiffCircaPipeline.
Format the data for DiffCircaPipeline. The data for each group is a list of three components: (1) the expression data, which must be formatted into dataframe; (2) the time, which should have the same order as the columns in the expression, and (3) gname, which is a vector of feature identifier, the order must be the same as the rows in the expression.
data.pdata.x1 = data.pdata.sub %>% filter(source_name_ch1==gI) %>% mutate(time = as.numeric(gsub(".*_CT([0-9]+)$", "\\1", title))) data.pdata.x2 = data.pdata.sub %>% filter(source_name_ch1==gII) %>% mutate(time = as.numeric(gsub(".*_CT([0-9]+)$", "\\1", title))) x1 = list(data = as.data.frame(ex.norm[, data.pdata.x1$geo_accession]), time = data.pdata.x1$time, gname = rownames(ex.norm)) x2 = list(data = as.data.frame(ex.norm[, data.pdata.x2$geo_accession]), time = data.pdata.x2$time, gname = rownames(ex.norm))
In this section we run all the analyses using the default arguments, for example, the period is 24 h, the amplitude cutoff for rhythmic gene is 0, etc. Details about changing the arguments can be found in the function documents by "?DCP_Rhythmicity".
library(DiffCircaPipeline) DCP_rhythm = DCP_Rhythmicity(x1, x2)
Parameter estimates for each group are in "DCP_rhythm\$x1\$rhythm"
head(DCP_rhythm$x1$rhythm) head(DCP_rhythm$x2$rhythm )
Types of rhythmicity for the two groups are in "DCP_rhythm\$rhythm.joint"
head(DCP_rhythm$rhythm.joint)
Summarize number of genes identified for each TOJR:
table(DCP_rhythm$rhythm.joint$TOJR)
Summarize number of genes identified for each TOJR after genome-wide FDR correction:
table(DCP_rhythm$rhythm.joint$TOJR.FDR)
DCP_dR2 = DCP_DiffR2(DCP_rhythm) head(DCP_dR2[order(DCP_dR2$p.R2), ])
By default, the genes that are rhyI/rhyII/both in "DCP_rhythm\$rhythm.joint\$TOJR" will be used. If users want to use other TOJR results, e.g., "DCP_rhythm\$rhythm.joint\$TOJR.FDR", please refer to next section.
DCP_dparam = DCP_DiffPar(DCP_rhythm, Par = "A&phase") head(DCP_dparam[order(DCP_dparam$p.overall), 1:7])
The rest of the columns contains parameter estimates from each group and the p-value for parameter comparision without multiple testing adjustment.
By default the function plots the genes that are most rhythmic in group I (r gI
).
p.scatter = DCP_ScatterPlot(DCP_rhythm, Info1 = gI, Info2 = gII, filename = NULL) # #if given file name the function will automatically save the plots to pdf files in the given directory. DCP_PlotDisplay(p.scatter) #DCP_ScatterPlot only stores the plots in a list, DCP_PlotDisplay will display plots side-by-side.
Plot the top 2 genes with $Delta R^2$
g.topR2 = DCP_dR2$gname[order(DCP_dR2$p.R2)][1:2] p.scatter = DCP_ScatterPlot(DCP_rhythm, g.topR2) DCP_PlotDisplay(p.scatter)
Plot the top 2 genes with $Delta A$
g.top.dA= DCP_dparam$gname[order(DCP_dparam$p.delta.A)][1:2] p.scatter = DCP_ScatterPlot(DCP_rhythm, g.top.dA) DCP_PlotDisplay(p.scatter)
Plot the top 2 genes with $Delta \phi$. Since both groups are control samples, the phase difference is not large.
g.top.dphase= DCP_dparam$gname[order(DCP_dparam$p.delta.peak)][1:2] p.scatter = DCP_ScatterPlot(DCP_rhythm, g.top.dphase) DCP_PlotDisplay(p.scatter)
The default is plot the top 100 rhythmic genes in group I. Users are able to plot selected genes by specifying the genes.plot argument.
DCP_PlotHeatmap(DCP_rhythm)
DCP_PlotPeakRadar(DCP_rhythm)
The peak histogram is essentially the same as the peak radar plot with only the presentation form changed.
DCP_PlotPeakHist(DCP_rhythm)
The default peak difference plot:
DCP_PlotPeakDiff(DCP_rhythm, Info1 = gI, Info2 = gII)
If users have performed differential rhythm parameter test that contains differential phase, the corresponding result can be displayed. The "dPhase" should be the result from differential rhythm parameter, and "color.cut" specifies the color regime according to the "dPhase".
DCP_PlotPeakDiff(DCP_rhythm, dPhase = DCP_dparam, color.cut = list(param = "p.delta.peak", fun = "<", val = 0.05, color.sig = "#b33515", color.none = "dark grey"), Info1 = gI, Info2 = gII)
The functions DCP_dR2 and DCP_DiffPar are only performed to a subset of genes for biologically meaningful tests (see the manuscript for more details). By default, the functions use the gene categories in "DCP_rhythm\$rhythm.joint\$TOJR". However, user are able to specify a different (types of joint rhythmicity) TOJR assignment. For example, we can use the genome-wide FDR corrected TOJR.
DCP_dR2 = DCP_DiffR2(DCP_rhythm, TOJR = DCP_rhythm$rhythm.joint$TOJR.FDR) DCP_dparam = DCP_DiffPar(DCP_rhythm, Par = "A&phase", TOJR = DCP_rhythm$rhythm.joint$TOJR.FDR)
Separate TOJR result can be generated with the toTOJR functoin. For example, we can perform an intuitive (but statistically criticized) venn diagram.
new.TOJR = toTOJR(DCP_rhythm, "VDA", alpha = 0.05, adjustP = TRUE) #adjustP = TRUE gives genome-wide FDR adjusted results DCP_dR2 = DCP_DiffR2(DCP_rhythm, TOJR = new.TOJR) DCP_dparam = DCP_DiffPar(DCP_rhythm, Par = "A&phase", TOJR = new.TOJR)
If the study involves human subjects, usually researchers do not have control of when and where the data will be collected. The function DCP_getZT converts a recorded clock time to Zeitgeber time according to the location and date of the collection.
t = as.POSIXlt("2022-09-17 13:43:00 EST", tz="America/New_York",usetz=TRUE) lat = 40.4406; long = -79.9959 #This is the Pittsburgh coordinates t.correct = DCP_getZT(t, lat, long, ZT.min = -6) print(t.correct)
Summarize the DR result to a table:
SummarizeDR(list(DCP_dR2$p.R2, DCP_dR2$q.R2), test = c("DRF", "DRF"), type = c("p-value", "q-value"), val = c(0.05, 0.05), out = "long") SummarizeDR(list(DCP_dR2$p.R2, DCP_dR2$q.R2), test = c("DRF", "DRF"), type = c("p-value", "q-value"), val = c(0.05, 0.05), out = "wide")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.