Install the most recent stable version from Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("TeMPO")
And load TeMPO
:
library(TeMPO)
Alternatively, you can install the development version directly from GitHub using devtools
:
devtools::install_github("MalteThodberg/TeMPO")
For general questions about the usage of TeMPO, use the official Bioconductor support forum and tag your question "TeMPO". We strive to answer questions as quickly as possible.
For technical questions, bug reports and suggestions for new features, we refer to the TeMPO github page
TeMPO can calculate the summarized signal (e.g. the average counts-per-millon) across a set of genomic regions (e.g. Transcription Start Sites (TSSs) and enhancers). TeMPO needs the following input:
TeMPO will output the result in "tidy" format as a tibble, making it easy to further analyze or plot the data using r BiocStyle::CRANpkg("ggplot2")
and the r BiocStyle::CRANpkg("tidyverse")
packages.
TeMPO example below use example data for hela cells (hg19) chromosomes 20-22 from the following sources:
r BiocStyle::Biocpkg("CAGEfightR")
package.r BiocStyle::Biocpkg("AnnotationHub")
package with IDs: AH32877, AH32879, AH32881, AH32884).First, we load the needed packages:
library(AnnotationHub) library(tidyverse) # Use light ggplot2 theme theme_set(theme_light())
CAGE_clusters
contain the location of TSSs and enhancer found using the CAGEfightR package. TSSs and enhancers are expected to show different epigenetic modifications, reflecting their different function:
data("CAGE_clusters") CAGE_clusters
CAGE_plus
and CAGE_minus
contains the pooled CAGE signal across wildtype cells (WT) and exosome knock-out cell (KD). The exosome knock-down is expected to stabilize eRNA transcripts produced from enhancers that are normally quickly degraded. As the CAGE data is very sparse, it can be loaded directly into memory as an RleListList
:
data("CAGE_plus") data("CAGE_minus") CAGE_plus
We can obtain epigenetic data for comparison to the CAGE data using the The Roadmap Epigenomics Project via the r BiocStyle::Biocpkg("AnnotationHub")
package. As ChIP-Seq and DNase-signal is not sparse, we will store these signals as BigWigFile-objects rather than load them into memory:
NOTE: The first time you run this, it will take several minutes for AnnotationHub to download the file. After that, AnnotationHub will simply fetch the file from cache, as is done here.
# Setup AnnotationHub ah <- AnnotationHub() # Retrive data and save as BigWigFileList ChIP_Seq <- list(DNase="AH32877", H3K4me1="AH32879", H3K4me3="AH32881", H3K27ac="AH32884") %>% lapply(function(x) ah[[x]]) %>% as("List")
Below are examples of generating metaprofiles for different combinations of input.
A simple example of (unstranded) DNAse-signal around promoters:
promoters_only <- subset(CAGE_clusters, txType == "promoter") SS1 <- tidyMetaProfile(sites = promoters_only, forward=ChIP_Seq$DNase, reverse=NULL, upstream=300, downstream=300) head(SS1) ggplot(SS1, aes(x=pos0, y=sense)) + geom_line(alpha=0.75) + geom_vline(xintercept=0, alpha=0.75, linetype="dotted") + labs(x="Basepair position relative to center", y="Average DNase signal")
Classic example of bidirectional transcription of eRNAs at enhancers shown using a stranded meta-profile:
enhancers_only <- subset(CAGE_clusters, clusterType == "enhancer") SS2 <- tidyMetaProfile(sites = enhancers_only, forward=CAGE_plus$WT, reverse=CAGE_minus$WT, upstream=250, downstream=250) head(SS2) SS2 %>% gather(key="direction", value="score", sense, anti, factor_key=TRUE) %>% ggplot(aes(x=pos0, y=score, color=direction)) + scale_color_brewer("Direction", palette="Set1") + geom_line(alpha=0.75) + geom_vline(xintercept=0, alpha=0.75, linetype="dotted") + labs(x="Basepair position relative to center", y="Average CAGE signal")
Multiple different epigentic signals across a single set of sites:
SM <- tidyMetaProfile(sites = promoters_only, forward=ChIP_Seq, reverse=NULL, upstream=300, downstream=300) head(SM) ggplot(SM, aes(x=pos0, y=sense, color=signal)) + geom_line(alpha=0.75) + geom_vline(xintercept=0, alpha=0.75, linetype="dotted") + labs(x="Basepair position relative to center", y="Average CAGE signal")
H3K27ac at CAGE-defined TSSs at different positions in genes:
by_txType <- CAGE_clusters %>% subset(clusterType == "TSS" & txType %in% c("promoter", "fiveUTR", "proximal")) %>% splitAsList(.$txType, drop=TRUE) MS <- tidyMetaProfile(sites = by_txType, forward=ChIP_Seq$DNase, reverse=NULL, upstream=300, downstream=300) head(MS) ggplot(MS, aes(x=pos0, y=sense, color=sites)) + geom_line(alpha=0.75) + geom_vline(xintercept = 0, linetype="dotted", alpha=0.75) + scale_color_brewer("Genic Context", palette="Set2") + labs(x="Relative position from center", y="Average H3K27ac Signal")
Meta-profile showing the difference in epigentic modifications at TSSs vs enhancers:
by_clusterType <- split(CAGE_clusters, CAGE_clusters$clusterType) MM1 <- tidyMetaProfile(sites = by_clusterType, forward=ChIP_Seq, reverse=NULL, upstream=300, downstream=300) head(MM1) ggplot(MM1, aes(x=pos0, y=sense, color=sites)) + geom_line(alpha=0.75) + facet_grid(signal~., scales="free_y") + labs(x="Relative position from center", y="Average Signal")
Meta-profile showing the effect of the exosome knockdown on eRNAs detected by CAGE:
MM2 <- tidyMetaProfile(sites = by_clusterType, forward=CAGE_plus, reverse=CAGE_minus, upstream=300, downstream=300) head(MM2) MM2 %>% gather(key="direction", value="score", sense, anti, factor_key=TRUE) %>% ggplot(aes(x=pos0, y=score, color=direction)) + geom_line(alpha=0.75) + facet_grid(sites~signal, scales="free_y") + scale_color_brewer("Direction", palette="Set1") + labs(x="Relative position from center", y="Average Signal")
In case multiple genomic signals are provided (Multiple BigWigFile
-objects in a BigWigFileList
or multiple RleList
-objects in an RleListList
), signals can be analyzed in parallel using the r BiocStyle::Biocpkg("BiocParallel")
package. TidyMetaProfile uses the default registered backend:
library(BiocParallel) # Set the backend to run calculations in series register(SerialParam()) # Set the backend to run parallelize calculations using i.e. 3 cores: register(MulticoreParam(workers=3))
Instead of calculating the mean across sites, alternative summary functions can be provide. For example, instead of the average meta-profile, we can plot the median meta-profile:
# Recalculate the first example using medians SS1_median <- tidyMetaProfile(sites = promoters_only, forward=ChIP_Seq$DNase, reverse=NULL, upstream=300, downstream=300, sumFun = matrixStats::colMedians) # Merge the two profiles and plot list(mean=SS1, median=SS1_median) %>% bind_rows(.id="summary") %>% ggplot(aes(x=pos0, y=sense, color=summary)) + geom_line(alpha=0.75) + geom_vline(xintercept=0, alpha=0.75, linetype="dotted") + scale_color_discrete("Summary-function") + labs(x="Basepair position relative to center", y="Average DNase signal")
In many cases, a few sites may have very extreme values which can disproportionally skew the calculated average profiles. tidyMetaProfile can automatically trim the sites with lowest and/or highest signals based on quantiles:
# Recalculate the first example with different quantile trimmings: SS1_95percent <- tidyMetaProfile(sites = promoters_only, forward=ChIP_Seq$DNase, reverse=NULL, upstream=300, downstream=300, trimUpper=0.95) SS1_90percent <- tidyMetaProfile(sites = promoters_only, forward=ChIP_Seq$DNase, reverse=NULL, upstream=300, downstream=300, trimUpper=0.90) # Merge the three profiles and plot list(`100%`=SS1, `95%`=SS1_95percent, `90%`=SS1_90percent) %>% bind_rows(.id="summary") %>% ggplot(aes(x=pos0, y=sense, color=summary)) + geom_line(alpha=0.75) + geom_vline(xintercept=0, alpha=0.75, linetype="dotted") + scale_color_discrete("Trimming-level") + labs(x="Basepair position relative to center", y="Average DNase signal")
The low-level functions used by tidyMetaProfile are all exposed for advanced user:
agnosticImport
: Import of chunks from BigWigFile or RleListwideMetaProfile
: Output results as a site-by-position matrix.quantileTrim
: Trim outliers from a single matrix or a pair of matrices.The man pages for those functions contains details on how to use them.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.