knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 5)
The GenomicWidgets package enables the creation of interactive visualizations for functional genomics data in R. These interactive visualizations can be included in stand-alone Rmarkdown HTML documents, or can be linked in shiny applications. These visualization are intended to highlight the data at different scales, from a single locus to a few loci to hundreds of regions along the genome, and to enable the views at different scales to be linked together.
In addition to GenomicWidgets
, we will load a few other packages for use in this vignette.
library(GenomicWidgets) library(iheatmapr) library(GenomicRanges) library(SummarizedExperiment) # If TxDb.Hsapiens.UCSC.hg19.knownGene is not installed, you can install it via # source("https://bioconductor.org/biocLite.R") # biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") library(TxDb.Hsapiens.UCSC.hg19.knownGene)
We will also use data from the genomationData
package. If not installed, can be installed via:
source("https://bioconductor.org/biocLite.R") biocLite("genomationData")
The GenomicWidgets package builds on the iheatmapr
package to enable creation of complex, interactive heatmaps for genomics data.
GenomicWidgets
includes an iheatmap
method for SummarizedExperiment objects, which are often used for storing expression data. Compared to the iheatmap
method for a matrix, the method for SummarizedExperiment has some different default settings that are often appropriate for expression or epigenomics data.
The package includes a small subset of the expression data from the Roadmap Epigenomics project.
data("rpkm_chr21")
We'll subset further to use just the first 50 genes, and only samples from Primary Culture or Tissue.
rpkm_chr21 <- rpkm_chr21[1:50,which(colData(rpkm_chr21)$TYPE %in% c("PrimaryCulture", "PrimaryTissue"))]
If we call the iheatmap
method on a SummarizedExperiment object, we obtain the following heatmap:
iheatmap(rpkm_chr21,"rpkm")
If instead we had applied the iheatmap
method applied to the plain matrix of RPKM values, the basic iheatmap method is used with defaults that are not as suitable, leading to a less interesting heatmap:
iheatmap(assays(rpkm_chr21)[["rpkm"]])
For SummarizedExperiment, hierarchical clustering and standardization of rows is the default.
We can further enhance this heatmap by passing along additional components of the SummarizedExperiment object to additional arguments to the iheatmap
method.
iheatmap(rpkm_chr21, "rpkm", x = colData(rpkm_chr21)$STD_NAME, y = rowData(rpkm_chr21)$SYMBOL, col_annotation = colData(rpkm_chr21)[,c("TYPE","SEX")])
There is also an add_iheatmap
method defined for SummarizedExperiment for adding a heatmap based on a SummarizedExperiment to an existing heatmap. Here we will separate our samples into two groups to demonstrate the add_iheatmap
function:
rpkm_group1 <- rpkm_chr21[,which(colData(rpkm_chr21)$TYPE == "PrimaryTissue")] rpkm_group2 <- rpkm_chr21[,which(colData(rpkm_chr21)$TYPE == "PrimaryCulture")] iheatmap(rpkm_group1, "rpkm", x = colData(rpkm_group1)$STD_NAME, y = rowData(rpkm_group1)$SYMBOL, col_title = "Primary Tissue") %>% add_iheatmap(rpkm_group2, "rpkm", x = colData(rpkm_group2)$STD_NAME, col_title = "Primary Culture")
Note that in practice this might not make sense to do, as the rows are now scaled separately. The add_iheatmap
function can be useful for juxtaposing a different data type. If a name for the colorbar is not given (via the name
argument), the two main heatmaps are put on the same color scale.
GenomicWidgets
also has methods for making coverage heatmaps.
For demonstration of coverage heatmaps, we'll use ChIP-seq data for the H1 ESC cell line from the ENCODE project included in the genomationData
Bioconductor package. We'll first read in a sample table and get the correct paths to the files.
genomation_dir <- system.file("extdata", package = "genomationData") samp.file <- file.path(genomation_dir,'SamplesInfo.txt') samp.info <- read.table(samp.file, header=TRUE, sep='\t', stringsAsFactors = FALSE) samp.info$fileName <- file.path(genomation_dir, samp.info$fileName)
Then we'll read in the location of CTCF peaks. We'll plot the coverage at these peaks.
ctcf.peaks <- genomation::readBroadPeak(system.file("extdata", "wgEncodeBroadHistoneH1hescCtcfStdPk.broadPeak.gz", package = "genomationData")) ctcf.peaks <- ctcf.peaks[seqnames(ctcf.peaks) == "chr21"] ctcf.peaks <- ctcf.peaks[order(-ctcf.peaks$signalValue)] ctcf.peaks <- resize(ctcf.peaks, width = 501, fix = "center")
We'll limit ourselves to considering 50 of the peaks.
ctcf.peaks <- ctcf.peaks[1:50]
We first will make coverage matrices. These will be returned as RangedSummarizedExperiment objects.
ctcf_mats <- make_coverage_matrix(samp.info$fileName[1:5], ctcf.peaks, input_names = samp.info$sampleName[1:5], up = 250, down = 250, binsize = 25)
With bam input, the make_coverage_matrix function computes the coverage based on the basepairs covered by the actual reads; no extending or shifting is done. With bigwig input, the function assumes the bigwig file contains the coverage track.
You can also use your own preferred method for making a coverage matrix. The coverage_heatmap
method shown below can accept either a SummarizedExperiment, a plain matrix, a list of matrices, or a ScoreMatrix or ScoreMatrixList object (objects from the genomation package). The genomation
package includes the ScoreMatrix
, ScoreMatrixBin
, and ScoreMatrixList
functions to read in coverage and make coverage matrices.
Here we will create a single coverage heatmap for the CTCF ChIP-seq.
coverage_heatmap(ctcf_mats, "Ctcf")
By default, the coverage matrix shown in the heatmap is scaled by the root mean squared value of the row. This can be adjusted by modifying the scale_method argument. For example, to scale based off the 95th percentile value in the matrix:
coverage_heatmap(ctcf_mats, "Ctcf", scale_method = "PercentileMax")
You can obained the scaled coverage matrices by using the normalize_coverage_matrix
function.
normed_ctcf_mats <- normalize_coverage_matrix(ctcf_mats, method = "PercentileMax")
coverage_heatmap(ctcf_mats, c("Ctcf","Znf143"))
You can also add multiple coverage heatmaps at once to an existing heatmap using add_coverage_heatmaps
. We will demonstrate that in the next section.
Get coverage matrix at tss for the Chip-Seq
tss <- promoters(rowRanges(rpkm_chr21), up = 1, down = 1) tss_mats <- make_coverage_matrix(samp.info$fileName[1:5], tss, input_names = samp.info$sampleName[1:5], up = 500, down = 500, binsize = 25)
Make expression heatmap and then add coverage heatmap for TSS for two of the factors:
iheatmap(rpkm_chr21, "rpkm", x = colData(rpkm_chr21)$STD_NAME, y = rowData(rpkm_chr21)$SYMBOL, col_annotation = colData(rpkm_chr21)[,c("TYPE","SEX")]) %>% add_coverage_heatmap(tss_mats, c("P300","Suz12"))
GenomicWidgets
also has functions for interactive local views of genomic signals.
For plotting genomic signals as tracks, first you specify the source of the data and other options.
track_params <- set_track_parameters(samp.info$fileName[1:3], annotation = TxDb.Hsapiens.UCSC.hg19.knownGene, track_names = samp.info$sampleName[1:3], share_y = TRUE)
Those parameters can then be passed along with a range to generate the interactive genome track plot.
plot_tracks(resize(ctcf.peaks[2], width = 5000, fix = "center"), track_params)
You can easily plot multiple regions by passing along several ranges rather than a single range.
plot_tracks(resize(ctcf.peaks[1:4], width = 5000, fix = "center"), track_params)
You can over-ride parameters set by set_track_parameters by passing them again to plot_tracks
plot_tracks(resize(ctcf.peaks[2], width = 5000, fix = "center"), track_params, showlegend = FALSE)
When plotting mutliple windows, by default the x axis reflects the relative position to the center of the window. This can be modified by setting the offset argument, which tells the function what position should be used as the center anchor. The xaxis title can also be modified via the xtitle argument.
plot_tracks(resize(ctcf.peaks[1:4], width = 5000, fix = "center"), track_params, offset = 0, xtitle = "CTCF peaks")
plot_tracks(resize(ctcf.peaks[2], width = 5000, fix = "center"), track_params, annotation_position = "top")
You can group multiple signal together on one set of axes by using the groups argument.
track_params_grouped <- set_track_parameters(samp.info$fileName[1:4], annotation = TxDb.Hsapiens.UCSC.hg19.knownGene, track_names = samp.info$sampleName[1:4], share_y = TRUE, groups = c("CTCF & cohesin","P300","Suz12", "CTCF & cohesin")) plot_tracks(resize(ctcf.peaks[2], width = 5000, fix = "center"), track_params_grouped)
Note that at the moment grouping is ignored when plotting multiple loci -- all tracks for the locus will be grouped together.
When plotting multiple regions, by default the name is taken from "name" column in the ranges (if it exists). You can alter that by passing something to the locus_names argument.
plot_tracks(resize(ctcf.peaks[1:2], width = 5000, fix = "center"), track_params, locus_names = c("Peak 1","Peak 2"))
You can change the colors of the tracks by passing a colors
argument.
plot_tracks(resize(ctcf.peaks[1], width = 5000, fix = "center"), track_params, colors = c("red","blue","black"))
You can alter plotly
layout parameters by passing a list to the layout argument. Use caution with this argument, as this could disrupt the intended layout if used incorrectly. Consult the plotly.js
documentation for layout options. Here we will demonstrate adjusting two of the margins:
plot_tracks(resize(ctcf.peaks[1], width = 5000, fix = "center"), track_params, layout = list(margin = list(r = 200, t = 20)))
We can add a boxplot next to each region summarizing the gene expression levels (or any other feature of the locus) for some group of samples. We do that by first specifying the parameters to generate such a summary based on a SummarizedExperiment object. We pass along to the group
argument the name of a column in the colData of the SummarizedExperiment object that we will use to group the expression values along the x axis. The ranges argument tells what ranges correspond to what rows of the SummarizedExperiment. By default, if the SummarizedExperiment is a RangedSummarizedExperiment then the rowRanges are used.
summary_params <- set_summary_parameters(rpkm_chr21, groups = "GROUP", ranges = resize(tss, width = 5000, fix = "center"))
We pass the summary parameters to our track parameters, via the summary argument.
track_plus_summary_params <- set_track_parameters(samp.info$fileName[1:3], annotation = TxDb.Hsapiens.UCSC.hg19.knownGene, track_names = samp.info$sampleName[1:3], share_y = TRUE, summary = summary_params)
Calling plot_tracks
with our new track parameters including the summary parameters creates a plot of the ChIP-seq signal around the TSS of those genes as well as a boxplot of the gene expression for different samples for that gene on the right. We can call it using GenomicRanges, but they must match ranges we gave to set_summary_parameters
.
plot_tracks(resize(tss, width = 5000, fix = "center")[1:3], track_plus_summary_params)
We can also instead call plot_tracks
with rownames of the SummarizedExperiment we passed to set_summary_parameters
.
plot_tracks(rownames(rpkm_chr21)[1:3], track_plus_summary_params)
The properties of the summary plots can be altered by adding a named list of new parameter values to the summary_args argument. For example, we can alter the color:
plot_tracks(resize(tss, width = 5000, fix = "center")[1:3], track_plus_summary_params, summary_args = list(colors = "black"))
Sometimes there can be issues with spacing of legend and margin. These can be addressed by passing margin and legend positions to layout argument:
plot_tracks(rownames(rpkm_chr21)[1:2], track_plus_summary_params, layout = list(margin = list(r = 200), legend = list(x = 1.2)))
We can link our heatmap views of many genes/regions with the more detailed track views by making a shiny app that links the two through a click event. We will need both our heatmap object, a track parameters object, and a link function that helps link the heatmap to the track view. The link function takes as input the heatmap as well as the potential input for plot_tracks
. That input should be ordered according to how the rows were ordered in the input data for the heatmap.
hm <- iheatmap(rpkm_chr21, "rpkm", x = colData(rpkm_chr21)$STD_NAME, y = rowData(rpkm_chr21)$SYMBOL, col_annotation = colData(rpkm_chr21)[,c("TYPE","SEX")]) %>% add_coverage_heatmap(tss_mats, c("P300","Suz12")) link_fn <- heatmap_click(hm, rownames(rpkm_chr21))
Note: The following code has to be run interactively!
heatmap_to_tracks_shiny(hm, track_plus_summary_params, link_fn)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.