# IMPORTANT: this vignette cannot be created if HiTC is not installed
if (!require("HiTC", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
Hi-C is a sequencing-based molecular assay designed to measure intra- and inter-chromosomal interactions between regions of the DNA molecule. In particular, the identification of Topologically Associating Domains (TADs), that is, regions of the genome within which physical interactions are frequent, provides insight into the three-dimensional organization of a genome [2].
Hi-C data are in the form of two-dimensional contact maps, i.e., matrices
whose $i,j$ entry quantifies the intensity of the physical interaction between
two genome regions $i$ and $j$ at the DNA level. In this vignette, we
demonstrate the use of adjclust::hicClust to perform adjacency-constrained
hierarchical agglomerative clustering (HAC) of Hi-C contact maps. The output of
this function is a dendrogram, which can be cut to identify TADs. The algorithm
used for adjacency-constrained HAC is described in [1,3].
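To make the notion of a contact map concrete, here is a minimal base R sketch. The matrix `toy_map`, its two domains, and the contact values 10 and 1 are made up for illustration and are not taken from the Hi-C data used in this vignette.

```r
# Toy contact map: 6 bins forming two hypothetical domains (bins 1-3 and 4-6).
n <- 6
domain <- rep(c(1, 2), each = 3)

# High contact intensity within a domain (10), low intensity between domains (1).
toy_map <- ifelse(outer(domain, domain, "=="), 10, 1)
dimnames(toy_map) <- list(paste0("bin", 1:n), paste0("bin", 1:n))

isSymmetric(toy_map)      # a contact map is symmetric: entry (i, j) equals entry (j, i)
toy_map["bin1", "bin2"]   # two bins in the same domain
toy_map["bin1", "bin5"]   # two bins in different domains
```

In such a map, TAD-like domains appear as square blocks of high values along the diagonal, which is the structure that adjacency-constrained clustering aims to recover.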
library("adjclust")
The data set hic_imr90_40_XX is an object of class HTCexp, obtained from the
HiTC package [4]. It is a contact map corresponding to the
first 500 x 500 bins on chromosome X vs chromosome X.
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
The script used to create this map can be found by executing the following command:
system.file("system/create_hic_chrXchrX.R", package="adjclust")
Now we have a look at the data.
HiTC::mapC(hic_imr90_40_XX)
hicClust operates directly on objects of class HTCexp:
fit <- hicClust(hic_imr90_40_XX)
It is also possible to work on binned data. Below we choose a bin size of $5 \times 10^5$:
binned <- HiTC::binningC(hic_imr90_40_XX, binsize = 5e5)
fitB <- hicClust(binned)
fitB
HiTC::mapC(binned)
The output is of class chac. In particular, it can be plotted as a dendrogram,
silently relying on the function plot.dendrogram:
plot(fitB, mode = "corrected")
Moreover, the output contains an element named merge, which describes the
successive merges of the clustering, and an element named gains, which gives the
improvement in the criterion optimized by the clustering at each successive
merge.
head(cbind(fitB$merge, fitB$gains))
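Since the dendrogram can be cut to identify TADs, the sketch below shows one possible way to obtain a partition of the bins. It assumes the cutree_chac function exported by adjclust and requires HiTC to be installed; the number of clusters k = 10 is an arbitrary value chosen for illustration, not a recommended setting.

```r
library("adjclust")

# Reload the example contact map and cluster it, as done earlier in this vignette.
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
fit <- hicClust(hic_imr90_40_XX)

# Cut the dendrogram into k groups of bins (k is a hypothetical choice).
k <- 10
clusters <- cutree_chac(fit, k = k)

# Number of bins assigned to each candidate domain.
table(clusters)
```

Each resulting group of bins is a candidate TAD; in practice the value of k would need to be chosen with care, for instance with a model selection procedure.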
Contact maps can also be stored as objects of class Matrix::dsCMatrix, or as
plain text files. These types of input are also accepted as the first argument to
hicClust.
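As a rough sketch of the sparse matrix input, the example below builds a small symmetric matrix and passes it to hicClust. The matrix `toy` and its values are made up for illustration; the example assumes that Matrix::Matrix with sparse = TRUE returns a dsCMatrix for symmetric input, which is checked by printing the class.

```r
library("adjclust")
library("Matrix")

# Hypothetical 6 x 6 contact map with two domains (bins 1-3 and 4-6).
n <- 6
domain <- rep(c(1, 2), each = 3)
toy <- ifelse(outer(domain, domain, "=="), 10, 1) + diag(2, n)

# Convert to a sparse symmetric matrix; expected class is "dsCMatrix".
toy_sparse <- Matrix(toy, sparse = TRUE)
class(toy_sparse)

# Cluster the toy map directly from the sparse matrix.
fit_toy <- hicClust(toy_sparse)
fit_toy$merge
```

The merge element should show bins of the same domain being merged before the two domains are joined.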
[1] Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14, 22.
[2] Dixon J.R., et al. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485(7398), 376.
[3] Randriamihamison N., Vialaneix N., and Neuvial P. (2021). Applicability and interpretability of Ward's hierarchical agglomerative clustering with or without contiguity constraints. Journal of Classification, 38, 363–389.
[4] Servant N., et al. (2012). HiTC: Exploration of High-Throughput 'C' experiments. Bioinformatics, 28(21), 2843–2844.
sessionInfo()