options(width = 400)
TADBD is a fast and sensitive tool for detecting TAD boundaries on a Hi-C contact matrix. It uses a Haar-based algorithm: in view of the geometry of a TAD, a diagonal template is chosen to extract the Haar feature of each point on the diagonal of the contact matrix, aggregating over multiple template sizes. The peaks on the average Haar feature value curve are then located, and statistical filtering is performed to retain the significant ones as the final TAD boundaries. The feature extraction procedure is accelerated with the help of a compact integrogram.
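To make the idea concrete, here is a minimal sketch in base R of a Haar-like diagonal feature computed from a summed-area table. It is not the package's internal code. The template layout (upstream block plus downstream block, minus twice the block linking them), the helper names `integral_image`, `block_sum` and `haar_feature`, and the toy matrix are all assumptions made for illustration. Statistical filtering is not shown.

```r
# Illustrative sketch only; not the TADBD package's internal code.

# Summed-area table padded with a leading zero row and column,
# so that ii[a + 1, b + 1] equals sum(m[1:a, 1:b]).
integral_image <- function(m) {
  s <- t(apply(apply(m, 2, cumsum), 1, cumsum))
  rbind(0, cbind(0, s))
}

# Sum of m[r1:r2, c1:c2] from four table lookups.
block_sum <- function(ii, r1, r2, c1, c2) {
  ii[r2 + 1, c2 + 1] - ii[r1, c2 + 1] - ii[r2 + 1, c1] + ii[r1, c1]
}

# Assumed Haar-like diagonal template of size s at bin i:
# (upstream block + downstream block) minus twice the block linking them.
haar_feature <- function(ii, i, s) {
  up    <- block_sum(ii, i - s, i - 1, i - s, i - 1)
  down  <- block_sum(ii, i, i + s - 1, i, i + s - 1)
  cross <- block_sum(ii, i - s, i - 1, i, i + s - 1)
  up + down - 2 * cross
}

# Toy matrix: two 10-bin domains with strong internal contacts.
n <- 20
m <- matrix(1, n, n)
m[1:10, 1:10] <- 10
m[11:20, 11:20] <- 10

ii <- integral_image(m)
sizes <- c(4, 5, 6)
bins <- (max(sizes) + 1):(n - max(sizes) + 1)

# Average the feature over the template sizes (multi-scale aggregation).
feature_by_size <- sapply(sizes, function(s) {
  sapply(bins, function(i) haar_feature(ii, i, s))
})
avg_feature <- rowMeans(feature_by_size)

# The highest point of the curve is the first bin of the second domain.
boundary <- bins[which.max(avg_feature)]
print(boundary)  # 11
```

Once the table is built, each block sum costs four lookups regardless of template size, which is why an integral-image approach speeds up feature extraction. A real curve has many peaks, so local-maximum detection would replace the single `which.max()` call.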
The package contains three functions, DataLoad(), TADBD() and Output(), which load data, detect TAD boundaries and output the result, respectively. The input contact matrix can be dense or sparse; that is, the result files or in-memory objects of multiple contact matrix preparation tools and most Hi-C normalization approaches are acceptable. The final detected TAD boundaries can be output as text, optionally accompanied by graphics.
# if (!requireNamespace("BiocManager", quietly=TRUE))
#     install.packages("BiocManager")
# BiocManager::install("TADBD")
#install.packages("devtools") # if you have not installed "devtools" package
#devtools::install_github("bioinfo-lab/TADBD")
#library(TADBD)
The input contact matrix can be dense or sparse.
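As a rough illustration of the difference between the two layouts, the base R sketch below expands a sparse triplet table into a dense symmetric matrix. The three-column layout (`bin_i`, `bin_j`, `count`), the 1-based bin indices and the toy values are assumptions for this example; the exact sparse format that DataLoad() expects should be checked against the package documentation.

```r
# Hypothetical sparse contacts: one row per non-zero upper-triangle entry.
sparse <- data.frame(
  bin_i = c(1, 1, 2, 3),
  bin_j = c(1, 2, 2, 3),
  count = c(5, 2, 7, 4)
)

# Allocate an all-zero dense matrix large enough for every bin.
n <- max(sparse$bin_i, sparse$bin_j)
dense <- matrix(0, n, n)

# Fill the upper triangle, then mirror it, since contact maps are symmetric.
dense[cbind(sparse$bin_i, sparse$bin_j)] <- sparse$count
dense[cbind(sparse$bin_j, sparse$bin_i)] <- sparse$count

print(dense)
```

Indexing with a two-column matrix assigns each (row, column) pair in one step, which avoids an explicit loop over the contacts.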
Below is an example of loading data in sparse format into 'hicmat':
#Load R package TADBD
#library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
#species <- "hg19"
#chr <- "chr18"
#resolution <- 50000
#Disable scientific notation
#options(scipen = 999)
#Specify Hi-C data to be loaded
#data(hicdata)
#Load a Hi-C contact matrix file in a sparse format
#hicmat <- DataLoad(hicdata, bsparse = TRUE, species, chr, resolution)
Below is an example of loading data in dense format into 'hicmat':
#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
Once the matrices output by DataLoad() are in an acceptable format, TADBD() can be run with only one parameter. Below we show how to run the algorithm; TADBD() outputs the bin numbers of the TAD boundaries on the contact matrix:
#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix using the TADBD method with the
#default parameter configuration, that is template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
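Because the boundaries are reported as bin numbers, it can help to translate them into genomic coordinates using the resolution. The sketch below assumes 1-based bins in which bin b covers the interval from (b - 1) * resolution to b * resolution. That convention and the example bin numbers are assumptions, not taken from the package, so verify them against the actual output before relying on the coordinates.

```r
# Resolution used throughout this vignette (base pairs per bin).
resolution <- 50000

# Hypothetical boundary bins; these are not real TADBD output.
boundary_bins <- c(12, 40, 97)

# Assumed convention: 1-based bins, bin b spans [(b - 1) * res, b * res).
coords <- data.frame(
  bin   = boundary_bins,
  start = (boundary_bins - 1) * resolution,
  end   = boundary_bins * resolution
)

# Print without scientific notation so large coordinates stay readable.
print(format(coords, scientific = FALSE))
```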
Our method is specifically designed to detect TAD boundaries. The function Output() takes the bin numbers of the detected TAD boundaries as input and writes them in one of two forms: two text files, one for the detected TAD boundaries and one for the intermediate peaks, or the same two text files plus a graphical heatmap. Below are examples of running Output(), first with text output only and then with a heatmap.
#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix using the TADBD method with the
#default parameter configuration, that is template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
#Output two text files, one for the detected TAD boundaries, the other for intermediate peaks
Output(df_result, species, chr, resolution, outxtfile="./result")
#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix using the TADBD method with the
#default parameter configuration, that is template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
#Output two text files and a heatmap with TAD boundary tracks; the heatmap parameters
#include the starting and ending coordinates, as well as the color and width of the tracks
Output(df_result, species, chr, resolution, outxtfile="./result",
       bheatmap = TRUE, heatmapfile="./heatmap", hicmat,
       map_start=0, map_end=10000000, l_color="blue", l_width=2.5)
sessionInfo()