View source: R/LSD_functions.R
Brick_local_score_differentiator | R Documentation |
Local_score_differentiator
calls topologically associated domains (TADs) on Hi-C
matrices. At its most fundamental level, the local score differentiator is a
change point detector: it detects change points in the directionality
index using thresholds defined on a local directionality index
distribution.
The directionality index (DI) is calculated as defined by Dixon et al., 2012,
Nature. Next, the difference in DI between neighbouring bins is calculated to
obtain the change in DI at each bin. When the DI goes from a
highly negative value to a highly positive one, as is expected at domain
boundaries, the resulting DI difference distribution is largely flat,
punctuated by very large peaks marking the regions where such
a change takes place. We use two difference vectors: one is the difference
between a bin and its adjacent downstream bin, and the other is the
difference between a bin and its adjacent upstream bin. Using these vectors
and the original directionality index, we define domain borders as outliers.
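The DI and the two difference vectors described above can be sketched in base R. This is a minimal illustration, not the package's internal code: `compute_di`, the toy matrix, and the sign convention of the two difference vectors are assumptions made for the example. The DI formula itself follows Dixon et al., 2012, where A is the upstream contact sum, B the downstream contact sum, and E their mean.

```r
# Toy symmetric contact matrix with two domains (bins 1-5 and bins 6-10).
toy <- matrix(1, nrow = 10, ncol = 10)
toy[1:5, 1:5] <- 10
toy[6:10, 6:10] <- 10

# Hypothetical helper: directionality index per bin for a given window.
compute_di <- function(mat, window) {
  n <- nrow(mat)
  vapply(seq_len(n), function(i) {
    up_idx <- if (i > 1) max(1, i - window):(i - 1) else integer(0)
    down_idx <- if (i < n) (i + 1):min(n, i + window) else integer(0)
    A <- sum(mat[i, up_idx])    # contacts with upstream bins
    B <- sum(mat[i, down_idx])  # contacts with downstream bins
    E <- (A + B) / 2            # expected contacts under no bias
    if (A == B) return(0)       # no directional bias (also covers A = B = 0)
    sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E)
  }, numeric(1))
}

di <- compute_di(toy, window = 3)
n <- length(di)

# Assumed convention: each vector is the DI of a bin minus the DI of its
# neighbour; bins without the relevant neighbour get NA.
diff_upstream <- c(NA, di[-1] - di[-n])    # DI[i] - DI[i - 1]
diff_downstream <- c(di[-n] - di[-1], NA)  # DI[i] - DI[i + 1]

which.max(diff_upstream)    # bin where DI jumps from negative to positive
which.min(diff_downstream)  # bin just before that jump
```

On this toy matrix the largest upstream difference falls on the first bin of the second domain and the smallest downstream difference on the last bin of the first domain, which is the flat-with-peaks pattern the text describes.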
Brick_local_score_differentiator(
  Brick,
  chrs = NULL,
  resolution = NA,
  all_resolutions = FALSE,
  min_sum = -1,
  di_window = 200L,
  lookup_window = 200L,
  tukeys_constant = 1.5,
  strict = TRUE,
  fill_gaps = TRUE,
  ignore_sparse = TRUE,
  sparsity_threshold = 0.8,
  remove_empty = NULL,
  chunk_size = 500,
  force_retrieve = TRUE
)
Brick |
Required. A string specifying the path to the Brick store created with Create_many_Bricks. |
chrs |
Optional. Default NULL. If present, TADs will be called only for the chromosomes listed in chrs. |
resolution |
Optional. Default NA. When an object of class BrickContainer is provided, resolution defines the resolution on which the function is executed. |
all_resolutions |
Optional. Default FALSE. If resolution is not defined and all_resolutions is TRUE, the resolution parameter will be ignored and the function will be executed on all files listed in the Brick container. |
min_sum |
Optional. Default -1. Only bins in the matrix with row sums greater than min_sum are processed. |
di_window |
Optional. Default 200. The window size used to compute the directionality index. |
lookup_window |
Optional. Default 200. The size of the local window used to call borders. At smaller di_window values we recommend setting this to 2*di_window. |
tukeys_constant |
Optional. Default 1.5. tukeys_constant*IQR (inter-quartile range) defines the lower and upper fence values. |
strict |
Optional. Default TRUE. If TRUE, strict adds a further filter on the directionality index, requiring it to be greater than 0 on the right tail and less than 0 on the left tail. |
fill_gaps |
Optional. Default TRUE. If TRUE, this affects the TAD stitching process. Every border start is stitched to the next downstream border end, so some border ends can remain unassociated with a border start. When fill_gaps is TRUE, each of these border ends is stitched to the bin immediately downstream of its nearest upstream border end. TADs inferred in this way are annotated with two metadata columns in the GRanges object: gap.fill holds a value of 1 and level holds a value of 1. TADs which were not filled in hold a gap.fill value of 0 and a level value of 2. |
ignore_sparse |
Optional. Default TRUE. If TRUE, a matrix which was defined as sparse during the matrix loading process will be treated as a dense matrix, and the sparsity_threshold filter will not be applied. Note that if a matrix is defined as sparse and fill_gaps is TRUE, fill_gaps will be turned off. |
sparsity_threshold |
Optional. Default 0.8. The sparsity threshold relates to the sparsity index, which is computed as the number of non-zero bins at a certain distance from the diagonal. If a matrix is sparse and ignore_sparse is FALSE, bins with a sparsity index value below this threshold will be discarded from the DI computation. |
remove_empty |
Not implemented. After implementation, this will ensure that the presence of centromeric regions is accounted for. |
chunk_size |
Optional. Default 500. The size of the matrix chunk to process. This value should be larger than 2x di_window. |
force_retrieve |
Optional. Default TRUE. If TRUE, this forces the retrieval of a matrix chunk even when the retrieval includes interaction points which were not loaded into a Brick store (larger chunks). Note that this does not mean that DI can be computed at distances larger than the max distance; it is only meant to speed up computation. |
To define an outlier, fences are first defined using tukeys_constant x the inter-quartile range (IQR) of the directionality index. The upper fence, used for detecting domain starts, is the 75th percentile + (IQR x tukeys_constant), while the lower fence, used for detecting domain ends, is the 25th percentile - (IQR x tukeys_constant). For domain starts, the DI difference must be greater than or equal to the upper fence, it must be greater than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be greater than 0. Similarly, for domain ends, the DI difference must be lower than or equal to the lower fence, it must be lower than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be lower than 0.
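The outlier rule can be illustrated with a short base R sketch. It is a simplification under stated assumptions: `call_border_candidates` is a hypothetical helper, the fences are computed once over the whole DI vector rather than within a local lookup_window, and the two difference vectors use an assumed sign convention (DI of a bin minus the DI of its neighbour).

```r
# Hypothetical helper: flag candidate domain starts and ends in a DI vector.
call_border_candidates <- function(di, tukeys_constant = 1.5, strict = TRUE) {
  n <- length(di)
  diff_upstream <- c(NA, di[-1] - di[-n])    # DI[i] - DI[i - 1]
  diff_downstream <- c(di[-n] - di[-1], NA)  # DI[i] - DI[i + 1]

  # Tukey fences from the quartiles of the DI (global here; an assumption).
  q <- quantile(di, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  upper_fence <- q[2] + tukeys_constant * iqr
  lower_fence <- q[1] - tukeys_constant * iqr

  # Domain starts: difference at or above the upper fence and above the DI.
  is_start <- !is.na(diff_upstream) & is.finite(di) &
    diff_upstream >= upper_fence & diff_upstream > di
  # Domain ends: difference at or below the lower fence and below the DI.
  is_end <- !is.na(diff_downstream) & is.finite(di) &
    diff_downstream <= lower_fence & diff_downstream < di

  if (strict) {
    is_start <- is_start & di > 0
    is_end <- is_end & di < 0
  }
  list(starts = which(is_start), ends = which(is_end),
       upper_fence = upper_fence, lower_fence = lower_fence)
}

# Toy DI vector: mostly flat, with one negative-to-positive jump at bins 5-6.
toy_di <- c(0.5, -0.3, 0.2, -0.4, -20, 20, 0.3, -0.2, 0.4, -0.1, 0.1, -0.5)
candidates <- call_border_candidates(toy_di)
candidates$starts  # bin 6
candidates$ends    # bin 5
```

A larger tukeys_constant widens both fences, so fewer bins qualify as borders; setting strict = FALSE drops the sign requirement on the DI itself.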
After defining outliers, each domain start is associated with its nearest downstream domain end. If fill_gaps is TRUE and there are domain ends which remain unassociated with a domain start, these domain ends are associated with the bin adjacent to their nearest upstream domain end. These associations are marked by the metadata columns gap.fill = 1 and level = 1.
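The association step can be sketched as follows. This is an illustrative base R version, not the package's implementation: `stitch_domains` is a hypothetical helper that works on bin indices and returns a data.frame rather than a ranges object, and it assumes that an unassociated end with no upstream end is simply dropped.

```r
# Hypothetical helper: pair domain starts with ends and optionally fill gaps.
stitch_domains <- function(starts, ends, fill_gaps = TRUE) {
  starts <- sort(as.integer(starts))
  ends <- sort(as.integer(ends))

  # Each start is paired with its nearest downstream end, if one exists.
  paired_end <- vapply(starts, function(s) {
    downstream <- ends[ends > s]
    if (length(downstream) > 0) downstream[1] else NA_integer_
  }, integer(1))
  keep <- !is.na(paired_end)
  tads <- data.frame(start = starts[keep], end = paired_end[keep],
                     gap.fill = rep(0, sum(keep)), level = rep(2, sum(keep)))

  if (fill_gaps) {
    # Ends never used above start at the bin after their nearest upstream end.
    orphan <- setdiff(ends, paired_end)
    prev_end <- vapply(orphan, function(e) {
      upstream <- ends[ends < e]
      if (length(upstream) > 0) max(upstream) else NA_integer_
    }, integer(1))
    ok <- !is.na(prev_end)  # assumption: orphans without an upstream end are skipped
    filled <- data.frame(start = prev_end[ok] + 1L, end = orphan[ok],
                         gap.fill = rep(1, sum(ok)), level = rep(1, sum(ok)))
    tads <- rbind(tads, filled)
  }

  tads <- tads[order(tads$start), ]
  rownames(tads) <- NULL
  tads
}

# Starts at bins 2 and 20; ends at bins 8, 15 and 30. The end at bin 15 has
# no start of its own, so it is filled from bin 9 (the bin after end 8).
toy_tads <- stitch_domains(starts = c(2, 20), ends = c(8, 15, 30))
toy_tads
```

With fill_gaps = FALSE the same input yields only the two directly paired domains, both with gap.fill = 0 and level = 2.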
This function makes it possible to call accurate TAD definitions quickly.
A ranges object containing domain definitions. The starts and ends of the ranges coincide with the starts and ends of their contained bins from the bintable.
# Locate the example bin table shipped with HiCBricks
Bintable.path <- system.file(file.path("extdata", "Bintable_100kb.bins"),
    package = "HiCBricks")

out_dir <- file.path(tempdir(), "lsd_test")
dir.create(out_dir)

# Create the Brick store
My_BrickContainer <- Create_many_Bricks(BinTable = Bintable.path,
    bin_delim = " ", output_directory = out_dir, file_prefix = "Test",
    experiment_name = "Vignette Test", resolution = 100000,
    remove_existing = TRUE)

# Load the example chr3R matrix into the Brick store
Matrix_file <- system.file(file.path("extdata",
    "Sexton2012_yaffetanay_CisTrans_100000_corrected_chr3R.txt.gz"),
    package = "HiCBricks")

Brick_load_matrix(Brick = My_BrickContainer, chr1 = "chr3R",
    chr2 = "chr3R", matrix_file = Matrix_file, delim = " ",
    remove_prior = TRUE, resolution = 100000)

# Call TADs on chr3R
TAD_ranges <- Brick_local_score_differentiator(Brick = My_BrickContainer,
    chrs = "chr3R", resolution = 100000, di_window = 10,
    lookup_window = 30, strict = TRUE, fill_gaps = TRUE, chunk_size = 500)