generate_cicero_models | R Documentation |
Function to generate graphical lasso models on all sites in a CDS object within overlapping genomic windows.
generate_cicero_models(
  cds,
  distance_parameter,
  s = 0.75,
  window = 5e+05,
  max_elements = 200,
  genomic_coords = cicero::human.hg19.genome
)
cds |
A cicero CDS object generated using |
distance_parameter |
Distance based penalty parameter value. Generally,
the mean of the calculated |
s |
Power law value. See details. |
window |
Size of the genomic window to query, in base pairs. |
max_elements |
Maximum number of elements per window allowed. Prevents very large models from slowing performance. |
genomic_coords |
Either a data frame or a path (character) to a file
with chromosome lengths. The file should have two columns, the first is
the chromosome name (ex. "chr1") and the second is the chromosome length
in base pairs. See |
This function computes the raw covariances between each pair of sites
within overlapping windows of the genome. Within each window, it then
estimates a regularized correlation matrix using the graphical LASSO
(Friedman et al., 2008), penalizing pairs of distant sites more than
proximal sites. The scaling parameter, distance_parameter, in combination
with the power law value s, determines the distance-based penalty.
The parameter s is a constant that captures the power-law distribution of
contact frequencies between different locations in the genome as a
function of their linear distance. For a complete discussion of the
various polymer models of DNA packed into the nucleus and of justifiable
values for s, we refer readers to Dekker et al. (2013). We use a value of
0.75 by default in Cicero, which corresponds to the “tension globule”
polymer model of DNA (Sanborn et al., 2015). This parameter must be the
same as the s parameter for estimate_distance_parameter.
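To make the roles of distance_parameter and s concrete, the following base R sketch computes a distance-scaled penalty for a few site pairs. The helper distance_penalty is hypothetical and not part of the cicero API. The functional form (1 - (min_dist / d)^s) * distance_parameter and the 1,000 bp floor are assumptions made for illustration; consult the accompanying publication for the exact penalty used by the package.

```r
# Hypothetical helper, not part of the cicero API.
# Assumed penalty form: (1 - (min_dist / d)^s) * distance_parameter,
# with negative or non-finite values set to zero.
distance_penalty <- function(dist_bp, distance_parameter, s = 0.75,
                             min_dist = 1000) {
  rho <- (1 - (min_dist / dist_bp)^s) * distance_parameter
  rho[!is.finite(rho) | rho < 0] <- 0
  rho
}

# Distances in base pairs between pairs of sites (illustrative values).
dists <- c(500, 1000, 10000, 250000)
penalties <- distance_penalty(dists, distance_parameter = 0.3)
print(round(penalties, 3))
```

Under this assumed form, the penalty is zero at or below the floor distance, grows with distance, and never exceeds distance_parameter, so distant pairs are regularized more strongly than proximal ones.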
Further details are available in the publication that accompanies this
package. Run citation("cicero") for publication details.
A list with one result per window: either a glasso object or a character
description of why the window was skipped. This list can be passed
directly to assemble_connections to create a reconciled list of cicero
co-accessibility scores.
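Because the returned list mixes fitted models with character descriptions, it can help to count how many windows were skipped before assembling connections. The base R sketch below uses a mock list in place of real output; the element names, the skip message, and the contents of the fitted entries are placeholders, not values produced by the package.

```r
# Mock stand-in for the output of generate_cicero_models().
# All names and values here are placeholders for illustration only.
mock_output <- list(
  window_a = list(w = diag(2), wi = diag(2)),  # stands in for a fitted model
  window_b = "window skipped (placeholder message)",
  window_c = list(w = diag(3), wi = diag(3))   # stands in for a fitted model
)

# Skipped windows are the entries stored as character descriptions.
skipped <- vapply(mock_output, is.character, logical(1))
n_fitted <- sum(!skipped)

# Tabulate the reasons given for skipped windows.
skip_reasons <- table(unlist(mock_output[skipped]))
print(n_fitted)
print(skip_reasons)
```

With real output, the full list, including the skipped entries, is what gets passed to assemble_connections.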
Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
Sanborn, A.L., Rao, S.S.P., Huang, S.-C., Durand, N.C., Huntley, M.H., Jewett, A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. U. S. A. 112, E6456–E6465.
estimate_distance_parameter
data("cicero_data")
data("human.hg19.genome")
sample_genome <- subset(human.hg19.genome, V1 == "chr18")
sample_genome$V2[1] <- 100000
input_cds <- make_atac_cds(cicero_data, binarize = TRUE)
input_cds <- reduceDimension(input_cds, max_components = 2, num_dim = 6,
                             reduction_method = 'tSNE',
                             norm_method = "none")
tsne_coords <- t(reducedDimA(input_cds))
row.names(tsne_coords) <- row.names(pData(input_cds))
cicero_cds <- make_cicero_cds(input_cds, reduced_coordinates = tsne_coords)
model_output <- generate_cicero_models(cicero_cds,
                                       distance_parameter = 0.3,
                                       genomic_coords = sample_genome)