compute_corr: Compute correlation matrix
In neurogenomics/EpiCompare: Comparison, Benchmarking & QC of Epigenomic Datasets

compute_corr

R Documentation

Compute correlation matrix

Description

Compute correlation matrix on all peak files.

Usage

compute_corr(
  peakfiles,
  reference = NULL,
  genome_build,
  keep_chr = NULL,
  drop_empty_chr = FALSE,
  bin_size = 5000,
  method = "spearman",
  intensity_cols = c("total_signal", "qValue", "Peak Score", "score"),
  return_bins = FALSE,
  fill_diag = NA,
  workers = check_workers(),
  save_path = tempfile(fileext = ".corr.csv.gz")
)

Arguments

`peakfiles`	A list of peak files as GRanges objects and/or as paths to BED files. If paths are provided, EpiCompare imports each file as a GRanges object. The list may mix GRanges objects and paths. Files must be listed and named using `list()`, e.g. `list("name1"=file1, "name2"=file2)`. If no names are specified, default file names will be assigned.
`reference`	A named list containing reference peak file(s) as GRanges objects. Please ensure that the reference file is listed and named, i.e. `list("reference_name" = reference_peak)`. If more than one reference is specified, individual reports for each reference will be generated. However, please note that specifying more than one reference can take a while. If a reference is specified, it enables two analyses: (1) a plot showing statistical significance of overlapping/non-overlapping peaks; and (2) ChromHMM of overlapping/non-overlapping peaks.
`genome_build`	The build of all peak and reference files to calculate the correlation matrix on. If the peak and reference files are not all of the same build, use `liftover_grlist` to convert them before running. Genome build should be one of "hg19", "hg38", "mm9", "mm10".
`keep_chr`	Which chromosomes to keep.
`drop_empty_chr`	Drop chromosomes that are not present in any of the `peakfiles` (default: `FALSE`).
`bin_size`	Default of 5000. Base-pair size of the bins created to measure correlation. Use a smaller value for higher resolution, at the cost of longer run time and larger memory usage.
`method`	Default spearman (i.e. non-parametric). A character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson", "kendall", or "spearman": can be abbreviated.
`intensity_cols`	Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations: "total_signal" : Used by the peak calling software SEACR. NOTE: Another SEACR column (e.g. "max_signal") can be used together with or instead of "total_signal". "qValue" : Used by the peak calling software MACS2/3. Should contain the negative log of the p-values after multiple testing correction. "Peak Score" : Used by the peak calling software HOMER.
`return_bins`	If `TRUE`, returns a named list with both the rebinned (standardised) peaks ("bin") and the correlation matrix ("cor"). If `FALSE` (default), returns only the correlation matrix (unlisted).
`fill_diag`	Value used to fill the diagonal of the correlation matrix (default: `NA`).
`workers`	Number of threads to parallelize across.
`save_path`	Path to save a table of correlation results to.
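
To illustrate how `bin_size`, `method` and `fill_diag` relate to one another, here is a minimal base R sketch on toy data. It is not EpiCompare's implementation: the helper `bin_scores()` and the toy peak tables are hypothetical, and keeping the maximum score per bin is an assumption made purely for illustration.

```r
# Toy peaks on a single chromosome: start position and an intensity score.
# All values are invented for illustration.
peaks_a <- data.frame(start = c(100, 5200, 12000, 18000),
                      score = c(10, 40, 25, 5))
peaks_b <- data.frame(start = c(150, 5100, 12500, 19000),
                      score = c(12, 30, 35, 2))

# Hypothetical helper: assign each peak to a fixed-width bin and keep the
# highest score per bin (bins without peaks stay at 0).
bin_scores <- function(peaks, bin_size, chr_length) {
  n_bins <- ceiling(chr_length / bin_size)
  out <- numeric(n_bins)
  idx <- floor(peaks$start / bin_size) + 1
  for (i in seq_along(idx)) {
    out[idx[i]] <- max(out[idx[i]], peaks$score[i])
  }
  out
}

bin_size <- 5000
binned <- cbind(
  A = bin_scores(peaks_a, bin_size, chr_length = 20000),
  B = bin_scores(peaks_b, bin_size, chr_length = 20000)
)

# Correlate the binned signal between samples, then blank the diagonal,
# mirroring the roles of `method` and `fill_diag`.
corr <- stats::cor(binned, method = "spearman")
diag(corr) <- NA
corr
```

With a smaller `bin_size`, the same peaks spread over more bins, which gives a finer-grained comparison but a longer vector to correlate for every sample.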

Value

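A correlation matrix. If `return_bins = TRUE`, a named list containing the rebinned peaks ("bin") and the correlation matrix ("cor") is returned instead.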
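
As a rough sketch of how a returned correlation matrix might be inspected, the base R snippet below ranks sample pairs by correlation. The matrix here is a stand-in with invented values; that the real output is symmetric and carries the sample names as row and column names is an assumption, so check the actual object before relying on this.

```r
# Stand-in for a returned correlation matrix (values are invented).
samples <- c("CnR_H3K27ac", "CnT_H3K27ac", "encode_H3K27ac")
corr_mat <- matrix(
  c(NA,   0.62, 0.48,
    0.62, NA,   0.55,
    0.48, 0.55, NA),
  nrow = 3, byrow = TRUE, dimnames = list(samples, samples)
)

# Keep each pair once by taking the upper triangle, which also skips the
# NA diagonal, then sort pairs from most to least correlated.
idx <- which(upper.tri(corr_mat), arr.ind = TRUE)
pairs <- data.frame(
  sample_1 = rownames(corr_mat)[idx[, "row"]],
  sample_2 = colnames(corr_mat)[idx[, "col"]],
  corr = corr_mat[idx]
)
pairs <- pairs[order(pairs$corr, decreasing = TRUE), ]
pairs
```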

Examples

data("CnR_H3K27ac")
data("CnT_H3K27ac")
data("encode_H3K27ac")
peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac)
reference <- list("encode_H3K27ac"=encode_H3K27ac)

# A large bin_size is used here for speed; lower values give a more granular correlation.
corr_mat <- compute_corr(peakfiles = peakfiles,
                         reference = reference,
                         genome_build = "hg19",
                         bin_size = 200000, 
                         workers = 1)

neurogenomics/EpiCompare documentation built on Feb. 17, 2025, 6:31 a.m.