precision_recall: Compute precision-recall


precision_recall R Documentation

Compute precision-recall

Description

Compute precision and recall using each GRanges object in peakfiles as the "query" against each GRanges object in reference as the "subject".
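The query/subject framing can be illustrated with a minimal base-R sketch. It assumes that precision is the share of query peaks overlapping at least one reference peak and that recall is the share of reference peaks overlapping at least one query peak; the source does not spell out these definitions. The toy coordinates and the helper overlaps_any() are hypothetical and are not part of EpiCompare.

```r
# Toy peaks on a single chromosome (hypothetical coordinates).
query   <- data.frame(start = c(100, 500, 900), end = c(200, 600, 950))
subject <- data.frame(start = c(150, 1000),     end = c(250, 1100))

# Hypothetical helper: TRUE for each closed interval [start, end] in `a`
# that overlaps at least one interval in `b`.
overlaps_any <- function(a, b) {
  hits <- outer(a$start, b$end, "<=") & outer(a$end, b$start, ">=")
  rowSums(hits) > 0
}

# Assumed definitions (not confirmed by the source):
# precision = fraction of query peaks that hit the reference,
# recall    = fraction of reference peaks recovered by the query.
precision <- mean(overlaps_any(query, subject))
recall    <- mean(overlaps_any(subject, query))

print(c(precision = precision, recall = recall))
```

With real data the overlap test would typically be done on GRanges objects rather than data frames; the data-frame version is used here only to keep the sketch free of package dependencies.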

Usage

precision_recall(
  peakfiles,
  reference,
  thresholding_cols = c("total_signal", "qValue", "Peak Score"),
  initial_threshold = 0,
  n_threshold = 20,
  max_threshold = 1,
  cast = TRUE,
  workers = 1,
  verbose = TRUE,
  save_path = tempfile(fileext = "precision_recall.csv"),
  ...
)

Arguments

peakfiles

A list of peak files as GRanges objects and/or as paths to BED files. If paths are provided, EpiCompare imports each file as a GRanges object. EpiCompare also accepts a list containing a mix of GRanges objects and paths. Files must be listed and named using list(), e.g. list("name1"=file1, "name2"=file2). If no names are specified, default file names will be assigned.

reference

A named list containing reference peak file(s) as GRanges objects. Please ensure that the reference file is listed and named, i.e. list("reference_name" = reference_peak). If more than one reference is specified, individual reports for each reference will be generated. However, please note that specifying more than one reference can take a while. If a reference is specified, it enables two analyses: (1) a plot showing statistical significance of overlapping/non-overlapping peaks; and (2) ChromHMM of overlapping/non-overlapping peaks.

thresholding_cols

Depending on which columns are present, GRanges will be filtered at each threshold according to one or more of the following:

  • "total_signal" : Used by the peak calling software SEACR. NOTE: Another SEACR column (e.g. "max_signal") can be used together with or instead of "total_signal".

  • "qValue" : Used by the peak calling software MACS2/3. Should contain the negative log of the p-values after multiple testing correction.

  • "Peak Score" : Used by the peak calling software HOMER.

initial_threshold

Numeric threshold that was provided to SEACR (via the parameter --ctrl) when calling peaks without an IgG control.

n_threshold

Number of thresholds to test.

max_threshold

Maximum threshold to test.

cast

Cast the data into a format that's more compatible with ggplot2.

workers

Number of threads to parallelize across.

verbose

Print messages.

save_path

File path to save precision-recall results to.

...

Arguments passed on to bpplapply

apply_fun

Iterator function to use.

register_now

Register the cores now with register (TRUE), or simply return the BPPARAM object (default: FALSE).

use_snowparam

Whether to use SnowParam (default: TRUE) or MulticoreParam (FALSE) when parallelising across multiple workers.

progressbar

logical(1) Enable progress bar (based on plyr:::progress_text).

X

Any object for which methods length, [, and [[ are implemented.

FUN

The function to be applied to each element of X.
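As a rough illustration of how initial_threshold, n_threshold, and max_threshold might combine with a thresholding column, the base-R sketch below builds an evenly spaced grid and filters a toy score vector at each step. Both the even spacing and the reading of each threshold as a quantile of the score column are assumptions; the source does not describe the internal procedure. The qValue values are made up for illustration.

```r
initial_threshold <- 0
max_threshold     <- 1
n_threshold       <- 5   # smaller than the default of 20 to keep output short

# Assumption: thresholds are evenly spaced between the two bounds.
thresholds <- seq(initial_threshold, max_threshold, length.out = n_threshold)

# Hypothetical "qValue" column for six peaks.
qValue <- c(1.2, 3.5, 0.4, 8.1, 2.2, 5.0)

# Assumption: each threshold is read as a quantile of the score column,
# and peaks scoring at or above that quantile are kept.
n_kept <- vapply(thresholds, function(th) {
  sum(qValue >= quantile(qValue, probs = th, names = FALSE))
}, integer(1))

print(data.frame(threshold = thresholds, peaks_kept = n_kept))
```

Under these assumptions, stricter thresholds keep fewer peaks, which is what traces out a precision-recall curve once precision and recall are recomputed at each step.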

Value

Overlap

Examples

data("CnR_H3K27ac")
data("CnT_H3K27ac")
data("encode_H3K27ac")
peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac)
reference <- list("encode_H3K27ac" = encode_H3K27ac)

pr_df <- precision_recall(peakfiles = peakfiles,
                          reference = reference,
                          workers = 1)

neurogenomics/EpiCompare documentation built on Oct. 18, 2024, 11:04 p.m.