EpiCompare: Compare epigenomic datasets

EpiCompare

R Documentation

Compare epigenomic datasets

Description

This function compares and analyses multiple epigenomic datasets and outputs an HTML report containing all results of the analysis. The report is mainly divided into three sections: (1) General Metrics on Peakfiles, (2) Peak Overlaps and (3) Functional Annotation of Peaks.

Usage

EpiCompare(
  peakfiles,
  genome_build,
  genome_build_output = "hg19",
  blacklist = NULL,
  picard_files = NULL,
  reference = NULL,
  upset_plot = FALSE,
  stat_plot = FALSE,
  chromHMM_plot = FALSE,
  chromHMM_annotation = "K562",
  chipseeker_plot = FALSE,
  enrichment_plot = FALSE,
  tss_plot = FALSE,
  tss_distance = c(-3000, 3000),
  precision_recall_plot = FALSE,
  n_threshold = 20,
  corr_plot = FALSE,
  bin_size = 5000,
  interact = TRUE,
  add_download_button = FALSE,
  save_output = FALSE,
  output_filename = "EpiCompare",
  output_timestamp = FALSE,
  output_dir,
  display = NULL,
  run_all = FALSE,
  workers = 1,
  quiet = FALSE,
  error = FALSE,
  debug = FALSE
)

Arguments

`peakfiles`	A list of peak files as GRanges objects and/or paths to BED files. If paths are provided, EpiCompare imports each file as a GRanges object. A list containing a mix of GRanges objects and paths is also accepted. Files must be listed and named using `list()`, e.g. `list("name1"=file1, "name2"=file2)`. If no names are specified, default file names will be assigned.
`genome_build`	A named list indicating the genome build used to generate each of the following inputs:
  "peakfiles" : Genome build for the `peakfiles` input. Assumes the genome build is the same for each element in the `peakfiles` list.
  "reference" : Genome build for the `reference` input.
  "blacklist" : Genome build for the `blacklist` input.
  Example input list: `genome_build = list(peakfiles="hg38", reference="hg19", blacklist="hg19")`
  Alternatively, you can supply a single character string instead of a list. This should only be done when all three inputs (`peakfiles`, `reference`, `blacklist`) are of the same genome build. For example: `genome_build = "hg19"`
  Supported genome builds are: "hg19", "hg38", "mm9" and "mm10".
`genome_build_output`	Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default: "hg19". Note: Cross-species liftovers are supported.
`blacklist`	A GRanges object containing blacklisted genomic regions. Blacklists included in EpiCompare are:
  `NULL` (default): Automatically selects the appropriate blacklist based on the `genome_build_output` argument.
  "hg19_blacklist": Regions of the hg19 genome that have anomalous and/or unstructured signals.
  "hg38_blacklist": Regions of the hg38 genome that have anomalous and/or unstructured signals.
  "mm10_blacklist": Regions of the mm10 genome that have anomalous and/or unstructured signals.
  "mm9_blacklist": Blacklisted regions of the mm9 genome, lifted over from mm10_blacklist.
  `<user_input>`: A custom user-provided blacklist in GRanges format.
`picard_files`	A list of summary metrics output from Picard. Files must be in data.frame format, listed using `list()` and named using `names()`. To import Picard duplication metrics (.txt file) into R as a data frame, use: `picard <- read.table("/path/to/picard/output", header = TRUE, fill = TRUE)`.
`reference`	A named list containing reference peak file(s) as GRanges objects. Please ensure that the reference file is listed and named, i.e. `list("reference_name" = reference_peak)`. If more than one reference is specified, an individual report is generated for each reference. Note that specifying more than one reference can take a while. Specifying a reference enables two analyses: (1) a plot showing the statistical significance of overlapping/non-overlapping peaks; and (2) ChromHMM of overlapping/non-overlapping peaks.
`upset_plot`	Default FALSE. If TRUE, the report includes an upset plot of overlapping peaks.
`stat_plot`	Default FALSE. If TRUE, the function creates a plot showing the statistical significance of overlapping/non-overlapping peaks. Reference peak file must be provided.
`chromHMM_plot`	Default FALSE. If TRUE, the function outputs ChromHMM heatmap of individual peak files. If a reference peak file is provided, ChromHMM annotation of overlapping and non-overlapping peaks is also provided.
`chromHMM_annotation`	ChromHMM annotation for ChromHMM plots. Default K562 cell-line. Cell-line options are:
  "K562" = K-562 cells
  "Gm12878" = Cellosaurus cell-line GM12878
  "H1hesc" = H1 Human Embryonic Stem Cell
  "Hepg2" = Hep G2 cell
  "Hmec" = Human Mammary Epithelial Cell
  "Hsmm" = Human Skeletal Muscle Myoblasts
  "Huvec" = Human Umbilical Vein Endothelial Cells
  "Nhek" = Normal Human Epidermal Keratinocytes
  "Nhlf" = Normal Human Lung Fibroblasts
`chipseeker_plot`	Default FALSE. If TRUE, the report includes a barplot of ChIPseeker annotation of peak files.
`enrichment_plot`	Default FALSE. If TRUE, the report includes dotplots of KEGG and GO enrichment analysis of peak files.
`tss_plot`	Default FALSE. If TRUE, the report includes peak count frequency around transcription start sites (TSS). Note that this can take a while.
`tss_distance`	A vector specifying the distance upstream and downstream around transcription start sites (TSS). The default value is `c(-3000,3000)`, meaning peak frequency within 3000 bp upstream and downstream of the TSS will be displayed.
`precision_recall_plot`	Default is FALSE. If TRUE, creates a precision-recall curve plot and an F1 plot using plot_precision_recall.
`n_threshold`	Number of thresholds to test.
`corr_plot`	Default is FALSE. If TRUE, creates a correlation plot across all peak files using plot_corr.
`bin_size`	Default 5000. Base-pair size of the bins created to measure correlation. Use a smaller value for higher resolution, at the cost of longer run time and larger memory usage.
`interact`	Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static.
`add_download_button`	Add download buttons for each plot or dataset.
`save_output`	Default FALSE. If TRUE, all outputs (tables and plots) of the analysis will be saved in a folder (EpiCompare_file).
`output_filename`	Default "EpiCompare", which saves the report as EpiCompare.html. Otherwise, the HTML report is saved under the specified name.
`output_timestamp`	Default FALSE. If TRUE, date will be included in the file name.
`output_dir`	Path to where output HTML file should be saved.
`display`	After completion, automatically display the HTML report file in one of the following ways: "browser" : Display the report in your default web browser. "rstudio" : Display the report in RStudio. NULL (default) : Do not display the report.
`run_all`	Convenience argument that enables all plots/features (without specifying each argument manually) by overriding the default values. Default: `FALSE`.
`workers`	Number of threads to parallelize across.
`quiet`	An option to suppress printing during rendering from knitr, the pandoc command line and others. To suppress only the final "Output created: " message, you can set `rmarkdown.render.message` to `FALSE`.
`error`	If `TRUE`, the Rmarkdown report will continue to render even when some chunks encounter errors (default: `FALSE`). Passed to opts_chunk.
`debug`	Run in debug mode, where all messages and warnings are printed within the HTML report (default: `FALSE`).
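
The two accepted forms of `genome_build` (a named list, or a single string when all inputs share one build) can be sketched in base R. Note that `expand_genome_build()` and `supported_builds` are hypothetical names used only for illustration and are not part of EpiCompare; the sketch simply expands a single string into the named-list form and checks the result against the supported builds listed above.

```r
# Builds listed as supported in the `genome_build` argument description.
supported_builds <- c("hg19", "hg38", "mm9", "mm10")

# Hypothetical helper (not an EpiCompare function): normalise either form
# of `genome_build` into a named list and reject unsupported builds.
expand_genome_build <- function(genome_build) {
  if (is.character(genome_build) && length(genome_build) == 1) {
    # Single string: the same build applies to all three inputs.
    genome_build <- list(peakfiles = genome_build,
                         reference = genome_build,
                         blacklist = genome_build)
  }
  unsupported <- setdiff(unlist(genome_build), supported_builds)
  if (length(unsupported) > 0) {
    stop("Unsupported genome build(s): ",
         paste(unsupported, collapse = ", "))
  }
  genome_build
}

# Inputs generated on different builds: use the named-list form.
gb_mixed <- expand_genome_build(
  list(peakfiles = "hg38", reference = "hg19", blacklist = "hg19")
)

# All inputs on the same build: a single string is enough.
gb_single <- expand_genome_build("hg19")
```

Either `gb_mixed` or the plain string `"hg19"` could then be supplied as the `genome_build` argument.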

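The trade-off described for `bin_size` can be made concrete with a little arithmetic. This is only a rough sketch: it assumes the genome is tiled into fixed-width bins, which is an assumption about how the correlation is computed rather than something stated in this page, and `n_bins()` is a hypothetical helper.

```r
# Hypothetical helper: number of fixed-width bins needed to tile a region.
n_bins <- function(region_bp, bin_size) ceiling(region_bp / bin_size)

chr1_hg19_bp <- 249250621  # length of chromosome 1 in hg19, in base pairs

n_default <- n_bins(chr1_hg19_bp, 5000)  # 49851 bins
n_fine    <- n_bins(chr1_hg19_bp, 100)   # 2492507 bins
```

Under that assumption, going from 5000 bp to 100 bp bins means roughly 50 times as many bins per chromosome, which is why smaller values cost more time and memory.
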
Value

Path to one or more HTML report files.

Examples

library(EpiCompare)

### Load Data ###
data("encode_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac") # example dataset as GRanges object
data("CnR_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output

#### Prepare Input ####
# create named list of peakfiles
peakfiles <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac)
# create named list of picard outputs
picard_files <- list(CnR=CnR_H3K27ac_picard, CnT=CnT_H3K27ac_picard)
# reference peak file
reference <- list("ENCODE" = encode_H3K27ac)

### Run EpiCompare ###
output_html <- EpiCompare(peakfiles = peakfiles,
                          genome_build = list(peakfiles = "hg19",
                                              reference = "hg19"),
                          picard_files = picard_files,
                          reference = reference,
                          output_filename = "EpiCompare_test",
                          output_dir = tempdir())
# utils::browseURL(output_html)
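
When starting from Picard output files rather than the bundled example data, the `picard_files` list might be assembled as sketched below, using the `read.table()` call given under Arguments. The file paths are hypothetical, and the file contents are a made-up toy stand-in written to a temporary directory so that the sketch runs; real Picard duplication-metrics files contain more columns and a histogram section.

```r
# Toy stand-in for a Picard duplication-metrics file (values are made up).
toy_metrics <- c(
  "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
  "LIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION",
  "lib1\t1000\t0.05"
)

# Hypothetical file paths, one per sample, named to match the peak files.
picard_paths <- c(
  CnR = file.path(tempdir(), "CnR_picard.txt"),
  CnT = file.path(tempdir(), "CnT_picard.txt")
)
for (p in picard_paths) writeLines(toy_metrics, p)

# Import each file as a data.frame. lapply() keeps the names of
# `picard_paths`, so the result is already a named list.
picard_files <- lapply(picard_paths, read.table, header = TRUE, fill = TRUE)
```

Lines starting with `#` are skipped by `read.table()` by default, and `fill = TRUE` tolerates rows with fewer fields than the header.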

neurogenomics/EpiCompare documentation built on Feb. 17, 2025, 6:31 a.m.