
EpiCompare ⚖

QC and Benchmarking of Epigenomic Datasets

License: GPL-3

Authors: Sera Choi, Brian Schilder, Leyla Abbasova, Alan Murphy, Nathan Skene

Updated: Mar-08-2023

Introduction

EpiCompare is an R package for comparing multiple epigenomic datasets for quality control and benchmarking purposes. Its main function, EpiCompare(), outputs an HTML report consisting of three sections:

  1. General Metrics: Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) for each sample.
  2. Peak Overlap: Frequency, percentage and statistical significance of overlapping and non-overlapping peaks. This section also includes UpSet, precision-recall and correlation plots.
  3. Functional Annotation: Functional annotation of peaks (ChromHMM, ChIPseeker and enrichment analysis), as well as peak enrichment around transcription start sites (TSS).
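To make the precision-recall idea in the Peak Overlap section concrete, here is a minimal base R sketch. It is not EpiCompare's implementation (EpiCompare works with GRanges objects); the helpers `overlaps_any()` and `peak_precision_recall()` and the toy peak tables are hypothetical. Precision is taken as the fraction of sample peaks that overlap at least one reference peak, and recall as the fraction of reference peaks overlapped by at least one sample peak.

```r
# Hypothetical helpers, not part of EpiCompare.
# Peaks are data frames with chrom/start/end columns in half-open
# BED-style coordinates [start, end), all on the same genome build.

overlaps_any <- function(query, subject) {
  # For each query peak: does it overlap any subject peak on the same chromosome?
  vapply(seq_len(nrow(query)), function(i) {
    same_chr <- subject$chrom == query$chrom[i]
    any(subject$start[same_chr] < query$end[i] &
          subject$end[same_chr] > query$start[i])
  }, logical(1))
}

peak_precision_recall <- function(sample, reference) {
  c(
    precision = mean(overlaps_any(sample, reference)), # sample peaks supported by the reference
    recall    = mean(overlaps_any(reference, sample))  # reference peaks recovered by the sample
  )
}

# Toy peak sets for illustration only
sample_peaks <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                           start = c(100, 500, 300),
                           end   = c(200, 600, 400))
reference_peaks <- data.frame(chrom = c("chr1", "chr2", "chr3", "chr3"),
                              start = c(150, 350, 10, 50),
                              end   = c(250, 450, 20, 60))

pr <- peak_precision_recall(sample_peaks, reference_peaks)
print(pr)
```

Here two of the three sample peaks overlap the reference (precision 2/3) and two of the four reference peaks are recovered (recall 0.5). The pairwise scan is quadratic and only meant to show the definitions; for real peak files an interval-tree approach such as `GenomicRanges::findOverlaps()` is the usual choice.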

Note: Peaks located in blacklisted regions and non-standard chromosomes are removed from the files prior to analysis.
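The filtering step described in the note, together with the peak-level General Metrics, can be sketched in base R as follows. This is an illustration, not EpiCompare's code: `filter_peaks()` is a hypothetical helper, the blacklist is a toy table, and "standard" chromosomes are assumed to be chr1 to chr22, chrX and chrY in UCSC-style human naming.

```r
# Hypothetical helper, not part of EpiCompare.
# Assumption: standard chromosomes are chr1-chr22, chrX, chrY (human, UCSC style).
standard_chroms <- paste0("chr", c(1:22, "X", "Y"))

filter_peaks <- function(peaks, blacklist) {
  # Peaks on unplaced/unlocalised contigs and other non-standard sequences
  nonstandard <- !(peaks$chrom %in% standard_chroms)
  # Peaks overlapping any blacklisted interval (half-open coordinates)
  blacklisted <- vapply(seq_len(nrow(peaks)), function(i) {
    bl <- blacklist[blacklist$chrom == peaks$chrom[i], ]
    any(bl$start < peaks$end[i] & bl$end > peaks$start[i])
  }, logical(1))
  kept <- peaks[!nonstandard & !blacklisted, ]
  kept$width <- kept$end - kept$start  # peak width in bp
  list(pct_nonstandard = 100 * mean(nonstandard),
       pct_blacklisted = 100 * mean(blacklisted),
       kept = kept)
}

# Toy inputs for illustration only
peaks <- data.frame(chrom = c("chr1", "chr1", "chrUn_gl000220", "chr2"),
                    start = c(100, 1000, 5, 300),
                    end   = c(200, 1100, 50, 450))
blacklist <- data.frame(chrom = "chr1", start = 150, end = 160)

res <- filter_peaks(peaks, blacklist)
res$pct_nonstandard
res$pct_blacklisted
res$kept
```

With these inputs, 25% of peaks are on a non-standard chromosome, 25% overlap the blacklist, and the two remaining peaks have widths of 100 and 150 bp. For other organisms or naming schemes (e.g. "1" instead of "chr1"), `standard_chroms` would need to change.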

Installation

Standard

To install EpiCompare use:

if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("EpiCompare") 

All dependencies


Installing all Imports and Suggests will allow you to use the full functionality of EpiCompare right away, without having to stop and install extra dependencies later on.

To install these packages as well, use:

BiocManager::install("EpiCompare", dependencies=TRUE) 

Note that this will increase installation time, but you won’t have to install additional R packages later when using functions that depend on suggested packages.

Development


To install the development version of EpiCompare, use:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("neurogenomics/EpiCompare")

Citation

If you use EpiCompare, please cite:

EpiCompare: R package for the comparison and quality control of epigenomic peak files (2022) Sera Choi, Brian M. Schilder, Leyla Abbasova, Alan E. Murphy, Nathan G. Skene, bioRxiv, 2022.07.22.501149; doi: https://doi.org/10.1101/2022.07.22.501149

Documentation

EpiCompare website

Docker/Singularity container

Bioconductor page

⚠️ Note on documentation versioning

The documentation in this README and the GitHub Pages website pertains to the development version of EpiCompare. Older versions of EpiCompare may have slightly different documentation (e.g. available functions, parameters). For documentation of older versions of EpiCompare, please see the Documentation section of the relevant version on Bioconductor.

Usage

Load the package and example datasets.

library(EpiCompare)
data("encode_H3K27ac") # example peakfile
data("CnT_H3K27ac") # example peakfile
data("CnR_H3K27ac") # example peakfile
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output

Prepare input files:

# create named list of peakfiles 
peakfiles <- list("CnT"=CnT_H3K27ac, 
                  "CnR"=CnR_H3K27ac) 
# set ref file and name 
reference <- list("ENCODE_H3K27ac" = encode_H3K27ac) 
# create named list of Picard summary
picard_files <- list("CnT"=CnT_H3K27ac_picard, 
                     "CnR"=CnR_H3K27ac_picard) 
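The duplication rate reported under General Metrics comes from these Picard summaries. Assuming they are Picard MarkDuplicates metrics, the sketch below shows how that tool defines the rate (its `PERCENT_DUPLICATION` field, which is a fraction despite the name): each duplicate read pair counts as two reads. The table uses Picard's column names but made-up counts, and `duplication_rate()` is a hypothetical helper; how EpiCompare itself reads the summary is not shown here.

```r
# Toy table with Picard MarkDuplicates column names; counts are invented.
picard_like <- data.frame(
  sample                   = c("CnT", "CnR"),
  UNPAIRED_READS_EXAMINED  = c(0, 50),
  READ_PAIRS_EXAMINED      = c(1000, 500),
  UNPAIRED_READ_DUPLICATES = c(0, 10),
  READ_PAIR_DUPLICATES     = c(100, 20)
)

# Hypothetical helper: duplicated reads / examined reads,
# where every read pair contributes two reads.
duplication_rate <- function(m) {
  (m$UNPAIRED_READ_DUPLICATES + 2 * m$READ_PAIR_DUPLICATES) /
    (m$UNPAIRED_READS_EXAMINED + 2 * m$READ_PAIRS_EXAMINED)
}

picard_like$duplication_rate <- duplication_rate(picard_like)
picard_like[, c("sample", "duplication_rate")]
```

For the toy "CnT" row this gives 200 / 2000 = 0.1; for "CnR" it gives 50 / 1050.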

Tips on importing user-supplied files

EpiCompare::gather_files() is helpful for identifying and importing peak or Picard files.

# To import BED files as GRanges object
peakfiles <- EpiCompare::gather_files(dir = "path/to/peaks/", 
                                      type = "peaks.stringent")
# EpiCompare alternatively accepts paths (to BED files) as input 
peakfiles <- list(sample1="/path/to/peaks/file1_peaks.stringent.bed", 
                  sample2="/path/to/peaks/file2_peaks.stringent.bed")
# To import Picard summary output txt file as data frame
picard_files <- EpiCompare::gather_files(dir = "path/to/peaks", 
                                         type = "picard")
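If you prefer passing file paths rather than GRanges objects, the named list of paths can also be built with base R. This is a minimal sketch, assuming every peak file ends in "_peaks.stringent.bed" and that the remaining prefix is a suitable sample name; the directory and file names are placeholders created in `tempdir()` so the example runs anywhere.

```r
# Placeholder directory with two empty files named like the examples above
peak_dir <- file.path(tempdir(), "peaks")
dir.create(peak_dir, showWarnings = FALSE)
invisible(file.create(file.path(peak_dir, c("file1_peaks.stringent.bed",
                                            "file2_peaks.stringent.bed"))))

# Assumed naming convention: <sample>_peaks.stringent.bed
suffix <- "_peaks\\.stringent\\.bed$"
bed_paths <- list.files(peak_dir, pattern = suffix, full.names = TRUE)

# Named list: sample name -> full path to its BED file
peakfiles <- as.list(stats::setNames(bed_paths,
                                     sub(suffix, "", basename(bed_paths))))
str(peakfiles)
```

The list names become the sample labels, so they should be unique and informative.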

Run EpiCompare():

EpiCompare::EpiCompare(peakfiles = peakfiles,
                       # genome build of the sample peak files and of the reference
                       genome_build = list(peakfiles="hg19",
                                           reference="hg38"),
                       genome_build_output = "hg19", 
                       picard_files = picard_files,
                       reference = reference,
                       run_all = TRUE, # turn on all optional plots
                       output_dir = tempdir())

Required Inputs

These input parameters must be provided:


Optional Inputs

The following input files are optional:


Optional Plots

By default, these plots will not be included in the report unless set to TRUE. To turn on all features at once, simply use the run_all=TRUE argument:


Other Options


Outputs

EpiCompare outputs the following:

  1. HTML report: A summary of all analyses, saved in the specified output_dir.
  2. EpiCompare_file: If save_output=TRUE, all plots generated by EpiCompare are also saved in the EpiCompare_file directory within the specified output_dir.

An example report comparing ATAC-seq and DNase-seq can be found here.

Datasets

EpiCompare includes several built-in datasets:


Session Info


utils::sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] pillar_1.8.1        compiler_4.2.1      RColorBrewer_1.1-3 
##  [4] BiocManager_1.30.20 bitops_1.0-7        yulab.utils_0.0.6  
##  [7] tools_4.2.1         digest_0.6.31       jsonlite_1.8.4     
## [10] evaluate_0.20       lifecycle_1.0.3     tibble_3.1.8       
## [13] gtable_0.3.1        pkgconfig_2.0.3     rlang_1.0.6        
## [16] graph_1.76.0        cli_3.6.0           rstudioapi_0.14    
## [19] rvcheck_0.2.1       yaml_2.3.7          xfun_0.37          
## [22] fastmap_1.1.0       dplyr_1.1.0         knitr_1.42         
## [25] generics_0.1.3      desc_1.4.2          vctrs_0.5.2        
## [28] dlstats_0.1.6       stats4_4.2.1        rprojroot_2.0.3    
## [31] grid_4.2.1          tidyselect_1.2.0    here_1.0.1         
## [34] Biobase_2.58.0      glue_1.6.2          R6_2.5.1           
## [37] fansi_1.0.4         XML_3.99-0.13       RBGL_1.74.0        
## [40] rmarkdown_2.20.1    ggplot2_3.4.1       badger_0.2.3       
## [43] magrittr_2.0.3      BiocGenerics_0.44.0 biocViews_1.66.2   
## [46] scales_1.2.1        htmltools_0.5.4     rworkflows_0.99.7  
## [49] RUnit_0.4.32        colorspace_2.1-0    renv_0.17.0        
## [52] utf8_1.2.3          RCurl_1.98-1.10     munsell_0.5.0

Contact

Neurogenomics Lab

UK Dementia Research Institute
Department of Brain Sciences
Faculty of Medicine
Imperial College London

GitHub
DockerHub



serachoi1230/EpiCompare documentation built on Jan. 30, 2024, 11:37 a.m.