pkg <- read.dcf("DESCRIPTION", fields = "Package")[1]
title <- read.dcf("DESCRIPTION", fields = "Title")[1]
description <- read.dcf("DESCRIPTION", fields = "Description")[1]
URL <- read.dcf("DESCRIPTION", fields = "URL")[1]
owner <- tolower(strsplit(URL, "/")[[1]][4])
EpiCompare is an R package for comparing multiple epigenomic datasets for quality control and benchmarking purposes. The main function, EpiCompare(), outputs an HTML report consisting of three sections.
Note: Peaks located in blacklisted regions and non-standard chromosomes are removed from the files prior to analysis.
To install EpiCompare, use:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("EpiCompare")
Installing all Imports and Suggests will allow you to use the full functionality of EpiCompare right away, without having to stop and install extra dependencies later on.
To install these packages as well, use:
BiocManager::install("EpiCompare", dependencies=TRUE)
Note that this will increase installation time, but it means that you won't have to install additional R packages later when using functions that rely on suggested dependencies.
To install the development version of EpiCompare, use:
if (!require("remotes")) install.packages("remotes")
remotes::install_github("neurogenomics/EpiCompare")
If you use EpiCompare, please cite the reference returned by citation("EpiCompare")$textVersion.
The documentation in this README and the GitHub Pages website pertains to the development version of EpiCompare. Older versions of EpiCompare may have slightly different documentation (e.g. available functions, parameters). For documentation of older versions of EpiCompare, please see the Documentation section of the relevant version on Bioconductor.
Load package and example datasets.
library(EpiCompare)
data("encode_H3K27ac") # example peakfile
data("CnT_H3K27ac") # example peakfile
data("CnR_H3K27ac") # example peakfile
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output
Prepare input files:
# create named list of peakfiles
peakfiles <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac)
# set ref file and name
reference <- list("ENCODE_H3K27ac" = encode_H3K27ac)
# create named list of Picard summary
picard_files <- list("CnT"=CnT_H3K27ac_picard, "CnR"=CnR_H3K27ac_picard)
Tips on importing user-supplied files
EpiCompare::gather_files is helpful for identifying and importing peak or Picard files.
# To import BED files as GRanges object
peakfiles <- EpiCompare::gather_files(dir = "path/to/peaks/", type = "peaks.stringent")
# EpiCompare alternatively accepts paths (to BED files) as input
peakfiles <- list(sample1="/path/to/peaks/file1_peaks.stringent.bed",
                  sample2="/path/to/peaks/file2_peaks.stringent.bed")
# To import Picard summary output txt file as data frame
picard_files <- EpiCompare::gather_files(dir = "path/to/peaks", type = "picard")
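As an alternative to gather_files, a Picard MarkDuplicates metrics file can be read with base R and wrapped in a named list. The sketch below is a minimal illustration only: it writes a small mock metrics file (the values and the sample name "sample1" are made up for the example) and reads it back with read.table(header = TRUE, fill = TRUE). Real Picard output contains more columns than shown here.

```r
# Write a tiny mock Picard-style metrics file (made-up values, illustration only).
metrics_path <- tempfile(fileext = ".txt")
writeLines(c(
  "## METRICS CLASS\tpicard.sam.DuplicationMetrics",  # comment lines start with '#'
  "LIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION",
  "lib1\t1000\t0.25"
), metrics_path)

# read.table skips lines starting with '#' by default (comment.char = "#");
# fill = TRUE pads rows that have fewer fields than the header.
picard <- read.table(metrics_path, header = TRUE, fill = TRUE)

# Picard inputs are passed as a named list of data frames.
picard_files <- list("sample1" = picard)
str(picard_files)
```

The list names are what identify each sample, so they should match the names used for the corresponding peak files.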
Run EpiCompare():
EpiCompare::EpiCompare(peakfiles = peakfiles,
                       genome_build = list(peakfiles="hg19", reference="hg38"),
                       genome_build_output = "hg19",
                       picard_files = picard_files,
                       reference = reference,
                       run_all = TRUE,
                       output_dir = tempdir())
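Because genome_build in the call above is a named list keyed by input, a misspelled name is easy to miss before a long run. The sketch below shows one way to sanity-check it up front. check_genome_build() is a hypothetical helper written for this example and is not part of EpiCompare; the accepted input names and the build names "hg19" and "hg38" are assumptions taken from the examples in this README and may need adjusting.

```r
# Hypothetical helper (not part of EpiCompare): checks the shape of genome_build.
check_genome_build <- function(genome_build,
                               inputs = c("peakfiles", "reference", "blacklist"),
                               builds = c("hg19", "hg38")) {
  if (!is.list(genome_build) || is.null(names(genome_build))) {
    stop("genome_build must be a named list")
  }
  # Every element must be named after one of the expected inputs.
  bad_names <- setdiff(names(genome_build), inputs)
  if (length(bad_names) > 0) {
    stop("Unknown input name(s): ", paste(bad_names, collapse = ", "))
  }
  # Every value must be one of the assumed genome builds.
  bad_builds <- setdiff(unlist(genome_build), builds)
  if (length(bad_builds) > 0) {
    stop("Unknown genome build(s): ", paste(bad_builds, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently for the list used in the call above.
check_genome_build(list(peakfiles = "hg19", reference = "hg38"))
```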
These input parameters must be provided:
- peakfiles: Peak files you want to analyse. EpiCompare accepts peak files as GRanges objects and/or as paths to BED files. Files must be listed and named using list(), e.g. list("name1"=peakfile1, "name2"=peakfile2).
- genome_build: A named list indicating the human genome build used to generate each of the following inputs:
  - peakfiles: Genome build for the peakfiles input. Assumes the genome build is the same for each element in the peakfiles list.
  - reference: Genome build for the reference input.
  - blacklist: Genome build for the blacklist input.
  Example: genome_build = list(peakfiles="hg38", reference="hg19", blacklist="hg19")
- genome_build_output: Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default is "hg19".
- blacklist: Peak file as a GRanges object specifying genomic regions that have anomalous and/or unstructured signals independent of the cell line or experiment. For the human hg19 and hg38 genomes, use the built-in data data(hg19_blacklist) and data(hg38_blacklist), respectively. For the mouse mm10 genome, use the built-in data data(mm10_blacklist).
- output_dir: Path to the directory where all EpiCompare outputs will be saved.

The following input files are optional:
- picard_files: A list of summary metrics output from Picard. Picard MarkDuplicates can be used to identify duplicate reads in the alignment. This tool generates a summary output, normally with the ending .markdup.MarkDuplicates.metrics.txt. If this input is provided, metrics on fragments (e.g. mapped fragments and duplication rate) will be included in the report. Files must be in data.frame format, listed using list() and named using names(). To import Picard duplication metrics (.txt file) into R as a data frame, use picard <- read.table("/path/to/picard/output", header = TRUE, fill = TRUE).
- reference: Reference peak file(s) used in stat_plot and chromHMM_plot. Each file must be a GRanges object, listed and named using list("reference_name" = GRanges_object). If more than one reference is specified, EpiCompare outputs an individual report for each reference. Please note that this can take a while.

By default, the following plots are not included in the report unless set to TRUE. To turn on all features at once, simply use the run_all=TRUE argument:
- upset_plot: Upset plot of overlapping peaks between samples.
- stat_plot: Included only if a reference dataset is provided. The plot shows the statistical significance (p/q-values) of sample peaks that are overlapping/non-overlapping with the reference dataset.
- chromHMM_plot: ChromHMM annotation of peaks. If a reference dataset is provided, ChromHMM annotation of overlapping and non-overlapping peaks with the reference is also included in the report.
- chipseeker_plot: ChIPseeker annotation of peaks.
- enrichment_plot: KEGG pathway and GO enrichment analysis of peaks.
- tss_plot: Peak frequency around (+/- 3000bp) the transcriptional start site. Note that it may take a while to generate this plot for large sample sizes.
- precision_recall_plot: Plot showing the precision-recall score across the peak calling stringency thresholds.
- corr_plot: Plot showing the correlation between the quantiles when the genome is binned at a set size. These quantiles are based on the intensity of the peak, dependent on the peak caller used (q-value for MACS2).
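When only some of these plots are needed, the individual options can be switched on instead of run_all=TRUE. The following is a sketch rather than a definitive recipe: it assumes each option listed above is a logical argument of EpiCompare() with the same name, and it reuses the built-in example data. The run_report switch is a hypothetical guard added for this example so that the block can be sourced without starting a full analysis.

```r
# Plot options named in this README; only the first two are enabled here.
plot_options <- list(
  upset_plot            = TRUE,
  stat_plot             = TRUE,
  chromHMM_plot         = FALSE,
  chipseeker_plot       = FALSE,
  enrichment_plot       = FALSE,
  tss_plot              = FALSE,
  precision_recall_plot = FALSE,
  corr_plot             = FALSE
)

# Hypothetical guard for this example: set to TRUE to actually build the report.
run_report <- FALSE

if (run_report && requireNamespace("EpiCompare", quietly = TRUE)) {
  # Built-in example datasets shipped with EpiCompare.
  data("CnT_H3K27ac", package = "EpiCompare")
  data("CnR_H3K27ac", package = "EpiCompare")
  data("encode_H3K27ac", package = "EpiCompare")

  base_args <- list(
    peakfiles = list("CnT" = CnT_H3K27ac, "CnR" = CnR_H3K27ac),
    genome_build = list(peakfiles = "hg19", reference = "hg38"),
    genome_build_output = "hg19",
    reference = list("ENCODE_H3K27ac" = encode_H3K27ac),
    output_dir = tempdir()
  )

  # Combine the fixed arguments with the chosen plot options and call EpiCompare().
  do.call(EpiCompare::EpiCompare, c(base_args, plot_options))
}
```

Keeping the plot switches in one list makes it easy to see at a glance which sections a given report will contain.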
Other options:
- chromHMM_annotation: Cell-line annotation for ChromHMM. Default is K562; other cell lines can be selected.
- interact: By default, all heatmaps (percentage overlap and ChromHMM heatmaps) in the report are interactive. If set to FALSE, all heatmaps will be static. N.B. If interact=TRUE, interactive heatmaps are saved as HTML files, which may take time for larger sample sizes.
- output_filename: By default, the report is named EpiCompare.html. You can specify the file name of the report here.
- output_timestamp: FALSE by default. If TRUE, the filename of the report includes the date.

EpiCompare outputs the following:
- An HTML report, saved in the specified output_dir.
- If save_output=TRUE, all plots generated by EpiCompare are also saved in the EpiCompare_file directory within the specified output_dir.
An example report comparing ATAC-seq and DNase-seq can be found here
EpiCompare includes several built-in datasets:
- encode_H3K27ac: Human H3K27ac peak file generated with ChIP-seq using the K562 cell line. Taken from the ENCODE project. For more information, run ?encode_H3K27ac.
- CnT_H3K27ac: Human H3K27ac peak file generated with CUT&Tag using the K562 cell line, from Kaya-Okur et al. (2019). For more information, run ?CnT_H3K27ac.
- CnR_H3K27ac: Human H3K27ac peak file generated with CUT&Run using the K562 cell line, from Meers et al. (2019). For more details, run ?CnR_H3K27ac.

UK Dementia Research Institute
Department of Brain Sciences
Faculty of Medicine
Imperial College London
GitHub
DockerHub
utils::sessionInfo()