knitr::opts_chunk$set(echo = TRUE, eval = FALSE, fig.path = "README_figs/README-")
Refining peaks using metrics derived from Strand Cross-Correlation (SCC)
This package is under active development but the methods described here are considered stable.
Eventually peakrefine will be rolled into my existing bioconductor package, seqsetvis, though still maintained separately here.
if(!require(devtools)) install.packages("devtools") devtools::install_github("jrboyd/peakrefine")
bam_file = "path/to/your/bam" #must be indexed peaks_gr = rtracklayer::import("path/to/your/peaks") corr_res = peakrefine::calcSCCMetrics(bam_file, peaks_gr, frag_sizes = 50:250)
Run this way, 100 peaks will take a bit under a minute.
Results will be automatically cached using BiocFileCache such that repeated calls with identical parameters load previous results.
peakrefine uses the mc.cores option to automatically split up peaks for
calculations. If you run into memory or similar issues you should set n_splits
to a multiple of mc.cores (this applies to running in serial as well).
The default is 1 job per core which may prove
too large.
ncores = max(1, parallel::detectCores() - 1) options(mc.cores = ncores) peakrefine::calcSCCMetrics(bam_file, peaks_gr, frag_sizes = 50:250, n_splits = ncores * 3)
bam_file = "simulation/qc_framework/output_AF_RUNX1_ChIP/AF-MCF10A_RUNX1_pooled/AF-MCF10A_RUNX1_pooled.bam" #must be indexed peaks_gr = rtracklayer::import("simulation/qc_framework/output_AF_RUNX1_ChIP/AF-MCF10A_RUNX1_pooled/AF-MCF10A_RUNX1_pooled_peaks.narrowPeak") set.seed(1) peaks_gr = sample(peaks_gr, 100) options(mc.cores = 20) corr_res = peakrefine::calcSCCMetrics(bam_file, peaks_gr, frag_sizes = 50:250)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
Output is a named list:
cbind(names(corr_res))
Getting read and fragment lengths:
corr_res$read_length corr_res$fragment_length
Other list items are data.tables:
sapply(corr_res, class)
We'll need a couple more libraries here.
library(data.table) library(ggplot2)
There are 3 extracted correlation metrics:
#items 3-5 of the results list contain these metrics metrics_dt = rbindlist(corr_res[3:5], use.names = TRUE, idcol = "metric") metrics_dt[, metric := sub("fragment", "frag", sub("_correlation", "", metric))] theme_set(theme_classic()) ggplot(metrics_dt, aes(x = metric, y = correlation, color = metric)) + geom_boxplot() + guides(color = "none")
Stable fragment correlation uses the calculated fragment size while flex always uses the fragment size with the maximum SCC. Flex will report high correlation from artifact peaks but is useful for assessing fragment size distribution.
ggplot(metrics_dt, aes(x = metric, y = shift, color = metric)) + geom_boxplot() + guides(color = "none")
The last item in the list "full_correlation_results" stores the correlation for every value of shift. The previous 3 are simply extracted from the full results for conveinence
to_plot = corr_res$full_correlation_results#[id %in% sample(unique(id), 3)] to_plot$`Expected_Size` = "in expected range" out_of_range_ids = unique(corr_res$flex_fragment_correlation[ shift < 100 | shift > 200]$id) to_plot[id %in% out_of_range_ids, `Expected_Size` := "out of range"] ggplot(to_plot, aes(x = shift, y = correlation, group = id)) + geom_path(size = 1.5, alpha = .1) + facet_wrap("Expected_Size") + labs(title = "SCC profiles", subtitle = "expected fragment size between 100-200 bp")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.