In jrboyd/peakrefine: Peak Refinement Metric Gathering and Evalutation by Motif Enrichment

knitr::opts_chunk$set(echo = TRUE, eval = FALSE,  fig.path = "README_figs/README-")

Refining peaks using metrics derived from Strand Cross-Correlation (SCC)

This package is under active development but the methods described here are considered stable.

Eventually peakrefine will be rolled into my existing bioconductor package, seqsetvis, though still maintained separately here.

Installation:

if(!require(devtools))
    install.packages("devtools")
devtools::install_github("jrboyd/peakrefine")

Basic Usage

bam_file = "path/to/your/bam" #must be indexed
peaks_gr = rtracklayer::import("path/to/your/peaks")
corr_res = peakrefine::calcSCCMetrics(bam_file, peaks_gr, frag_sizes = 50:250)

Run this way, 100 peaks will take a bit under a minute.

Results will be automatically cached using BiocFileCache such that repeated calls with identical parameters load previous results.

Running in parallel

peakrefine uses the mc.cores option to automatically split up peaks for calculations. If you run into memory or similar issues you should set n_splits to a multiple of mc.cores (this applies to running in serial as well).
The default is 1 job per core which may prove too large.

ncores = max(1, parallel::detectCores() - 1)
options(mc.cores = ncores)
peakrefine::calcSCCMetrics(bam_file, peaks_gr, 
                           frag_sizes = 50:250, n_splits = ncores * 3)

bam_file = "simulation/qc_framework/output_AF_RUNX1_ChIP/AF-MCF10A_RUNX1_pooled/AF-MCF10A_RUNX1_pooled.bam" #must be indexed
peaks_gr = rtracklayer::import("simulation/qc_framework/output_AF_RUNX1_ChIP/AF-MCF10A_RUNX1_pooled/AF-MCF10A_RUNX1_pooled_peaks.narrowPeak")
set.seed(1)
peaks_gr = sample(peaks_gr, 100)
options(mc.cores = 20)
corr_res = peakrefine::calcSCCMetrics(bam_file, 
                                      peaks_gr, 
                                      frag_sizes = 50:250)

Using the output

knitr::opts_chunk$set(echo = TRUE, eval = TRUE)

Output is a named list:

cbind(names(corr_res))

Getting read and fragment lengths:

corr_res$read_length
corr_res$fragment_length

Other list items are data.tables:

sapply(corr_res, class)

We'll need a couple more libraries here.

library(data.table)
library(ggplot2)

There are 3 extracted correlation metrics:

flex_fragment_correlation - the maximum correlation at any shift
stable_fragment_correlation - the correlation for the average fragment size (determined by shift values in flex)
read_correlation - the correlation for the read size (determined by mode of bam)

#items 3-5 of the results list contain these metrics
metrics_dt = rbindlist(corr_res[3:5], use.names = TRUE, idcol = "metric")
metrics_dt[, metric := sub("fragment", "frag", sub("_correlation", "", metric))]
theme_set(theme_classic())
ggplot(metrics_dt, aes(x = metric, y = correlation, color = metric)) + 
    geom_boxplot() + 
    guides(color = "none")

Stable fragment correlation uses the calculated fragment size while flex always uses the fragment size with the maximum SCC. Flex will report high correlation from artifact peaks but is useful for assessing fragment size distribution.

ggplot(metrics_dt, aes(x = metric, y = shift, color = metric)) + 
    geom_boxplot() + 
    guides(color = "none")

The last item in the list "full_correlation_results" stores the correlation for every value of shift. The previous 3 are simply extracted from the full results for conveinence

to_plot = corr_res$full_correlation_results#[id %in% sample(unique(id), 3)]
to_plot$`Expected_Size` = "in expected range"
out_of_range_ids = unique(corr_res$flex_fragment_correlation[
    shift < 100 | shift > 200]$id)
to_plot[id %in% out_of_range_ids, `Expected_Size` := "out of range"]
ggplot(to_plot, aes(x = shift, y = correlation, group = id)) + 
    geom_path(size = 1.5, alpha = .1) + facet_wrap("Expected_Size") + 
    labs(title = "SCC profiles", 
         subtitle = "expected fragment size between 100-200 bp")

jrboyd/peakrefine documentation built on July 30, 2020, 7:13 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jrboyd/peakrefine
Peak Refinement Metric Gathering and Evalutation by Motif Enrichment

In jrboyd/peakrefine: Peak Refinement Metric Gathering and Evalutation by Motif Enrichment

Installation:

Basic Usage

Running in parallel

Using the output

R Package Documentation

Browse R Packages

We want your feedback!

jrboyd/peakrefine Peak Refinement Metric Gathering and Evalutation by Motif Enrichment

In jrboyd/peakrefine: Peak Refinement Metric Gathering and Evalutation by Motif Enrichment

Installation:

Basic Usage

Running in parallel

Using the output

R Package Documentation

Browse R Packages

We want your feedback!

jrboyd/peakrefine
Peak Refinement Metric Gathering and Evalutation by Motif Enrichment