pkg <- read.dcf("../DESCRIPTION", fields = "Package")[1] library(pkg, character.only = TRUE)
library(`r pkg`)
Some peaks files are already made available on GEO.
import_peaks
will import these as a named list GenomicRanges
,
where the names are the GEO sample ids
.
grl <- PeakyFinders::import_peaks( ids = "GSM945244", searches = construct_searches(keys = c("narrowpeak", "broadpeak")))
By default, the genome-wide files are imported.
However, you can also query specific subset of these peak files
defined by query_granges
.
Here, we'll just construct query for all of chromosome 22
using the helper function get_genome
query_granges <- PeakyFinders::get_genome(genome = "hg19", keep.chr = 22) grl2 <- PeakyFinders::import_peaks( ids = "GSM945244", searches = construct_searches(keys = c("narrowpeak", "broadpeak")), builds = "hg19", query_granges = query_granges, query_granges_build = "hg19") knitr::kable( head(data.frame(grl2$GSM945244)) )
When no pre-computed peaks are available, you can instead use import_peaks
to compute peaks from bigWig or bedGraph files. The default peak calling method is currently MACS3
via the MACSr
package.
Note:
By default, peaks are called using whole-chromosomes, from which a subset specified by
query_granges
is returned. This ensures that an appropriate background is used when computing peaks. You can also call peaks using the whole genome at once, but this is usally less computationally efficient than splitting the task by chromosome (i.e. settingsplit_chromosomes=TRUE
whenquery_granges
is not provided).By default,
import_peaks
(and the subfunctioncall_peaks
) automatically infer a reasonable cutoff threshold based on an iterative testing procedure. See?PeakyFinders::call_peaks
for more details.
Warning:
Peak calling with MACS3
/MACSr
is not currently compatible with Windows OS.
### rtracklayer::import.bw() is currently broken on Windows. if(.Platform$OS.type!="windows"){ grl3 <- PeakyFinders::import_peaks( ids = "GSM945244", searches = construct_searches(keys = "bigwig"), builds = "hg19", query_granges = query_granges, query_granges_build = "hg19" ) knitr::kable( head(data.frame(grl3$GSM945244)) ) }
To speed up the process of importing and/or calling peaks,
you can take advantage of the parallelisation features in import_peaks
.
For example, if you want to call peaks across the entire genome
from a given bigWig file, you could speed up this process by dividing the task by chromosome (split_chromosomes=TRUE
) and distributing it across multiple cores or compute nodes (nThread=2
or greater).
Internally, PeakyFinders
takes a number of steps to optimise these parallel processes, so you don't have to worry about it as much. This helps to avoid common issues stemming from excessive numbers of queries, both to the servers where the peak data is hosted, and to the C libraries that R relies on to makes these queries.
Note:
It's usually a good idea to leave several cores free when running parallel processes, so that you can still have enough resources to use your computer while the function is running. In the example below, I'm using 10/12 cores on my local laptop.
Warning:
Parallelisation is not currently compatible with Windows OS.
If Windows OS is detected, import_peaks will automatically switch to serial processing with nThread=1
.
grl4 <- PeakyFinders::import_peaks( ids = "GSM945244", searches = construct_searches(keys = "bigwig"), split_chromosomes = TRUE, nThread = 10 )
utils::sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.