import_peaks: Import peaks

View source: R/import_peaks.R

import_peaks R Documentation

Import peaks

Description

Import pre-computed peak files, or compute new peaks from bedGraph/bigWig files. Can import a subset of ranges specified by query_granges, or across the whole genome by setting query_granges=NULL.
Currently recognizes IDs from:

  • GEO :

  • ENCODE : See peaks_metadata_encode for example metadata.

  • ROADMAP : See peaks_metadata_roadmap for example metadata.

  • AnnotationHub : See peaks_metadata_annotationhub for example metadata.

Notable features:

  1. Automatically infers which database each accession ID is from and organizes the outputs accordingly.

  2. Automatically infers which function is needed to import which file types.

  3. Automatically calls peaks from any bedGraph/bigWig files.

  4. query_granges can be in a different genome build than the files being imported; it will be lifted over to the correct genome build with liftover_grlist.

  5. When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.
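As a rough illustration of feature 1, the sketch below groups accession IDs by database using prefix patterns. The helper infer_database and its two regular expressions are hypothetical and are not taken from PeakyFinders; they only cover the GEO and ENCODE ID styles that appear on this page.

```r
# Hypothetical helper (not part of PeakyFinders): guess the source database
# of each accession ID from its prefix.
infer_database <- function(ids) {
  db <- rep(NA_character_, length(ids))
  # Assumed pattern: GEO series/sample accessions such as "GSM945244".
  db[grepl("^GS[EM][0-9]+$", ids)] <- "GEO"
  # Assumed pattern: ENCODE experiment/file accessions such as "ENCFF048VDO".
  db[grepl("^ENC(SR|FF)[0-9A-Z]+$", ids)] <- "ENCODE"
  db
}

ids <- c("GSM945244", "ENCSR000AHD", "ENCFF048VDO")

# Organize the IDs by inferred database (IDs that match nothing are dropped).
by_db <- split(ids, infer_database(ids))
print(by_db)
```

Unrecognized IDs map to NA here; how the package itself handles them is not stated in this page.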

Usage

import_peaks(
  ids,
  builds = "hg19",
  query_granges = NULL,
  query_granges_build = NULL,
  split_chromosomes = FALSE,
  condense_queries = TRUE,
  force_new = FALSE,
  method = "MACSr",
  cutoff = NULL,
  searches = construct_searches(),
  peaks_dir = tempdir(),
  save_path = tempfile(fileext = "_PeakyFinders_grl.rds"),
  nThread = 1,
  verbose = TRUE
)

Arguments

ids

IDs from one of the supported databases. IDs can be at any level: file, sample, or experiment.

builds

Genome build that each sample in ids is aligned to. This determines whether the query_granges data need to be lifted over to a different genome build before querying. Can be a single character string applied to all ids (e.g. "hg19"), or a vector of the same length as ids named using the ids (e.g. c("GSM4271282"="hg19", "ENCFF048VDO"="hg38")).
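The two accepted forms of builds can be illustrated with a small base R sketch. The helper expand_builds is hypothetical (it is not a PeakyFinders function) and only shows how a single string or a named vector could be resolved to one build per id.

```r
# Hypothetical helper (not part of PeakyFinders): make sure every id has an
# explicit genome build.
expand_builds <- function(ids, builds = "hg19") {
  if (length(builds) == 1L && is.null(names(builds))) {
    # Single string: apply it to every id.
    return(stats::setNames(rep(builds, length(ids)), ids))
  }
  # Named vector: it must cover every id.
  missing_ids <- setdiff(ids, names(builds))
  if (length(missing_ids) > 0L) {
    stop("No build given for: ", paste(missing_ids, collapse = ", "))
  }
  builds[ids]
}

ids <- c("GSM4271282", "ENCFF048VDO")

# One build for all ids.
print(expand_builds(ids, "hg19"))

# A different build per id, named by id.
print(expand_builds(ids, c("GSM4271282" = "hg19", "ENCFF048VDO" = "hg38")))
```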

query_granges

[Optional] GRanges object indicating which genomic regions to extract from each sample.

query_granges_build

[Optional] Genome build that query_granges is aligned to.

split_chromosomes

Split a single-threaded query into a multi-threaded query across chromosomes. This can be especially helpful when calling peaks from large bigWig/bedGraph files. The number of threads used is set by the nThread argument.

condense_queries

Condense query_granges by taking the min/max position per chromosome (default: TRUE). This helps to reduce the total number of queries, which can cause memory allocation problems due to repeated calls to the underlying C libraries.
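The min/max condensing step can be sketched in base R. A plain data.frame stands in for a GRanges object here, and condense_ranges is a hypothetical helper written for illustration, not the package's implementation.

```r
# Toy query regions (a data.frame standing in for a GRanges object).
query <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(100, 5000, 200, 900, 40),
  end   = c(250, 5200, 300, 1500, 60)
)

# Hypothetical helper: collapse all regions on a chromosome into one range
# spanning the smallest start to the largest end.
condense_ranges <- function(df) {
  starts <- tapply(df$start, df$chr, min)
  ends   <- tapply(df$end, df$chr, max)
  data.frame(
    chr   = names(starts),
    start = as.numeric(starts),
    end   = as.numeric(ends[names(starts)]),
    stringsAsFactors = FALSE
  )
}

condensed <- condense_ranges(query)
print(condensed)
```

Five queries become two, one per chromosome. The trade-off is that each spanning range can cover much more sequence than the original regions did.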

force_new

By default, if a file already exists at save_path, the saved results will be imported instead of running queries. You can override this by setting force_new=TRUE, which performs new queries regardless and overwrites the old save_path file.
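The reuse-or-recompute behaviour described for force_new follows a common caching pattern, sketched below in base R. The names cached_query and run_query are hypothetical stand-ins and do not exist in PeakyFinders.

```r
# Hypothetical wrapper illustrating the caching pattern; `run_query` stands in
# for whatever expensive work would otherwise be performed.
cached_query <- function(run_query, save_path, force_new = FALSE) {
  if (file.exists(save_path) && !force_new) {
    # A saved result exists and no refresh was requested: reuse it.
    return(readRDS(save_path))
  }
  result <- run_query()
  saveRDS(result, save_path)  # creates or overwrites the cached file
  result
}

save_path <- tempfile(fileext = "_example.rds")
n_runs <- 0
run_query <- function() {
  n_runs <<- n_runs + 1  # count how often the "query" really runs
  list(value = n_runs)
}

first  <- cached_query(run_query, save_path)                    # runs the query
second <- cached_query(run_query, save_path)                    # reads the cache
third  <- cached_query(run_query, save_path, force_new = TRUE)  # runs again
```

Because the default save_path in the Usage section is a fresh tempfile(), reuse across calls presumably requires passing the same save_path explicitly.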

method

Method to call peaks with:

  • "MACSr" : Uses MACS3 via bdgpeakcall.

  • "SEACR" : Uses SEACR via find_packages.

cutoff

  • when method="MACSr" : Passed to cutoff argument. The appropriate cutoff depends on the method used to generate the score track. If the file contains pvalue scores from MACS3, score 5 means pvalue 1e-5. If NULL, a reasonable cutoff value will be inferred through a cutoff_analysis.

  • when method="SEACR" : Passed to control argument. Control (IgG) data bedgraph file to generate an empirical threshold for peak calling. Alternatively, a numeric threshold n between 0 and 1 returns the top n fraction of peaks based on total signal within peaks (default: 0.05).
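For the method="MACSr" case, the relationship between a score and a p-value ("score 5 means pvalue 1e-5") is a -log10 transform. The toy scores below are made up for illustration, and keeping scores equal to the cutoff is an assumption rather than documented behaviour.

```r
# Made-up scores on a -log10(p-value) scale.
scores <- c(1.3, 2, 5, 8)

# Convert each score back to the p-value it represents.
pvalues <- 10^(-scores)
print(pvalues)

# Apply a cutoff of 5, i.e. keep positions with p-value <= 1e-5.
# (Whether ties are kept is an assumption made for this sketch.)
cutoff <- 5
keep <- scores >= cutoff
print(scores[keep])
```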

searches

Named list of regex queries.

peaks_dir

Directory to save peaks to (only used when calling peaks from bedGraph files).

save_path

Path to save query results to in .rds format.

nThread

When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.

verbose

Print messages.

Value

A nested named list of peaks in GRanges format. The nesting structure is as follows: database -> id -> GRanges object. Each GRanges object contains all the peak data that was found for that particular id, merged into one. You can differentiate the various source file types by looking at the column "peaktype". If peaks could not be recovered for a sample, that element will be set to NULL.
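Working with the database -> id nesting, including NULL entries, might look like the sketch below. The list is a mock: data.frames stand in for GRanges objects, and the id "id_without_peaks" and the peaktype value are invented for illustration.

```r
# Mock return value with the documented nesting: database -> id -> object.
out_list <- list(
  GEO = list(
    GSM945244 = data.frame(chr = "chr1", start = 100, end = 200,
                           peaktype = "narrowPeak"),  # invented values
    id_without_peaks = NULL                           # invented id
  ),
  ENCODE = list(
    ENCSR000AHD = data.frame(chr = "chr2", start = 500, end = 900,
                             peaktype = "narrowPeak")
  )
)

# Keep only the ids for which peaks were recovered.
recovered <- lapply(out_list, function(db) Filter(Negate(is.null), db))

# List the ids whose element is NULL, across all databases.
failed <- unlist(lapply(out_list, function(db) {
  names(db)[vapply(db, is.null, logical(1))]
}))

print(names(recovered$GEO))
print(failed)
```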

Examples

out_list <- PeakyFinders::import_peaks(
    ids = c("GSM945244"),# "ENCSR000AHD"
    searches = PeakyFinders::construct_searches(keys = "narrowpeak"))

neurogenomics/PeakyFinders documentation built on Oct. 14, 2024, 3:09 p.m.