ssvFetchBam: Iterates a character vector (ideally named) and calls...

View source: R/functions_fetch_bam.R

ssvFetchBamR Documentation

Iterates a character vector (ideally named) and calls ssvFetchBam.single on each. Appends grouping variable to each resulting data.table and uses rbindlist to efficiently combine results

Description

ssvFetchBam iteratively calls fetchWindowedBam.single. See ssvFetchBam.single for more info.

Usage

ssvFetchBam(
  file_paths,
  qgr,
  unique_names = NULL,
  names_variable = "sample",
  file_attribs = NULL,
  win_size = 50,
  win_method = c("sample", "summary")[1],
  summary_FUN = stats::weighted.mean,
  fragLens = "auto",
  target_strand = c("*", "+", "-", "both")[1],
  flip_strand = FALSE,
  anchor = c("left", "left_unstranded", "center", "center_unstranded")[3],
  return_data.table = FALSE,
  max_dupes = Inf,
  splice_strategy = c("none", "ignore", "add", "only", "splice_count")[1],
  n_cores = getOption("mc.cores", 1),
  n_region_splits = 1,
  return_unprocessed = FALSE,
  force_skip_centerFix = FALSE,
  ...
)

Arguments

file_paths

character vector of file_paths to load from. Alternatively, file_paths can be a data.frame or data.table whose first column is a character vector of paths and additial columns will be used as metadata.

qgr

Set of GRanges to query. For valid results the width of each interval should be identical and evenly divisible by win_size.

unique_names

names to use in final data.table to designate source bigwig. Default is 'sample'

names_variable

The column name where unique_names are stored.

file_attribs

optional data.frame/data.table with one row per item in file paths. Each column will be a variable added to final tidy output.

win_size

The window size that evenly divides widths in qgr.

win_method

character. one of c("sample", "summary"). Determines if viewGRangesWinSample_dt or viewGRangesWinSummary_dt is used to represent each region in qgr.

summary_FUN

function. only relevant if win_method is "summary". passed to viewGRangesWinSummary_dt.

fragLens

numeric. The fragment length to use to extend reads. The default value "auto" causes an automatic calculation from 100 regions in qgr. NA causes no extension of reads to fragment size.

target_strand

character. One of c("", "+", "-"). Controls filtering of reads by strand. Default of "" combines both strands.

flip_strand

boolean. if TRUE strands are flipped.

anchor

character, one of c("center", "center_unstranded", "left", "left_unstranded")

return_data.table

logical. If TRUE the internal data.table is returned instead of GRanges. Default is FALSE.

max_dupes

numeric >= 1. duplicate reads by strandd start position over this number are removed, Default is Inf.

splice_strategy

character, one of c("none", "ignore", "add", "only", "splice_count"). Default is "none" and spliced alignment are asssumed not present. fragLen will be forced to be NA for any other value. "ignore" will not count spliced regions. add" counts spliced regions along with others, "only" will only count spliced regions and ignore others.

n_cores

integer number of cores to use. Uses mc.cores option if not supplied.

n_region_splits

integer number of splits to apply to qgr. The query GRanges will be split into this many roughly equal parts for increased parallelization. Default is 1, no split.

return_unprocessed

boolean. if TRUE returns read alignment in data.table. Default is FALSE.

force_skip_centerFix

boolean, if TRUE all query ranges will be used "as is". This is already the case by default if win_method == "summary" but may have applications where win_method == "sample".

...

passed to Rsamtools::ScanBamParam()

Details

if qgr contains the range chr1:1-100 and win_size is 10, values from positions chr1 5,15,25...85, and 95 will be retrieved from bw_file

Value

A tidy formatted GRanges (or data.table if specified) containing fetched values.

Examples

if(Sys.info()['sysname'] != "Windows"){
data(CTCF_in_10a_overlaps_gr)
library(GenomicRanges)
bam_f = system.file("extdata/test.bam",
    package = "seqsetvis", mustWork = TRUE)
bam_files = c("a" = bam_f, "b" = bam_f)
qgr = CTCF_in_10a_overlaps_gr[1:5]
bw_gr = ssvFetchBam(bam_files, qgr, win_size = 10)
bw_gr2 = ssvFetchBam(as.list(bam_files), qgr, win_size = 10)

bw_dt = ssvFetchBam(bam_files, qgr, win_size = 10,
    return_data.table = TRUE)
}

jrboyd/seqsetvis documentation built on Jan. 16, 2025, 10:25 a.m.