bam2bw: bam2bw: create a coverage bigwig file from an alignment bam...

View source: R/bam2bw.R

bam2bwR Documentation

bam2bw: create a coverage bigwig file from an alignment bam file.

Description

bam2bw: create a coverage bigwig file from an alignment bam file.

Usage

bam2bw(
  bamfile,
  output_bw,
  bgbam = NULL,
  paired = NULL,
  binWidth = 20L,
  extend = 0L,
  scaling = TRUE,
  shift = 0L,
  type = c("full", "center", "start", "end", "ends"),
  compType = c("log2ratio", "subtract", "log10ppois", "log10FE"),
  strand = c("*", "+", "-"),
  strandMode = 1,
  log1p = FALSE,
  exclude = NULL,
  includeDuplicates = TRUE,
  includeSecondary = FALSE,
  minMapq = 1L,
  minFragLength = 1L,
  maxFragLength = 5000L,
  keepSeqLvls = NULL,
  splitByChr = 3,
  pseudocount = 1L,
  localBackground = 1L,
  only = NULL,
  zeroCap = TRUE,
  forceSeqlevelsStyle = NULL,
  verbose = TRUE,
  binSummarization = c("mean", "max", "min"),
  ...
)

Arguments

bamfile

The path to the signal bam file.

output_bw

The path to the output bigwig file

bgbam

The optional path to the control/input bam file to compute relative signal (see the 'compType' argument).

paired

Logical; whether to consider fragments instead of reads for paired-end data. For spliced (e.g. RNA-seq) data, paired should always be set to FALSE.

binWidth

The window size. A lower value (min 1) means a higher resolution, but larger file size.

extend

The amount *by* which to extend single-end reads (e.g. fragment length minus read length). If 'paired=TRUE' and 'type' is either 'ends' or 'center', then the extension will be applied after taking the (shifted) fragment ends or centers, resulting in ranges of width equal to 'extend'.

scaling

Either TRUE (performs Count Per Million scaling), FALSE (no scaling), or a numeric value by which the signal will be divided. If 'bgbam' is given and 'scaling=TRUE', the background will be scaled to the main signal.

shift

Shift (from 3' to 5') by which reads/fragments will be shifted. If 'shift' is an integer vector of length 2, the first value will represent the shift for the positive strand, and the second for the negative strand. For ATACseq data, one normally uses 'shift=c(4L,-5L)'.

type

Type of the coverage to compile. Either full (full read/fragment), start (count read/fragment start locations), end, center, or 'ends' (both ends of the read/fragment).

compType

The type of relative signal to compute (ignored if 'bgbam' isn't given). Can either be 'log2ratio' (log2-foldchange between signals), 'subtract' (difference), 'log10ppois' (rounded -log10 poisson p-value), 'log10FE' (log10 fold-enrichment). For fold-based signals, 'pseudocount' will be added before computing the ratio. By default negative values are capped (see 'zeroCap').

strand

Strand(s) to capture (any by default).

strandMode

The strandMode of the data (whether the strand is given by the first or second mate, which depends on the library prep protocol). See strandMode for more information. This parameter has no effect unless one of the 'strand', 'extend' parameters or a strand-specific 'shift' are used.

log1p

Whether to log-transform ('log(x+1)') the (scaled) signal. Ignored when the signal is relative to a background.

exclude

An optional GRanges of regions for which overlapping reads should be excluded.

includeDuplicates

Logical, whether to include reads flagged as duplicates.

includeSecondary

Logical; whether to include secondary alignments

minMapq

Minimum mapping quality (1 to 255)

minFragLength

Minimum fragment length (ignored if 'paired=FALSE')

maxFragLength

Maximum fragment length (ignored if 'paired=FALSE')

keepSeqLvls

An optional vector of seqLevels (i.e. chromosomes) to include.

splitByChr

Whether to process chromosomes separately, and if so by how many chunks. The should not affect the output, and is simply slightly slower and consumes less memory. Can be a logical value (in which case each chromosome is processed separately), but we instead recommend giving a positive integer indicating the number of chunks.

pseudocount

The count to be added before fold-enrichment calculation between signals. Ignored if 'compType="subtract"'.

localBackground

A (vector of) number of windows around each tile/position for which a local background will be calculated. The background used will be the maximum of the one at the given position or of the mean of each of the local backgrounds.

only

An optional GRanges of regions for which overlapping reads should be included. If set, all other reads are discarded.

zeroCap

Logical; whether to cap values below zero to zero when producing relative signals.

forceSeqlevelsStyle

If specified, forces the use of the specified seqlevel style for the output bigwig. Can take any value accepted by 'seqlevelsStyle'.

verbose

Logical; whether to print progress messages

binSummarization

The method to summarize nucleotides into each bin, either "max" (default), "min" or "mean".

...

Passed to 'ScanBamParam'

Details

* For single-end ChIPseq data, extend reads to the expected fragment size using the 'extend' argument. * The implementation involves reading the reads before processing, which can be quite memory-hungry. To avoid this the files are read by chunks (composed of chromosomes of balanced sizes), controlled by the 'splitByChr' argument. This reduces memory usage but also creates overhead which reduces the speed. In contexts where the file is small or memory isn't a problem, using 'splitByChr=FALSE' will achieve the best speed. Otherwise, increasing 'splitByChr' will decrease the memory footprint. * Consider restricting the read quality using 'includeDuplicates=FALSE' and 'minMapq=20' for example.

Value

The bigwig filepath. Alternatively, if 'output_bw=NA_character_', the coverage data is not written to file but returned.

Author(s)

Pierre-Luc Germain

Examples

# get an example bam file
bam <- system.file("extdata", "ex1.bam", package="Rsamtools")
# create bigwig
bam2bw(bam, "output.bw")

ETHZ-INS/epiwraps documentation built on Oct. 27, 2024, 8:02 p.m.