import_bam (R Documentation)
Import single-end or paired-end bam files as GRanges objects, with various processing options. It is highly recommended to index the BAM file first.
import_bam(
  file,
  mapq = 20L,
  revcomp = FALSE,
  shift = 0L,
  trim.to = c("whole", "5p", "3p", "center"),
  ignore.strand = FALSE,
  field = "score",
  paired_end = NULL,
  yieldSize = NA,
  ncores = 1L
)

import_bam_PROseq(
  file,
  mapq = 20L,
  revcomp = TRUE,
  shift = -1L,
  trim.to = "3p",
  ignore.strand = FALSE,
  field = "score",
  paired_end = NULL,
  yieldSize = NA,
  ncores = 1L
)

import_bam_PROcap(
  file,
  mapq = 20L,
  revcomp = FALSE,
  shift = 0L,
  trim.to = "5p",
  ignore.strand = FALSE,
  field = "score",
  paired_end = NULL,
  yieldSize = NA,
  ncores = 1L
)

import_bam_ATACseq(
  file,
  mapq = 20L,
  revcomp = FALSE,
  shift = 0L,
  plus_offset = 4,
  minus_offset = -4,
  trim.to = "5p",
  ignore.strand = TRUE,
  field = "score",
  paired_end = TRUE,
  yieldSize = NA,
  ncores = 1L
)
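Apart from the ATAC-seq offsets, the usage above suggests that the three wrappers are import_bam() with different defaults. The following is a minimal base-R sketch of that relationship. It assumes, without confirmation from this page, that the wrappers simply forward their arguments to import_bam(). The names import_bam_defaults, wrapper_overrides and resolve_args() are hypothetical and are not part of BRGenomics.

```r
# Hypothetical sketch: defaults copied from the usage section above.
# trim.to = "whole" stands for the first documented choice (presumably
# selected via match.arg()); paired_end = NULL is omitted because
# modifyList() treats a NULL value as "delete this element".
import_bam_defaults <- list(mapq = 20L, revcomp = FALSE, shift = 0L,
                            trim.to = "whole", ignore.strand = FALSE,
                            field = "score", yieldSize = NA, ncores = 1L)

# Defaults that each wrapper changes relative to import_bam()
wrapper_overrides <- list(
  import_bam = list(),
  import_bam_PROseq = list(revcomp = TRUE, shift = -1L, trim.to = "3p"),
  import_bam_PROcap = list(trim.to = "5p"),
  import_bam_ATACseq = list(trim.to = "5p", ignore.strand = TRUE,
                            paired_end = TRUE, plus_offset = 4,
                            minus_offset = -4)
)

# Effective arguments for a wrapper; user-supplied values take priority
resolve_args <- function(wrapper, ...) {
  args <- modifyList(import_bam_defaults, wrapper_overrides[[wrapper]])
  modifyList(args, list(...))
}

str(resolve_args("import_bam_PROseq"))
str(resolve_args("import_bam_ATACseq", trim.to = "whole"))
```

In this sketch, resolve_args("import_bam_PROseq") yields revcomp = TRUE, shift = -1L and trim.to = "3p". These are the same arguments that are passed to import_bam() in the run-on-base example further down.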
file: Path of a bam file, or a vector of paths.
mapq: Filter reads by a minimum MAPQ score. This is the correct way to filter multi-aligners.
revcomp: Logical indicating if aligned reads should be reverse-complemented.
shift: Either an integer giving the number of bases by which to shift the entire read upstream or downstream, or a pair of integers indicating shifts to be applied to the 5' and 3' ends of the reads, respectively. Shifting is strand-specific, with negative numbers shifting the reads upstream and positive numbers shifting them downstream. This option is applied after the revcomp option, but before trim.to.
trim.to: Option for selecting specific bases from the reads ("whole", "5p", "3p", or "center"), applied after the shift option.
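ignore.strand: Logical indicating if the strand information should be discarded. If TRUE, strand information is removed before identical reads are collapsed.
field: Metadata field name to use for readcounts, usually "score". If set to NULL, each range in the output corresponds to a single read, and no readcount field is added.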
paired_end: Logical indicating if reads should be treated as paired-end reads. When set to
yieldSize: The number of bam file records to process simultaneously, i.e., the "chunk size". A higher chunk size uses more memory, which can increase speed if enough memory is available. If chunking is not necessary, set to NA (the default).
ncores: Number of cores to use for importing bam files. Currently, multicore processing is only implemented for importing multiple bam files simultaneously. For smaller datasets or machines with more memory, this can increase performance, but it can otherwise lead to substantial performance penalties.
plus_offset: For importing ATAC-seq, the shift to apply to plus-strand alignments. By default, plus-strand reads are shifted 4 bp downstream.
minus_offset: For importing ATAC-seq, the shift to apply to minus-strand alignments. By default, minus-strand reads are shifted 4 bp upstream in terms of genomic coordinates, i.e., toward lower coordinates.
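The strand-specific shift and trim.to semantics described above can be illustrated with a minimal base-R sketch. It works on a data.frame of reads stored as 1-based closed intervals, as in GRanges. The helpers shift_reads() and trim_reads() are hypothetical and are not BRGenomics functions. The sketch leaves out "center" because its rounding for even-width reads is not documented here.

```r
# Hypothetical helpers; reads are 1-based closed intervals [start, end]
# with strand "+" or "-", as in GRanges.
shift_reads <- function(reads, shift = 0L) {
  # A single integer shifts both ends; a pair is c(5' shift, 3' shift)
  if (length(shift) == 1L) shift <- c(shift, shift)
  s5 <- shift[1]
  s3 <- shift[2]
  plus <- reads$strand == "+"
  # "+": the 5' end is `start`, and downstream means higher coordinates
  # "-": the 5' end is `end`, and downstream means lower coordinates
  data.frame(start = ifelse(plus, reads$start + s5, reads$start - s3),
             end = ifelse(plus, reads$end + s3, reads$end - s5),
             strand = reads$strand)
}

trim_reads <- function(reads, trim.to = c("whole", "5p", "3p")) {
  trim.to <- match.arg(trim.to)
  if (trim.to == "whole") return(reads)
  plus <- reads$strand == "+"
  five <- ifelse(plus, reads$start, reads$end)   # 5' base of each read
  three <- ifelse(plus, reads$end, reads$start)  # 3' base of each read
  pos <- if (trim.to == "5p") five else three
  data.frame(start = pos, end = pos, strand = reads$strand)
}

# PRO-seq-style import: shift = -1 and then trim.to = "3p" drops the run-on base
reads <- data.frame(start = c(100L, 100L), end = c(130L, 130L),
                    strand = c("+", "-"))
r <- trim_reads(shift_reads(reads, -1L), "3p")
r$start  # 129 101
```

The shift is applied before the trim, following the documented order. This is why a whole-read shift of -1 lands on the base just upstream of each read's original 3' end.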
Each function returns a GRanges object.
By default, import_bam_ATACseq shifts plus-aligned reads downstream 4 bp and minus-aligned reads upstream 4 bp. It then takes the strand-specific start site of the reads, removes strand information, and collapses identical reads. These steps account for the 9 bp gap between opposing fragments generated from the same Tn5 reaction, selecting the central base in the 9 bp duplication.
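The default ATAC-seq steps just described can be sketched in base R on a data.frame of reads stored as 1-based closed intervals. atac_cut_sites() is a hypothetical helper, not the BRGenomics implementation. It treats the offsets as genomic-coordinate shifts, as the minus_offset documentation describes, and it stores collapsed read counts in a "score" column, mirroring the default field. The read coordinates are made up for illustration.

```r
# Hypothetical sketch of the default import_bam_ATACseq() steps; reads are
# 1-based closed intervals with strand "+" or "-"
atac_cut_sites <- function(reads, plus_offset = 4L, minus_offset = -4L) {
  plus <- reads$strand == "+"
  # Offsets are genomic-coordinate shifts: +4 moves plus-strand reads toward
  # higher coordinates, and -4 moves minus-strand reads toward lower ones
  off <- ifelse(plus, plus_offset, minus_offset)
  start <- reads$start + off
  end <- reads$end + off
  # trim.to = "5p": the 5' base is `start` on "+" and `end` on "-"
  site <- ifelse(plus, start, end)
  # ignore.strand = TRUE: drop strand, then collapse identical positions
  # into one row each, storing the number of reads in "score"
  counts <- table(site)
  data.frame(pos = as.integer(names(counts)), score = as.integer(counts))
}

# Two reads from one idealized Tn5 event (duplication at 1001..1009),
# plus one unrelated read
reads <- data.frame(start = c(1001L, 960L, 2000L),
                    end = c(1050L, 1009L, 2040L),
                    strand = c("+", "-", "+"))
atac_cut_sites(reads)  # pos 1005 (score 2) and 2004 (score 1)
```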
Other sources often state that the offset should be +4 on plus-strand and -5 on minus-strand alignments (or alternatively +5 and -4), but this does not result in the two positions overlapping. I have verified this based on the expected result of the Tn5 reaction, adapter ligation, and sequencing steps. I have also verified it using real sequencing data, which confirms that only the +4/-4 shift results in a significant increase in the number of positions that overlap. However, these arguments are exposed so that users can apply a different convention if they prefer.
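The arithmetic behind the +4/-4 claim can be checked directly in base R. This assumes the idealized geometry in which the plus-strand read's 5' end is the first base of the 9 bp duplication and the minus-strand read's 5' end is the last base. The starting position 1001 is arbitrary.

```r
# Idealized Tn5 event (assumption): the 9 bp duplication spans x..x+8
x <- 1001L
dup <- x:(x + 8L)
plus_5p <- x           # 5' end of the plus-strand read
minus_5p <- x + 8L     # 5' end of the minus-strand read (genomic coordinate)
center <- dup[5]       # central base of the duplication, x + 4

# +4/-4: both reads land on the central base
c(plus = plus_5p + 4L, minus = minus_5p - 4L)  # 1005 1005
# +4/-5: the two reads end up one base apart
c(plus = plus_5p + 4L, minus = minus_5p - 5L)  # 1005 1004
```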
Note that the order of operations follows the order of the associated arguments in the function signature, not in the argument documentation. That is, the plus_offset and minus_offset arguments are applied after the shift argument and before the trim.to argument.
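Order matters whenever a shift changes a read's width. Two hypothetical base-R helpers show this by applying the same c(5' shift, 3' shift) pair to a plus-strand read either before or after trim.to = "5p". They are a sketch, not BRGenomics code.

```r
# Plus-strand read as a 1-based closed interval [start, end]
shift_then_trim <- function(start, end, s5, s3) {
  # Documented order: shift both ends, then keep only the 5' base
  start <- start + s5
  end <- end + s3
  c(start, start)
}

trim_then_shift <- function(start, end, s5, s3) {
  # Reversed order: keep the 5' base, then shift that 1 bp range
  end <- start
  c(start + s5, end + s3)
}

shift_then_trim(100L, 130L, 0L, 2L)  # 100 100: a single base
trim_then_shift(100L, 130L, 0L, 2L)  # 100 102: a 3 bp range
```

With equal 5' and 3' shifts, both orders return the same base. This is the case for a pure translation such as the ATAC-seq offsets; for example, both calls with s5 = s3 = 4L return c(104L, 104L).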
While this single-base-precision analysis of ATAC-seq may be valuable in some cases, most users are unlikely to need it. Instead, you might apply the plus_offset and minus_offset arguments as usual but set trim.to = "whole" (and keep ignore.strand = TRUE). This keeps the entire ATAC-seq reads, which is the most common analysis approach. It is also common to use coverage data with ATAC-seq, but this eliminates read count information.
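To see why coverage discards read count information, consider two read sets that produce identical per-base coverage. base_coverage() is a hypothetical base-R stand-in for the coverage() method on GRanges objects, and the reads are made up.

```r
# Per-base coverage of 1-based closed intervals over positions 1..n
# (hypothetical helper, standing in for coverage() on GRanges)
base_coverage <- function(starts, ends, n) {
  cov <- integer(n)
  for (i in seq_along(starts)) {
    idx <- starts[i]:ends[i]
    cov[idx] <- cov[idx] + 1L  # each read adds 1 to every base it covers
  }
  cov
}

one_read <- base_coverage(1L, 20L, n = 30L)                   # 1 read
two_reads <- base_coverage(c(1L, 11L), c(10L, 20L), n = 30L)  # 2 reads
identical(one_read, two_reads)  # TRUE: same coverage, different read counts
```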
Author: Mike DeBerardine
Hojoong Kwak, Nicholas J. Fuda, Leighton J. Core, John T. Lis (2013). Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing. Science 339(6122): 950–953. https://doi.org/10.1126/science.1229386
Jason D. Buenrostro, Paul G. Giresi, Lisa C. Zaba, Howard Y. Chang, William J. Greenleaf (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10: 1213–1218. https://doi.org/10.1038/nmeth.2688
library(BRGenomics)

# get local address for included bam file
ps.bam <- system.file("extdata", "PROseq_dm6_chr4.bam",
                      package = "BRGenomics")
#--------------------------------------------------#
# Import entire reads
#--------------------------------------------------#
# PRO-seq reads are sequenced as the reverse complement of the RNA
import_bam(ps.bam, revcomp = TRUE, paired_end = FALSE)
#--------------------------------------------------#
# Import entire reads, 1 range per read
#--------------------------------------------------#
import_bam(ps.bam, revcomp = TRUE, field = NULL,
           paired_end = FALSE)
#--------------------------------------------------#
# Import PRO-seq reads at basepair-resolution
#--------------------------------------------------#
# the typical manner to import PRO-seq data:
import_bam(ps.bam, revcomp = TRUE, trim.to = "3p",
           paired_end = FALSE)
#--------------------------------------------------#
# Import PRO-seq reads, removing the run-on base
#--------------------------------------------------#
# the best way to import PRO-seq data; removes the
# most 3' base, which was added in the run-on
# (these arguments match the defaults of import_bam_PROseq())
import_bam(ps.bam, revcomp = TRUE, trim.to = "3p",
           shift = -1, paired_end = FALSE)
#--------------------------------------------------#
# Import 5' ends of PRO-seq reads
#--------------------------------------------------#
# will include bona fide TSSes as well as hydrolysis products
import_bam(ps.bam, revcomp = TRUE, trim.to = "5p",
           paired_end = FALSE)