tallyVariants | R Documentation |
Tallies the bases, qualities and read positions for every genomic
position in a BAM file. By default, this only returns the positions
for which an alternate base has been detected. The typical usage is
to pass a BAM file, the genome, the (fixed) read_length and (if the
variant calling should consider quality) an appropriate
high_base_quality cutoff.
Passing a which argument allows computing on only a subregion of the
genome. which is a ‘RangesList’ or something coercible to one that
limits the tally to that range or set of ranges. By default, the
entire genome is processed.
For parallel evaluation (see BPPARAM), which can specifically be a
‘GenomicRanges’ or a ‘GRangesList’. If which is a ‘GenomicRanges’ of
length 1, it is tiled to create chunks for parallel evaluation. If it
is longer than 1, each range becomes a chunk. If which is a
‘GRangesList’, each element (i.e. each ‘GenomicRanges’) becomes a
chunk. The latter can be useful to ensure balanced worker load, e.g.
in the case of regions covering multiple sequences (see equisplit).
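As a rough illustration of the tiling idea described above, the following base R sketch splits a single range into near-equal, contiguous chunks. The helper tile_range and the chunk count n are hypothetical and exist only for this example; how tallyVariants actually tiles a length-1 which (for instance, how many tiles it creates) is not stated here.

```r
# Hypothetical helper (not part of any package): split the closed
# interval [start, end] into n contiguous, near-equal tiles.
tile_range <- function(start, end, n) {
  start <- as.integer(start)
  end <- as.integer(end)
  width <- end - start + 1L
  n <- min(as.integer(n), width)  # never create empty tiles
  # Cumulative tile ends, rounded to whole positions
  ends <- start - 1L + as.integer(round(seq_len(n) * width / n))
  starts <- c(start, head(ends, -1L) + 1L)
  data.frame(start = starts, end = ends)
}

# One range of 10 positions split into 3 chunks
tiles <- tile_range(1L, 10L, 3L)
print(tiles)
```

Each row of the result would correspond to one chunk handed to a worker; with a ‘GRangesList’, the chunks are instead given explicitly by the list elements.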
## S4 method for signature 'BamFile'
tallyVariants(x, param = TallyVariantsParam(...), ...,
              BPPARAM = defaultBPPARAM())
## S4 method for signature 'BamFileList'
tallyVariants(x, ...)
## S4 method for signature 'character'
tallyVariants(x, ...)
TallyVariantsParam(genome,
                   read_pos_breaks = NULL,
                   high_base_quality = 0L,
                   minimum_mapq = 13L,
                   variant_strand = 1L, ignore_query_Ns = TRUE,
                   ignore_duplicates = TRUE,
                   mask = GRanges(), keep_extra_stats = TRUE,
                   read_length = NA_integer_,
                   read_pos = !is.null(read_pos_breaks),
                   high_nm_score = NA_integer_,
                   ...)
x |
An indexed BAM file, either a path, |
param |
The parameters for the tallying process, as a
|
... |
For |
genome |
The genome, either a |
read_pos_breaks |
The breaks used for tabulating the read positions at each genomic
position. If this information is included (not
|
high_base_quality |
The minimum cutoff for whether a base is
counted as high quality. By default, |
minimum_mapq |
Minimum MAPQ of a read for it to be included in
the tallies. This depends on the aligner; the default is reasonable
for |
variant_strand |
The number of strands on which an alternate base must be detected for a position to be returned. Setting this to at least 1 is highly recommended (otherwise, the result is huge and includes many uninteresting reference rows). |
ignore_query_Ns |
Whether to ignore N calls in the
reads. Usually, there is no reason to set this to |
ignore_duplicates |
Whether to ignore reads flagged as PCR/optical duplicates. |
mask |
A |
read_length |
The expected read length, used for calculating the “median distance from nearest end” statistic. If not specified, an attempt is made to guess the read length from a random sample of the BAM file. If the read length is found to be variable, statistics depending on the read length are not calculated. |
read_pos |
Whether to tally read positions, which can be computationally intensive. |
high_nm_score |
If not |
keep_extra_stats |
Whether to keep various summary statistics generated from the tallies; setting this to FALSE will save memory. The extra statistics are most useful for algorithm diagnostics and development. |
BPPARAM |
A
|
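To make the role of read_pos_breaks more concrete, here is a minimal base R sketch of tabulating read positions into bins defined by a vector of breaks. The read positions and break values are made up for illustration, and the interval convention (right-closed bins, the default of cut()) is an assumption; the tallying code may bin differently.

```r
# Made-up read positions (cycles) at which an alternate base was
# observed for one genomic position
read_pos <- c(1L, 3L, 12L, 48L, 50L, 75L, 74L)

# Hypothetical breaks: start of read, middle, end of read
breaks <- c(0L, 10L, 65L, 75L)

# Assign each read position to a bin; cut() uses (lo, hi] by default
bins <- cut(read_pos, breaks = breaks)

# Count of supporting bases per read position bin
counts <- table(bins)
print(counts)
```

A tally concentrated in the first or last bin would suggest that the alternate base is mostly seen near read ends, which is the kind of pattern such bins are meant to expose.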
For tallyVariants, the tally GRanges.
For TallyVariantsParam, an object with parameters suitable for
variant calling.
The VariantTallyParam constructor is DEPRECATED.
Michael Lawrence, Jeremiah Degenhardt
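if (requireNamespace("gmapR", quietly = TRUE)) {
  # Tally parameters: TP53 demo genome, restricted to the TP53 region
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
                                    high_base_quality = 23L,
                                    which = gmapR::TP53Which())
  # Example BAM files from the LungCancerLines data package
  bams <- LungCancerLines::LungCancerBamFiles()
  raw.variants <- tallyVariants(bams$H1993, tally.param)
}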