qProfile: Quantify alignments by relative position
In fmicompbio/QuasR: Quantify and Annotate Short Reads in R

View source: R/qProfile.R

qProfile

R Documentation

Quantify alignments by relative position

Description

Quantify alignments from sequencing data, relative to their position in query regions.

Usage

qProfile(
  proj,
  query,
  upstream = 1000,
  downstream = upstream,
  selectReadPosition = c("start", "end"),
  shift = 0L,
  orientation = c("any", "same", "opposite"),
  useRead = c("any", "first", "last"),
  auxiliaryName = NULL,
  mask = NULL,
  collapseBySample = TRUE,
  includeSpliced = TRUE,
  includeSecondary = TRUE,
  mapqMin = 0L,
  mapqMax = 255L,
  absIsizeMin = NULL,
  absIsizeMax = NULL,
  maxInsertSize = 500L,
  binSize = 1L,
  clObj = NULL
)

Arguments

`proj`	A `qProject` object representing a sequencing experiment as returned by `qAlign`
`query`	An object of type `GRanges` with the regions to be profiled. All regions in `query` will be anchored at their biological start position (`start(query)` for regions on strand “+” or “*”, `end(query)` for regions on strand “-”). This position will become position zero in the return value.
`upstream`	An “integer” vector of length one or the same length as `query` indicating the number of bases upstream of the anchor position to include in the profile.
`downstream`	An “integer” vector of length one or the same length as `query` indicating the number of bases downstream of the anchor position to include in the profile.
`selectReadPosition`	defines the part of the alignment that has to be contained within a query region to produce an overlap (see Details), and that is used to calculate the relative position within the query region. Possible values are: `start` (default) : start of the alignment `end` : end of the alignment
`shift`	controls the shifting alignments towards their 3'-end before quantification. `shift` can be one of: an “integer” vector of the same length as the number of alignment files a single “integer” value the character string `"halfInsert"` (only available for paired-end experiments) The default of `0` will not shift any alignments.
`orientation`	sets the required orientation of the alignments relative to the query region in order to be counted, one of: `any` (default) : count alignment on the same and opposite strand `same` : count only alignment on the same strand `opposite` : count only alignment on the opposite strand
`useRead`	For paired-end experiments, selects the read mate whose alignments should be counted, one of: `any` (default) : count all alignments `first` : count only alignments from the first read `last` : count only alignments from the last read
`auxiliaryName`	Which bam files to use in an experiments with auxiliary alignments (see Details).
`mask`	If not `NULL`, a `GRanges` object with reference regions to be masked, i.e. excluded from the quantification, such as unmappable or highly repetitive regions (see Details).
`collapseBySample`	If `TRUE` (the default), sum alignment counts from bam files with the same sample name.
`includeSpliced`	If `TRUE` (the default), include spliced alignments when counting. A spliced alignment is defined as an alignment with a gap in the read of at least 60 bases.
`includeSecondary`	If `TRUE` (the default), include alignments with the secondary bit (0x0100) set in the `FLAG` when counting.
`mapqMin`	Minimal mapping quality of alignments to be included when counting (mapping quality must be greater than or equal to `mapqMin`). Valid values are between 0 and 255. The default (0) will include all alignments.
`mapqMax`	Maximal mapping quality of alignments to be included when counting (mapping quality must be less than or equal to `mapqMax`). Valid values are between 0 and 255. The default (255) will include all alignments.
`absIsizeMin`	For paired-end experiments, minimal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0 or `NULL` (default), which will not apply any minimum insert size filtering.
`absIsizeMax`	For paired-end experiments, maximal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0 or `NULL` (default), which will not apply any maximum insert size filtering.
`maxInsertSize`	Maximal fragment size of the paired-end experiment. This parameter is used if `shift="halfInsert"` and will ensure that query regions are made wide enough to emcompass all alignment pairs whose mid falls into the query region. The default value is `500` bases.
`binSize`	Numeric scalar giving the size of bins (must be an odd number). The default value (`1`) gives back counts for single bases. Otherwise, alignments are counted in adjacent, non-overlapping windows of size `binSize` that tile the interval defined by `upstream` and `downstream`.
`clObj`	A cluster object to be used for parallel processing (see ‘Details’).

Details

qProfile is used to count alignments in each sample from a qProject object, relative to their position in query regions.

Most arguments are identical to the ones of qCount.

The query argument is a GRanges object that defines the regions for the profile. All regions in query will be aligned to one another at their anchor position, which corresponds to their biological start position (start(query) for regions on strand “+” or “*”, end(query) for regions on strand “-”).

This anchor position will be extended (with regard to strand) by the number of bases specified by upstream and downstream. In the return value, the anchor position will be at position zero.

If binSize is greater than one, upstream and downstream will be slightly increased in order to include the complete first and last bins of binSize bases.

Regions with identical names in names{query} will be summed, and profiles will be padded with zeros to accomodate the length of all profiles.

Value

A list of matrices with length(unique(names(query))) rows with profile names, and max(upstream)+max(downstream)+1 columns indicating relative position (for binsize=1).

For binSize values greater than 1, the number of columns corresponds to the number of bins (tiles), namely ceiling(max(upstream)/binSize)+ceiling(max(downstream)/binSize). A middle bin of size binSize is always positioned centered at the anchor of each region. Additional bins are positioned upstream and downstream, adjacent to that middle bin, in order to include at least upstream and downstream bases, respectively (potentially more in order to fill the first and last bins).

The relative positions are given as column names (for binSize > 1 they refer to the bin mid). In that case, the bins are "right-open". For example, if binSize = 10, the bin with the midpoint "-50" contains counts for the alignments in [-55,-45).

The first list element is called “coverage” and contains, for each profile and relative position, the number of overlapping regions that contributed to the profile.

Subsequent list elements contain the alignment counts for individual sequence files (collapseBySample=FALSE) or samples (collapseBySample=TRUE) in proj.

For projects with allele-specific quantification, i.e. if a file with single nucleotide polymorphisms was supplied to the snpFile argument of qAlign, there will be three rows instead of one row with counts per unique region name, with numbers of alignments for Reference, Unknown and Alternative genotypes (suffixed _R, _U and _A).

Author(s)

Anita Lerch, Dimos Gaidatzis and Michael Stadler

Examples

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# create alignments (single-end experiment)
genomeFile <- "extdata/hg19sub.fa"
sampleFile <- "extdata/samples_chip_single.txt"
proj <- qAlign(sampleFile, genomeFile)

# load transcript start site coordinates
library(rtracklayer)
annotationFile <- "extdata/hg19sub_annotation.gtf"
tssRegions <- import.gff(annotationFile, format="gtf",
                         feature.type="start_codon")

# obtain a combined TSS profile
pr1 <- qProfile(proj, tssRegions)
lapply(pr1, dim)
lapply(pr1, "[", , 1:5)

prComb <- do.call("+", lapply(pr1[-1], function(x) x/pr1[[1]]))
barplot(prComb, xlab="Position", ylab="Mean no. of alignments")

# obtain TSS profiles for individual regions
names(tssRegions) <- mcols(tssRegions)$transcript_id
pr2 <- qProfile(proj, tssRegions)
lapply(pr2, dim)
lapply(pr2, "[", 1:3, 1:5)

fmicompbio/QuasR documentation built on Feb. 25, 2025, 7:30 a.m.