qProfile | R Documentation |
Quantify alignments from sequencing data, relative to their position in query regions.
qProfile(
  proj,
  query,
  upstream = 1000,
  downstream = upstream,
  selectReadPosition = c("start", "end"),
  shift = 0L,
  orientation = c("any", "same", "opposite"),
  useRead = c("any", "first", "last"),
  auxiliaryName = NULL,
  mask = NULL,
  collapseBySample = TRUE,
  includeSpliced = TRUE,
  includeSecondary = TRUE,
  mapqMin = 0L,
  mapqMax = 255L,
  absIsizeMin = NULL,
  absIsizeMax = NULL,
  maxInsertSize = 500L,
  binSize = 1L,
  clObj = NULL
)
proj | A qProject object representing a sequencing experiment, as returned by qAlign. |
query | An object of type GRanges with the regions to be profiled (see Details). |
upstream | An “integer” vector of length one or the same length as query, giving the number of bases upstream of the anchor position to include in the profile. |
downstream | An “integer” vector of length one or the same length as query, giving the number of bases downstream of the anchor position to include in the profile. |
selectReadPosition | Defines the part of the alignment that has to be contained within a query region to produce an overlap (see Details), and that is used to calculate the relative position within the query region. Possible values are "start" (default), the start of the alignment, and "end", the end of the alignment. |
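shift | Controls the shifting of alignments towards their 3'-end before quantification. shift can be a single “integer” value, an “integer” vector with one value per alignment file, or, for paired-end experiments, the string "halfInsert". The default of 0L does not shift any alignments. |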
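orientation | Sets the required orientation of the alignments relative to the query region in order to be counted, one of: "any" (default), count alignments on the same and the opposite strand; "same", count only alignments on the same strand; "opposite", count only alignments on the opposite strand. |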
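useRead | For paired-end experiments, selects the read mate whose alignments should be counted, one of: "any" (default), count all alignments; "first", count only alignments from the first read; "last", count only alignments from the last read. |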
auxiliaryName | Selects which BAM files to use in experiments with auxiliary alignments (see Details). |
mask | If not NULL, a GRanges object with reference regions to be masked, i.e. excluded from the quantification. |
collapseBySample | If TRUE (the default), alignment counts from BAM files with the same sample name are summed. |
includeSpliced | If TRUE (the default), spliced alignments are included when counting. |
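includeSecondary | If TRUE (the default), alignments with the secondary bit (0x100) set in the FLAG field are included when counting. |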
mapqMin | Minimal mapping quality of alignments to be included when counting (mapping quality must be greater than or equal to mapqMin). Valid values are between 0 and 255; the default (0) includes all alignments. |
mapqMax | Maximal mapping quality of alignments to be included when counting (mapping quality must be less than or equal to mapqMax). Valid values are between 0 and 255; the default (255) includes all alignments. |
absIsizeMin | For paired-end experiments, minimal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0, or NULL (default), which applies no minimum insert-size filter. |
absIsizeMax | For paired-end experiments, maximal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included when counting. Valid values are greater than 0, or NULL (default), which applies no maximum insert-size filter. |
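maxInsertSize | Maximal fragment size of the paired-end experiment. This parameter is used if shift = "halfInsert", to make query regions wide enough to encompass all alignment pairs whose midpoint falls into the query region. The default is 500 bases. |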
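binSize | Numeric scalar giving the size of bins (must be an odd number). The default value (1) returns counts for single bases; larger values count alignments in adjacent, non-overlapping bins of binSize bases that tile the region defined by upstream and downstream. |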
clObj | A cluster object to be used for parallel processing (see ‘Details’). |
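The mapqMin/mapqMax, absIsizeMin/absIsizeMax and includeSecondary arguments describe simple per-alignment filters. The following R sketch shows how they could combine, assuming an alignment must pass all of them. keepAlignment is a hypothetical helper, not a QuasR function, and it works on plain vectors of MAPQ, TLEN and FLAG values rather than on BAM files.

```r
# Hypothetical helper (not part of QuasR): TRUE for alignments that pass the
# documented mapping-quality, insert-size and secondary-alignment filters.
# Assumption: all filters must hold at the same time.
keepAlignment <- function(mapq, tlen, flag,
                          mapqMin = 0L, mapqMax = 255L,
                          absIsizeMin = NULL, absIsizeMax = NULL,
                          includeSecondary = TRUE) {
  # mapping quality must lie in [mapqMin, mapqMax]
  keep <- mapq >= mapqMin & mapq <= mapqMax
  # NULL disables the corresponding bound; TLEN is signed, so use abs()
  if (!is.null(absIsizeMin)) keep <- keep & abs(tlen) >= absIsizeMin
  if (!is.null(absIsizeMax)) keep <- keep & abs(tlen) <= absIsizeMax
  # 0x100 (256) is the "secondary alignment" bit of the SAM FLAG field
  if (!includeSecondary) keep <- keep & bitwAnd(flag, 256L) == 0L
  keep
}

keepAlignment(mapq = c(0L, 30L, 60L), tlen = c(150L, -300L, 600L),
              flag = c(99L, 147L, 355L), mapqMin = 10L, absIsizeMax = 500L)
# returns FALSE TRUE FALSE
```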
qProfile is used to count alignments in each sample from a qProject object, relative to their position in query regions. Most arguments are identical to those of qCount.
The query argument is a GRanges object that defines the regions for the profile. All regions in query will be aligned to one another at their anchor position, which corresponds to their biological start position (start(query) for regions on strand “+” or “*”, end(query) for regions on strand “-”). This anchor position will be extended (with regard to strand) by the number of bases specified by upstream and downstream. In the return value, the anchor position will be at position zero.
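The anchoring rule can be illustrated with a few lines of base R. This is a sketch under stated assumptions: anchorPosition, profileWindow and relativePosition are hypothetical helpers, not QuasR functions, and they take plain vectors of region start, end and strand instead of a GRanges object.

```r
# Hypothetical helpers (not part of QuasR) illustrating the anchoring rule
# on plain vectors of region start, end and strand ("+", "-" or "*").
anchorPosition <- function(start, end, strand) {
  # biological start: start for "+" or "*", end for "-"
  ifelse(strand == "-", end, start)
}

profileWindow <- function(start, end, strand, upstream, downstream) {
  anchor <- anchorPosition(start, end, strand)
  minus <- strand == "-"
  # on "-" strand regions, upstream bases lie at higher coordinates
  data.frame(from = ifelse(minus, anchor - downstream, anchor - upstream),
             to   = ifelse(minus, anchor + upstream, anchor + downstream))
}

relativePosition <- function(pos, start, end, strand) {
  anchor <- anchorPosition(start, end, strand)
  # 0 = anchor, negative = upstream, positive = downstream
  ifelse(strand == "-", anchor - pos, pos - anchor)
}

profileWindow(start = c(100, 100), end = c(200, 200), strand = c("+", "-"),
              upstream = 10, downstream = 5)
# from = 90, 195; to = 105, 210
```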
If binSize is greater than one, upstream and downstream will be slightly increased in order to include the complete first and last bins of binSize bases.
Regions with identical names in names(query) will be summed, and profiles will be padded with zeros to accommodate the length of all profiles.
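Summing same-named regions and zero-padding them to a common width can be sketched in base R. The sketch assumes binSize = 1 and that each region's counts are already available as a numeric vector covering positions -upstream[i] to downstream[i]. combineProfiles is a hypothetical helper, not QuasR's implementation. It also counts, per name and position, how many regions contributed, which mirrors the “coverage” element of the return value.

```r
# Hypothetical helper (not part of QuasR): align per-region count vectors at
# their anchor (position 0), zero-pad to a common width, and sum regions that
# share a name. Also tallies how many regions cover each name/position.
combineProfiles <- function(counts, upstream, downstream, regionNames) {
  maxUp <- max(upstream)
  positions <- seq(-maxUp, max(downstream))
  profNames <- unique(regionNames)
  prof <- matrix(0, nrow = length(profNames), ncol = length(positions),
                 dimnames = list(profNames, positions))
  coverage <- prof
  for (i in seq_along(counts)) {
    # columns occupied by region i: positions -upstream[i] .. downstream[i]
    cols <- seq(-upstream[i], downstream[i]) + maxUp + 1
    stopifnot(length(counts[[i]]) == length(cols))
    prof[regionNames[i], cols] <- prof[regionNames[i], cols] + counts[[i]]
    coverage[regionNames[i], cols] <- coverage[regionNames[i], cols] + 1
  }
  list(coverage = coverage, counts = prof)
}

res <- combineProfiles(
  counts = list(c(1, 2, 3), c(4, 5, 6, 7, 8), c(1, 1, 1)),
  upstream = c(1, 2, 1), downstream = c(1, 2, 1),
  regionNames = c("a", "a", "b"))
res$counts  # row "a": 4 6 8 10 8; row "b": 0 1 1 1 0
```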
A list of matrices with length(unique(names(query))) rows with profile names, and max(upstream)+max(downstream)+1 columns indicating relative position (for binSize = 1).
For binSize values greater than 1, the number of columns corresponds to the number of bins (tiles), namely ceiling(max(upstream)/binSize)+ceiling(max(downstream)/binSize).
A middle bin of size binSize is always centered at the anchor of each region. Additional bins are positioned upstream and downstream, adjacent to that middle bin, in order to include at least upstream and downstream bases, respectively (potentially more, in order to fill the first and last bins).
The relative positions are given as column names (for binSize > 1 they refer to the bin midpoints). In that case, the bins are right-open. For example, if binSize = 10, the bin with the midpoint "-50" contains counts for the alignments in [-55,-45).
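The right-open binning convention can be checked with a one-line helper. binMidpoint is hypothetical (not a QuasR function) and is only meant to reproduce the documented example: with binSize = 10, positions -55 to -46 map to the bin labelled "-50", and -45 already falls into the next bin. With an odd binSize the middle bin is symmetric around the anchor, and with binSize = 1 every position maps to itself.

```r
# Hypothetical helper (not part of QuasR): maps relative positions to the
# midpoint of the right-open bin of width binSize that contains them, with
# one bin centered on the anchor (position 0).
binMidpoint <- function(pos, binSize) {
  floor((pos + binSize / 2) / binSize) * binSize
}

binMidpoint(c(-55, -46, -45), binSize = 10)  # -50 -50 -40
binMidpoint(c(-6, -5, 5, 6), binSize = 11)   # -11 0 0 11
```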
The first list element is called “coverage” and contains, for each profile and relative position, the number of overlapping regions that contributed to the profile.
Subsequent list elements contain the alignment counts for individual sequence files (collapseBySample=FALSE) or samples (collapseBySample=TRUE) in proj.
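Collapsing file-level profiles into sample-level profiles amounts to summing matrices that share a sample name. The sketch below assumes the per-file matrices all have the same dimensions. collapseProfiles is a hypothetical helper, not a QuasR function, and it ignores the “coverage” element.

```r
# Hypothetical helper (not part of QuasR): sums per-file profile matrices
# that belong to the same sample, keeping samples in first-seen order.
collapseProfiles <- function(fileProfiles, sampleNames) {
  groups <- factor(sampleNames, levels = unique(sampleNames))
  bySample <- split(fileProfiles, groups)
  lapply(bySample, function(mats) Reduce(`+`, mats))
}

files <- list(matrix(1:4, 2), matrix(10:13, 2), matrix(100:103, 2))
res <- collapseProfiles(files, c("s1", "s1", "s2"))
res$s1  # element-wise sum of the first two matrices
```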
For projects with allele-specific quantification, i.e. if a file with single nucleotide polymorphisms was supplied to the snpFile argument of qAlign, there will be three rows instead of one row with counts per unique region name, with numbers of alignments for Reference, Unknown and Alternative genotypes (suffixed _R, _U and _A).
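To obtain allele-agnostic totals from such a matrix, the three suffixed rows of each region can be summed. The sketch below uses base R's rowsum and assumes the row names follow the documented name_R/name_U/name_A pattern. sumAlleles is a hypothetical helper, not part of QuasR.

```r
# Hypothetical helper (not part of QuasR): collapses the _R, _U and _A rows of
# an allele-specific profile matrix into one total row per region name.
sumAlleles <- function(m) {
  regionNames <- sub("_[RUA]$", "", rownames(m))
  rowsum(m, group = regionNames, reorder = FALSE)
}

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("tx1_R", "tx1_U", "tx1_A"), c("-1", "0")))
sumAlleles(m)  # one row "tx1" with totals 6 and 15
```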
Anita Lerch, Dimos Gaidatzis and Michael Stadler
qCount, qAlign, qProject, makeCluster from package parallel
# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)
# create alignments (single-end experiment)
genomeFile <- "extdata/hg19sub.fa"
sampleFile <- "extdata/samples_chip_single.txt"
proj <- qAlign(sampleFile, genomeFile)
# load transcript start site coordinates
library(rtracklayer)
annotationFile <- "extdata/hg19sub_annotation.gtf"
tssRegions <- import.gff(annotationFile, format="gtf",
feature.type="start_codon")
# obtain a combined TSS profile
pr1 <- qProfile(proj, tssRegions)
lapply(pr1, dim)
lapply(pr1, "[", , 1:5)
prComb <- Reduce("+", lapply(pr1[-1], function(x) x/pr1[[1]]))
barplot(prComb, xlab="Position", ylab="Mean no. of alignments")
# obtain TSS profiles for individual regions
names(tssRegions) <- mcols(tssRegions)$transcript_id
pr2 <- qProfile(proj, tssRegions)
lapply(pr2, dim)
lapply(pr2, "[", 1:3, 1:5)