View source: R/roi_functions.R
intersectByGene | R Documentation |
These functions divide up regions of interest according to associated names, and perform an inter-range operation on them. intersectByGene returns the "consensus" segment that is common to all input ranges, and returns no more than one range per gene. reduceByGene collapses the input ranges into one or more non-overlapping ranges that encompass all segments from the input ranges.
intersectByGene(regions.gr, gene_names)
reduceByGene(regions.gr, gene_names, disjoin = FALSE)
regions.gr | A GRanges object containing regions of interest. If
gene_names | A character vector with the same length as
disjoin | Logical. If
These functions modify regions of interest that have associated names, such that several ranges share the same name, e.g. transcripts with associated gene names. Both functions "combine" the ranges on a gene-by-gene basis.
intersectByGene
For each unique gene, the segment that overlaps all input ranges is returned. If no single range can be constructed that overlaps all input ranges, no range is returned for that gene (i.e. the gene is effectively filtered).
In other words, of all the ranges associated with a gene, the most-downstream start site and the most-upstream end site are selected, and the range between them is returned.
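The selection rule described above can be sketched in base R. This is only an illustration of the logic, not the package's implementation: the helper name intersect_by_gene_sketch is hypothetical, and the sketch assumes all ranges lie on the same chromosome and strand, so plain start and end coordinates are enough.

```r
# Hypothetical helper illustrating the per-gene intersection logic.
# Assumption: all ranges are on one chromosome and one strand.
intersect_by_gene_sketch <- function(starts, ends, gene_names) {
  stopifnot(length(starts) == length(ends),
            length(starts) == length(gene_names))
  # Largest start and smallest end among each gene's ranges
  s <- tapply(starts, gene_names, max)
  e <- tapply(ends, gene_names, min)
  # Keep only genes whose ranges all share at least one base
  keep <- as.vector(s <= e)
  data.frame(gene = names(s)[keep],
             start = as.vector(s)[keep],
             end = as.vector(e)[keep])
}

# Same coordinates as the example ranges used on this page
starts <- c(1, 3, 7, 10)
ends <- c(4, 5, 9, 11)
res <- intersect_by_gene_sketch(starts, ends, c("A", "A", "B", "B"))
res
# Gene A yields 3-4; gene B is dropped because its two ranges
# are adjacent rather than overlapping
```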
reduceByGene
For each unique gene, the associated ranges are reduced to produce one or more non-overlapping ranges. The output range(s) are effectively a union of the input ranges, and cover every input base.
With disjoin = FALSE, no genomic segment is overlapped by more than one range of the same gene, but ranges from different genes can overlap. With disjoin = TRUE, the output ranges are disjoint, and no genomic position is overlapped more than once. Any segment that overlaps more than one gene is removed, but any segment (i.e. any section of an input range) that overlaps only one gene is still maintained.
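The difference between the two disjoin settings can be illustrated with a position-based sketch in base R. It is practical only for small integer coordinates and is not how the package computes its result. The helper names positions_to_ranges and reduce_by_gene_sketch are hypothetical, and a single chromosome and strand are assumed.

```r
# Hypothetical helper: turn a set of integer positions into contiguous runs
positions_to_ranges <- function(pos) {
  pos <- sort(unique(pos))
  if (length(pos) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # A new run begins wherever consecutive positions are not contiguous
  run_id <- cumsum(c(TRUE, diff(pos) > 1))
  data.frame(start = as.vector(tapply(pos, run_id, min)),
             end = as.vector(tapply(pos, run_id, max)))
}

# Hypothetical helper illustrating the per-gene union, with optional
# removal of positions shared by more than one gene
reduce_by_gene_sketch <- function(starts, ends, gene_names, disjoin = FALSE) {
  # Every base covered by each gene's input ranges
  covered <- lapply(split(seq_along(starts), gene_names), function(i) {
    unique(unlist(Map(seq, starts[i], ends[i])))
  })
  if (disjoin) {
    # Positions appearing under more than one gene are ambiguous
    all_pos <- unlist(covered, use.names = FALSE)
    shared <- unique(all_pos[duplicated(all_pos)])
    covered <- lapply(covered, setdiff, shared)
  }
  out <- lapply(names(covered), function(g) {
    r <- positions_to_ranges(covered[[g]])
    data.frame(gene = rep(g, nrow(r)), start = r$start, end = r$end)
  })
  do.call(rbind, out)
}

# Coordinates matching gr2 and gnames from the examples on this page
starts <- c(1, 3, 7, 10)
ends <- c(10, 5, 9, 11)
gnames <- c("A", "B", "B", "B")

r_overlap <- reduce_by_gene_sketch(starts, ends, gnames)
r_disjoint <- reduce_by_gene_sketch(starts, ends, gnames, disjoin = TRUE)
r_overlap   # A: 1-10; B: 3-5 and 7-11
r_disjoint  # A: 1-2 and 6-6; B: 11-11
```

Working per position keeps the sketch short and easy to check by hand, at the cost of memory use that grows with range width.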
A GRanges object whose individual ranges are named for the associated gene.
A typical use for intersectByGene is to avoid transcript isoform selection, as the returned range is found in every isoform.
reduceByGene can be used to count any and all reads that overlap any part of a gene's annotation, but without double-counting any of them. With disjoin = FALSE, no reads will be double-counted for the same gene, but the same read can be counted for multiple genes. With disjoin = TRUE, no read can be double-counted.
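The double-counting behavior can be seen in a small base-R sketch. The read positions and the two range tables are made up for illustration and written out by hand; count_by_gene is a hypothetical helper, not a package function, and reads are treated as single-base positions.

```r
# Hypothetical single-base read positions
reads <- c(2, 4, 4, 8, 11)

# Hand-written ranges in the style of disjoin = FALSE output:
# ranges from genes A and B overlap each other
overlapping <- data.frame(gene = c("A", "B", "B"),
                          start = c(1, 3, 7),
                          end = c(10, 5, 11))

# Hand-written ranges in the style of disjoin = TRUE output:
# no position is covered by more than one range
disjoint <- data.frame(gene = c("A", "A", "B"),
                       start = c(1, 6, 11),
                       end = c(2, 6, 11))

# Hypothetical helper: count reads falling in each range, then sum by gene
count_by_gene <- function(reads, ranges) {
  per_range <- mapply(function(s, e) sum(reads >= s & reads <= e),
                      ranges$start, ranges$end)
  tapply(per_range, ranges$gene, sum)
}

counts_overlapping <- count_by_gene(reads, overlapping)
counts_disjoint <- count_by_gene(reads, disjoint)

sum(counts_overlapping)  # 8: exceeds the 5 reads, as some count for both genes
sum(counts_disjoint)     # 2: only reads in unambiguous segments are counted
```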
Mike DeBerardine
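# Load the packages providing GRanges() and the functions documented here
library(GenomicRanges)
library(BRGenomics)
# Make example data: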
# Ranges 1 and 2 overlap,
# Ranges 3 and 4 are adjacent
gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(1, 3, 7, 10),
                               end = c(4, 5, 9, 11)))
gr
#--------------------------------------------------#
# intersectByGene
#--------------------------------------------------#
intersectByGene(gr, c("A", "A", "B", "B"))
intersectByGene(gr, c("A", "A", "B", "C"))
gr2 <- gr
end(gr2)[1] <- 10
gr2
intersectByGene(gr2, c("A", "A", "B", "C"))
intersectByGene(gr2, c("A", "A", "A", "C"))
#--------------------------------------------------#
# reduceByGene
#--------------------------------------------------#
# For a given gene, overlapping/adjacent ranges are combined;
# gaps result in multiple ranges for that gene
gr
reduceByGene(gr, c("A", "A", "A", "A"))
# With disjoin = FALSE, ranges from different genes can overlap
gnames <- c("A", "B", "B", "B")
reduceByGene(gr, gnames)
# With disjoin = TRUE, segments overlapping >1 gene are removed as well
reduceByGene(gr, gnames, disjoin = TRUE)
# One more example to demonstrate how all
# unambiguous segments are identified and returned
gr2
gnames
reduceByGene(gr2, gnames, disjoin = TRUE)
#--------------------------------------------------#
# reduceByGene, then aggregate counts by gene
#--------------------------------------------------#
# If you ran getCountsByRegions on the last output,
# you could aggregate those counts by gene
gr2_redux <- reduceByGene(gr2, gnames, disjoin = TRUE)
counts <- c(5, 2, 3) # if these were the counts-by-regions
aggregate(counts ~ names(gr2_redux), FUN = sum)
# even more convenient if using a melted dataframe
df <- data.frame(gene = names(gr2_redux),
reads = counts)
aggregate(reads ~ gene, df, FUN = sum)
# can be extended to multiple samples
df <- rbind(df, df)
df$sample <- rep(c("s1", "s2"), each = 3)
df$reads[4:6] <- c(3, 1, 2)
df
aggregate(reads ~ sample*gene, df, FUN = sum)