Given two data frames of genomic regions, this function computes the base-pair overlap, if any, between every pair of regions from the two lists.
regionOverlap(xdf, ydf, chrColumn = "chr", startColumn = "start",
              endColumn = "end", mem.limit = 1e8)
xdf | data.frame holding the first list of genomic regions
ydf | data.frame holding the second list of genomic regions
chrColumn | character; name of the column that holds the chromosome name of the regions
startColumn | character; name of the column that holds the start position of the regions
endColumn | character; name of the column that holds the end position of the regions
mem.limit | integer value; maximal allowed size of matrices during the computation
Conceptually, a matrix with nrow(xdf) rows and nrow(ydf) columns, in which entry X[i,j] specifies the length of the overlap between region i of the first list (xdf) and region j of the second list (ydf). Since this matrix is very sparse, the dgCMatrix representation from the Matrix package is used for it.
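To make the meaning of entry X[i,j] concrete, the following base R sketch computes the same quantity as a dense matrix. The helper name naiveOverlap is hypothetical and this is not the package's implementation; it assumes that start and end coordinates are both inclusive, which is what the overlap lengths in the example output on this page suggest.

```r
# Hypothetical helper for illustration only; not part of the package.
naiveOverlap <- function(xdf, ydf, chrColumn = "chr",
                         startColumn = "start", endColumn = "end") {
  # TRUE where region i of xdf and region j of ydf share a chromosome
  sameChr <- outer(as.character(xdf[[chrColumn]]),
                   as.character(ydf[[chrColumn]]), "==")
  # rightmost start and leftmost end for every pair (i, j)
  left  <- outer(xdf[[startColumn]], ydf[[startColumn]], pmax)
  right <- outer(xdf[[endColumn]],   ydf[[endColumn]],   pmin)
  # assumed inclusive coordinates: positions left..right overlap if left <= right
  pmax(right - left + 1, 0) * sameChr
}

x <- data.frame(chr = c("chr7", "chr8"), start = c(100, 60), end = c(200, 80))
y <- data.frame(chr = c("chr7", "chr8"), start = c(50, 80), end = c(250, 120))
ov <- naiveOverlap(x, y)
ov[1, 1]  # 101: positions 100..200 of chr7 are shared
ov[2, 2]  # 1: only position 80 of chr8 is shared
ov[1, 2]  # 0: different chromosomes
```

A dense matrix like this is only practical for small inputs, which is presumably why the function returns a sparse representation instead.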
The function only returns the absolute length of the overlapping regions in base pairs. It does not return the position of the overlap or the fraction of region 1 and/or region 2 that overlaps the other region.
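If overlap fractions are needed, they can be derived from the returned lengths and the region widths. The sketch below is one possible way to do this in base R; it rebuilds the overlap lengths from the example output on this page as a plain matrix (in practice one might start from as.matrix() of the result), and it assumes inclusive start and end coordinates. The variable names are illustrative only.

```r
# Overlap lengths copied from the example output on this page (6 x 5 matrix).
ov <- matrix(0, nrow = 6, ncol = 5)
ov[cbind(c(4, 2, 3, 6), c(1, 3, 4, 4))] <- c(11, 101, 21, 1)

# Region widths of the toy data, assuming inclusive start/end coordinates.
widthX <- c(200, 200, 200, 520, 200, 80) - c(100, 100, 100, 510, 100, 60) + 1
widthY <- c(700, 200, 250, 120, 200) - c(500, 100, 50, 80, 100) + 1

# Fraction of each region of the first list (rows) that is covered.
fracX <- ov / widthX                 # the vector recycles down the rows
# Fraction of each region of the second list (columns) that is covered.
fracY <- sweep(ov, 2, widthY, "/")

fracX[4, 1]  # 1: region 4 of the first list lies inside region 1 of the second
fracY[3, 4]  # 21/41: about half of region 4 of the second list is covered
```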
The argument mem.limit is not really a limit on the RAM used, but rather the maximal size of matrices that should be allowed during the computation. If larger matrices would arise, the second regions list is split into parts and the overlap with the first list is computed for each part. During computation, matrices of size nrow(xdf) times nrow(ydf) are created.
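The splitting described in this note can be pictured with a small sketch. The helper splitColumns is hypothetical, and the package's internal logic may well differ; the sketch only assumes that each part of the second list is chosen so that nrow(xdf) times the part size stays within mem.limit.

```r
# Illustration only; the package's actual splitting logic may differ.
splitColumns <- function(nx, ny, mem.limit = 1e8) {
  # largest number of ydf regions whose block with all nx xdf regions fits
  perChunk <- max(1, floor(mem.limit / nx))
  # assign the ydf row indices to consecutive chunks of that size
  split(seq_len(ny), ceiling(seq_len(ny) / perChunk))
}

# 6 regions in the first list, 5 in the second, at most 12 matrix entries
chunks <- splitColumns(6, 5, mem.limit = 12)
length(chunks)   # 3 parts: rows 1-2, rows 3-4 and row 5 of the second list
```

Each part would then be compared against the full first list, and the resulting blocks combined column-wise into the final matrix.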
Joern Toedling
## toy example:
regionsH3ac <- data.frame(chr = c("chr1","chr7","chr8","chr1","chrX","chr8"),
                          start = c(100,100,100,510,100,60),
                          end = c(200,200,200,520,200,80))
regionsH4ac <- data.frame(chr = c("chr1","chr2","chr7","chr8","chr9"),
                          start = c(500,100,50,80,100),
                          end = c(700,200,250,120,200))
## compare the regions first by eye
## which ones do overlap and by what amount?
regionsH3ac
regionsH4ac
## compare it to the result:
as.matrix(regionOverlap(regionsH3ac, regionsH4ac))
nonzero(regionOverlap(regionsH3ac, regionsH4ac))
   chr start end
1 chr1   100 200
2 chr7   100 200
3 chr8   100 200
4 chr1   510 520
5 chrX   100 200
6 chr8    60  80
   chr start end
1 chr1   500 700
2 chr2   100 200
3 chr7    50 250
4 chr8    80 120
5 chr9   100 200
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0  101    0    0
[3,]    0    0    0   21    0
[4,]   11    0    0    0    0
[5,]    0    0    0    0    0
[6,]    0    0    0    1    0
     row col
[1,]   4   1
[2,]   2   3
[3,]   3   4
[4,]   6   4
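The index pairs above do not carry the overlap lengths themselves. As a rough sketch, a table of overlapping pairs with their lengths can be built in base R from the dense matrix; here the matrix is re-entered by hand from the output above rather than computed, and the names hits and pairs are illustrative only.

```r
# Dense overlap matrix re-entered from the output shown above.
ov <- matrix(0, nrow = 6, ncol = 5)
ov[cbind(c(4, 2, 3, 6), c(1, 3, 4, 4))] <- c(11, 101, 21, 1)

# Row/column indices of all non-zero entries, in column-major order.
hits <- which(ov > 0, arr.ind = TRUE)

# One line per overlapping pair, with the overlap length in base pairs.
pairs <- data.frame(row = hits[, "row"], col = hits[, "col"],
                    overlap = ov[hits])
pairs
```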