regionoverlap: Function to compute overlap of genomic regions
In Ringo: R Investigation of ChIP-chip Oligoarrays


Given two data frames of genomic regions, this function computes the base-pair overlap, if any, between every pair of regions from the two lists.

regionOverlap(xdf, ydf, chrColumn = "chr", startColumn = "start",
              endColumn = "end", mem.limit = 1e8)

`xdf`	`data.frame` that holds the first set of genomic regions
`ydf`	`data.frame` that holds the second set of genomic regions
`chrColumn`	character; what is the name of the column that holds the chromosome name of the regions in `xdf` and `ydf`
`startColumn`	character; what is the name of the column that holds the start position of the regions in `xdf` and `ydf`
`endColumn`	character; what is the name of the column that holds the end position of the regions in `xdf` and `ydf`
`mem.limit`	integer value; what is the maximal allowed size of matrices during the computation

A matrix with nrow(xdf) rows and nrow(ydf) columns, in which entry X[i,j] specifies the length, in base pairs, of the overlap between region i of the first list (xdf) and region j of the second list (ydf). Because this matrix is usually very sparse, it is returned in the dgCMatrix representation from the Matrix package.
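The meaning of the matrix entries can be illustrated with a small base R sketch. This is not Ringo's implementation: `overlapSketch` is a hypothetical helper that builds a dense matrix, and it assumes inclusive start and end coordinates and the default column names `chr`, `start` and `end`. That assumption agrees with the example output on this page, where regions 60-80 and 80-120 on the same chromosome overlap by 1 base pair.

```r
# Hypothetical helper (not part of Ringo): dense overlap matrix in base R.
# Assumes inclusive coordinates and columns named chr, start and end.
overlapSketch <- function(xdf, ydf) {
  # TRUE where region i of xdf and region j of ydf share a chromosome
  sameChr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  # right-most start and left-most end of every pair of regions
  lo <- outer(xdf$start, ydf$start, pmax)
  hi <- outer(xdf$end, ydf$end, pmin)
  # inclusive overlap length, floored at zero for disjoint pairs
  len <- pmax(hi - lo + 1, 0)
  # pairs on different chromosomes never overlap
  len * sameChr
}

x <- data.frame(chr = c("chr1", "chr7"), start = c(510, 100), end = c(520, 200))
y <- data.frame(chr = c("chr1", "chr7"), start = c(500, 50), end = c(700, 250))
overlapSketch(x, y)
```

A dense matrix like this grows with nrow(xdf) times nrow(ydf), which is why the function itself returns a sparse representation.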

The function only returns the absolute length of the overlap in base pairs. It does not return the position of the overlap or the fraction of region 1 and/or region 2 that overlaps the other region.
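If fractions are needed, they can be derived from the overlap lengths and the region widths. The sketch below does not call Ringo. It writes out the overlap matrix and the coordinates from the toy example at the end of this page, and it assumes inclusive coordinates, so that a region's width is end - start + 1. The names `ov`, `widthX`, `widthY`, `fracX` and `fracY` are illustrative only.

```r
# Overlap lengths from the toy example, as a dense matrix
# (rows: regionsH3ac, columns: regionsH4ac)
ov <- matrix(0, nrow = 6, ncol = 5)
ov[2, 3] <- 101
ov[3, 4] <- 21
ov[4, 1] <- 11
ov[6, 4] <- 1

# Region widths, assuming inclusive coordinates (end - start + 1)
widthX <- c(200, 200, 200, 520, 200, 80) - c(100, 100, 100, 510, 100, 60) + 1
widthY <- c(700, 200, 250, 120, 200) - c(500, 100, 50, 80, 100) + 1

# Fraction of each first-list region covered by each second-list region:
# a vector of length nrow(ov) is recycled down the columns, so row i is
# divided by widthX[i]
fracX <- ov / widthX

# Fraction of each second-list region covered by each first-list region:
# column j is divided by widthY[j]
fracY <- sweep(ov, 2, widthY, "/")

fracX
fracY
```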

The argument mem.limit is not a limit on the RAM used, but rather the maximal size of matrices that should be allowed during the computation. Without splitting, the computation creates matrices of size nrow(xdf) times nrow(ydf). If matrices larger than mem.limit would arise, the second regions list is split into parts and the overlap with the first list is computed for each part.
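The splitting idea can be sketched as follows. This is an illustration, not the code used inside regionOverlap: `chunkRows` is a hypothetical helper, and it assumes that "size" means the number of matrix entries, so that each part of the second list may hold at most mem.limit / nrow(xdf) regions.

```r
# Hypothetical helper (not Ringo's code): split the row indices of ydf into
# parts so that nx * (rows per part) does not exceed mem.limit.
# Assumes matrix size is counted in entries.
chunkRows <- function(nx, ny, mem.limit = 1e8) {
  # largest number of ydf rows per part; always at least one row
  perChunk <- max(1, floor(mem.limit / max(nx, 1)))
  # consecutive blocks of at most perChunk row indices
  split(seq_len(ny), ceiling(seq_len(ny) / perChunk))
}

# 1000 regions in the first list, 250 in the second, small limit for clarity
chunks <- chunkRows(nx = 1000, ny = 250, mem.limit = 1e5)
lengths(chunks)
```

Each part would then be compared against the full first list, and the resulting column blocks joined to form the complete overlap matrix.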

Joern Toedling

dgCMatrix-class

  ## toy example:
  library(Ringo)  # package that provides regionOverlap()
  regionsH3ac <- data.frame(chr=c("chr1","chr7","chr8","chr1","chrX","chr8"),
                            start=c(100,100,100,510,100,60),
                            end=c(200,200,200,520,200,80))
  regionsH4ac <- data.frame(chr=c("chr1","chr2","chr7","chr8","chr9"),
                            start=c(500,100,50,80,100),
                            end=c(700,200,250,120,200))

  ## compare the regions first by eye
  ##  which ones do overlap and by what amount?
  regionsH3ac
  regionsH4ac

  ## compare it to the result:
  as.matrix(regionOverlap(regionsH3ac, regionsH4ac))
  nonzero(regionOverlap(regionsH3ac, regionsH4ac))

   chr start end
1 chr1   100 200
2 chr7   100 200
3 chr8   100 200
4 chr1   510 520
5 chrX   100 200
6 chr8    60  80
   chr start end
1 chr1   500 700
2 chr2   100 200
3 chr7    50 250
4 chr8    80 120
5 chr9   100 200
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0  101    0    0
[3,]    0    0    0   21    0
[4,]   11    0    0    0    0
[5,]    0    0    0    0    0
[6,]    0    0    0    1    0
     row col
[1,]   4   1
[2,]   2   3
[3,]   3   4
[4,]   6   4