binner: UCSC bin indexing system utility functions

binning-utils    R Documentation

UCSC bin indexing system utility functions

Description

Utility functions for working with the UCSC bin indexing system.

Usage

  binFromCoordRange(starts, ends)
  binRangesFromCoordRange(start, end)
  binRestrictionString(start, end, field="bin")

Arguments

starts, ends

Integer vectors giving the start and end coordinates of a set of ranges.

start, end

Integer vectors of length 1 giving the start and end of a single coordinate range.

field

Name of bin column. Default: "bin".

Details

The UCSC bin indexing system was initially suggested by Richard Durbin and Lincoln Stein to speed up SQL SELECT queries for rows that overlap a given genomic coordinate range. The system, first used in the UCSC Genome Browser, is described by Kent et al. (2002).
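To make the idea concrete, the following is a minimal sketch of the standard UCSC binning rule: a feature is assigned to the smallest bin that fully contains it, with bin sizes of 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb. The function name sketchBinFromRange is hypothetical and not part of CNEr, and the sketch assumes zero-based, half-open coordinates as in the UCSC convention; the package's own implementation may differ in coordinate handling and return type.

```r
# Hypothetical helper illustrating the standard UCSC binning rule.
# Not part of CNEr; assumes zero-based, half-open [start, end) coordinates.
sketchBinFromRange <- function(start, end) {
  # Offset of the first bin at each level, from finest (128 kb) to coarsest (512 Mb)
  binOffsets <- c(585, 73, 9, 1, 0)
  # Index of the 128 kb bin containing the first and the last base
  startBin <- start %/% 2^17
  endBin <- (end - 1) %/% 2^17
  for (offset in binOffsets) {
    # The feature fits in a single bin at this level
    if (startBin == endBin) {
      return(offset + startBin)
    }
    # Move up one level: each coarser bin covers 8 finer bins
    startBin <- startBin %/% 8
    endBin <- endBin %/% 8
  }
  stop("range does not fit in the standard binning scheme (beyond 512 Mb)")
}

sketchBinFromRange(10003, 10004)      # 585: fits in the first 128 kb bin
sketchBinFromRange(1000000, 1100000)  # 9: first level at which both ends share a bin is 8 Mb
```

Small features land in high-numbered fine bins and large features in low-numbered coarse bins, so a query only needs to inspect a handful of bins at each level.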

Value

binFromCoordRange returns the bin number that should be assigned to a feature spanning each given range. It is typically used when building a database of features.

binRangesFromCoordRange returns the set of bin ranges that overlap a given coordinate range. It is typically used to find which bins a range overlaps. For SQL queries, binRestrictionString is more convenient than calling this function directly.

binRestrictionString returns a string for use in the WHERE clause of a SQL SELECT statement that selects features overlapping a given range. This is the function to use when querying a database.
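The relationship between the bin ranges and the WHERE clause can be sketched as follows. Both sketchBinRanges and sketchBinWhere are hypothetical helpers, not CNEr functions, and the exact string format produced by binRestrictionString may differ; the sketch assumes the standard five-level UCSC scheme with zero-based, half-open coordinates.

```r
# Hypothetical helpers, not part of CNEr.
# Assumes the standard UCSC scheme and zero-based, half-open coordinates.

# First and last bin overlapping [start, end) at each of the five levels
sketchBinRanges <- function(start, end) {
  binOffsets <- c(585, 73, 9, 1, 0)
  startBin <- start %/% 2^17
  endBin <- (end - 1) %/% 2^17
  ranges <- matrix(0, nrow = length(binOffsets), ncol = 2,
                   dimnames = list(NULL, c("first", "last")))
  for (i in seq_along(binOffsets)) {
    ranges[i, ] <- binOffsets[i] + c(startBin, endBin)
    # Each coarser level groups 8 bins of the finer level
    startBin <- startBin %/% 8
    endBin <- endBin %/% 8
  }
  ranges
}

# Turn the per-level bin ranges into a SQL restriction on the bin column
sketchBinWhere <- function(start, end, field = "bin") {
  r <- sketchBinRanges(start, end)
  clauses <- ifelse(
    r[, "first"] == r[, "last"],
    sprintf("%s = %d", field, r[, "first"]),
    sprintf("(%s >= %d AND %s <= %d)", field, r[, "first"], field, r[, "last"])
  )
  paste0("(", paste(clauses, collapse = " OR "), ")")
}

sketchBinRanges(10000, 2000000)
sketchBinWhere(10000, 2000000, field = "bin")
```

The bin restriction acts only as a prefilter that lets the database use an index on the bin column. A feature in a selected bin does not necessarily overlap the query range, so a query would normally combine this clause with an explicit comparison on the start and end columns of the table.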

Author(s)

Ge Tan

References

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The Human Genome Browser at UCSC. Genome Research, 12(6), 996-1006. doi:10.1101/gr.229102

http://genomewiki.ucsc.edu/index.php/Bin_indexing_system

Examples

  binFromCoordRange(starts=c(10003, 1000000), ends=c(10004, 1100000))
  binRangesFromCoordRange(start=10000, end=2000000)
  binRestrictionString(start=10000, end=2000000, field="bin")

ge11232002/CNEr documentation built on Oct. 26, 2022, 7:08 p.m.