Extract subset of sites in a set of intervals
extract_sites(start, end, site, return_index = FALSE, min_sites = 0)
start: start positions, a numeric vector.
end: end positions, a numeric vector.
site: positions of all sites; must be sorted in increasing order.
return_index: whether to return the indices into the position vector instead of the positions themselves.
min_sites: minimum number of sites in an interval; regions containing fewer sites than this value are filtered out.
Given a huge vector of genomic positions, we want to extract the subset of positions located in a specific group of regions (e.g. extract CpG sites in DMRs). Normally, we would use logical subsetting of the form site[site >= start & site <= end].
Unfortunately, in the code above, the whole vector site is scanned four times (by >=, <=, & and [).
If you want to look for sites in more than one region (e.g. 1000 regions), the whole site vector is re-scanned in every iteration of the loop, which is very time-consuming.
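To make the cost concrete, here is a minimal sketch of the naive approach applied to several regions. The toy data and the helper name naive_extract are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical toy data: sorted positions and three regions.
set.seed(123)
site  <- sort(sample(1e6, 1e5))
start <- c(1000, 250000, 700000)
end   <- c(5000, 260000, 705000)

# Naive approach (illustrative helper, not a package function):
# every region triggers full passes over `site` for >=, <=, & and [.
naive_extract <- function(start, end, site) {
  unlist(lapply(seq_along(start), function(i) {
    site[site >= start[i] & site <= end[i]]
  }))
}

subsite <- naive_extract(start, end, site)
length(subsite)
```

The work grows with length(site) times the number of regions, which is what a binary search avoids.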
Here we have the extract_sites function, which uses binary search to do the subsetting.
Of course, site must be sorted in non-decreasing order beforehand.
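The idea can be sketched in base R with findInterval(), which locates values in a sorted vector by binary search. This is only an illustration of the technique under that assumption, not the package's actual implementation, and extract_one_interval is a hypothetical name.

```r
# Sketch only: find the boundaries of the closed interval [start, end]
# in a sorted vector instead of comparing every element.
extract_one_interval <- function(start, end, site) {
  stopifnot(!is.unsorted(site))                 # binary search needs sorted input
  # number of sites strictly below `start`, plus one = first site >= start
  first <- findInterval(start, site, left.open = TRUE) + 1L
  # number of sites <= end = index of the last site inside the interval
  last  <- findInterval(end, site)
  if (first > last) return(site[0])             # no site falls in the interval
  site[first:last]
}

site <- c(2, 5, 5, 9, 14, 20)
extract_one_interval(5, 14, site)   # returns 5 5 9 14
```

Only the two boundary lookups depend on the length of site, and each takes a logarithmic number of comparisons.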
subsite = extract_sites(start, end, site, return_index = FALSE)
Besides a single interval, you can also extract sites in multiple genomic regions by passing vectors as start and end.
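A hedged sketch of how vector-valued start and end can be handled with the same binary-search idea follows. The helper extract_many_intervals and the toy data are assumptions for illustration, and the sketch assumes non-overlapping regions.

```r
# Sketch only: findInterval() is vectorised over its first argument,
# so all region boundaries are located in one call each.
extract_many_intervals <- function(start, end, site) {
  first <- findInterval(start, site, left.open = TRUE) + 1L  # first site >= start
  last  <- findInterval(end, site)                           # last site <= end
  keep  <- first <= last                                     # drop empty regions
  idx   <- unlist(Map(seq, first[keep], last[keep]))
  site[idx]
}

site  <- c(2, 5, 5, 9, 14, 20, 31, 40)
start <- c(4, 15, 30)
end   <- c(9, 18, 45)
extract_many_intervals(start, end, site)   # returns 5 5 9 31 40
```

The second region (15 to 18) contains no site, so it contributes nothing to the result.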
You can choose to return either the indices or the positions, using the return_index argument.
Regions that contain fewer than min_sites sites will be filtered out.
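The two options can be illustrated with a small base R sketch that mirrors the argument names from the usage above. The function extract_sites_sketch is hypothetical and only approximates the documented behaviour; the package's own implementation may differ in details.

```r
# Sketch only: binary-search subsetting with index return and
# per-region filtering by number of sites.
extract_sites_sketch <- function(start, end, site,
                                 return_index = FALSE, min_sites = 0) {
  first <- findInterval(start, site, left.open = TRUE) + 1L
  last  <- findInterval(end, site)
  n     <- pmax(last - first + 1L, 0L)     # number of sites in each region
  keep  <- n > 0 & n >= min_sites          # drop empty and too-sparse regions
  idx   <- unlist(Map(seq, first[keep], last[keep]))
  if (is.null(idx)) idx <- integer(0)
  if (return_index) idx else site[idx]
}

site  <- c(2, 5, 5, 9, 14, 20, 31, 40)
start <- c(4, 15, 30)
end   <- c(9, 18, 45)

extract_sites_sketch(start, end, site, return_index = TRUE)  # returns 2 3 4 7 8
extract_sites_sketch(start, end, site, min_sites = 3)        # returns 5 5 9
```

With min_sites = 3, the third region is dropped because it holds only two sites.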
A vector of positions or indices.
Zuguang Gu <z.gu@dkfz.de>