signal2bins | R Documentation |
This function summarizes a genomic signal (variable) split into
bins (intervals). The signal must be provided in the metacolumn of a
GRanges-class
object.
signal2bins(
signal,
regions,
stat = "mean",
nbins = 20L,
nbinsUP = 20L,
nbinsDown = 20L,
streamUp = NULL,
streamDown = NULL,
absolute = FALSE,
na.rm = TRUE,
missings = 0,
region.size = 200,
num.cores = 1L,
tasks = 0L,
verbose = TRUE,
...
)
signal |
Preferably a single GRanges object with genomic signals in
the meta-columns (each column carrying a signal), or a list of GRanges
objects, each GRanges carrying a signal in its meta-column. The signal can
be, for example, methylation levels or any variable regularly measuring some genomic
magnitude. This GRanges object can be created by using function
|
regions |
A GRanges carrying the genomic region where a summarized statistic can be computed. For example, annotated gene coordinates. |
stat |
Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are: 'mean' (the default shown in the usage), geometric mean ('gmean'), 'median', 'density', 'count' and 'sum'. Here, we define 'density' as the sum of values from the variable of interest in the given region divided by the length/width of the region. The option 'count' computes the number of positions in the specified regions with values greater than zero in the selected column. |
nbins, nbinsUP, nbinsDown |
An integer denoting the number of bins used to split the main regions, the segments upstream of the main regions, and the segments downstream of the main regions, respectively. |
streamUp, streamDown |
An integer denoting how many base pairs upstream and downstream of the provided regions must be included in the computation. Default is NULL. |
absolute |
Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) could take negative values, and we may be interested in the sum of abs(TV), which is the sum of the total variation distance. |
na.rm |
Logical value. If TRUE, the NA values will be removed. |
missings |
Whether to write '0' or 'NA' in regions where there are no data to compute the statistic. |
region.size |
An integer. The minimum size of a region to be included in the computation. Default is 200 (bp), as shown in the usage. |
num.cores, tasks |
Parameters for parallel computation using package
|
verbose |
Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given. |
... |
Arguments to pass to |
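The way the arguments stat, absolute, na.rm and missings interact can be illustrated with a minimal base-R sketch. The helper summarize_region() and the values in tv are hypothetical and are not part of the package; in particular, computing the geometric mean over strictly positive values only is an assumption, since the handling of zeros is not described here.

```r
# Hypothetical helper, NOT the package implementation: summarizes the values
# observed inside one region (or bin) of a given width in bp.
summarize_region <- function(x, width, stat = "mean", absolute = FALSE,
                             na.rm = TRUE, missings = 0) {
  if (absolute) x <- abs(x)          # e.g. abs(TV) for methylation differences
  if (na.rm) x <- x[!is.na(x)]       # drop NA values when requested
  # No data left in the region: return the placeholder (0 or NA)
  if (length(x) == 0) return(missings)
  switch(stat,
    mean    = mean(x),
    # Assumption: geometric mean over strictly positive values only
    gmean   = exp(mean(log(x[x > 0]))),
    median  = median(x),
    sum     = sum(x),
    density = sum(x) / width,        # sum of values divided by region width
    count   = sum(x > 0),            # positions with value greater than zero
    stop("unknown stat: ", stat)
  )
}

# Hypothetical methylation-level differences inside a 100 bp region
tv <- c(0.3, -0.2, NA, 0.1, -0.4)

s_sum     <- summarize_region(tv, width = 100, stat = "sum")
s_abs_sum <- summarize_region(tv, width = 100, stat = "sum", absolute = TRUE)
s_density <- summarize_region(tv, width = 100, stat = "density", absolute = TRUE)
s_count   <- summarize_region(tv, width = 100, stat = "count")
s_empty   <- summarize_region(numeric(0), width = 100, missings = NA)
```

With absolute = TRUE the negative differences no longer cancel the positive ones, so the sum reflects the total magnitude of change in the region.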
This function is useful, for example, to get the profile of the methylation signal around gene regions: gene body plus 2 kb upstream of the TSS and 2 kb downstream of the TES. The intensity of the signal profile would vary depending on the sample conditions. If a given treatment has an effect on methylation, then the intensity of the signal profile for the treatment would lie above or below that of the control samples.
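The idea of splitting a region into bins and summarizing the signal in each bin can be sketched in base R as follows. This is only an illustration under stated assumptions: the positions, values and region limits are simulated, and the cut()/tapply() approach is a stand-in for, not a copy of, the package's internal method.

```r
set.seed(1)

# Hypothetical region and simulated signal (not real data)
region_start <- 1001
region_end   <- 3000
nbins        <- 20L
pos <- sort(sample(region_start:region_end, 200))  # positions carrying a signal
val <- runif(200)                                  # signal value at each position

# Equal-width bin boundaries covering [region_start, region_end]
breaks <- seq(region_start, region_end + 1, length.out = nbins + 1)

# Assign each position to a bin (integer code 1..nbins)
bin <- cut(pos, breaks = breaks, right = FALSE, labels = FALSE)

# Summarize the signal per bin; bins without data yield NA
profile <- as.numeric(tapply(val, factor(bin, levels = seq_len(nbins)), mean))

# Mimic missings = 0: write 0 where a bin has no data
profile[is.na(profile)] <- 0
```

Plotting profile against seq_len(nbins) for treatment and control samples would give the kind of signal profile described above.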
A data.frame object carrying the bin coordinates: binCoord and, for each sample, the signal summarized in the requested statistic: statSumary. Notice that the bin coordinates are relative to the original coordinates given in the GR object. For example, if the GR object carries genome-wide methylation signals (from several samples) and we are interested in getting the methylation signal profile around gene regions, then we must provide the annotated gene coordinates in the argument regions and set the number of bp upstream of the TSS and downstream of the TES, say, streamUp = 2000 and streamDown = 2000, respectively. Next, if we set nbins = 20L, nbinsUP = 20L, nbinsDown = 20L, then the first 20 bins and the last 20 bins of the returned signal profile each cover 2000 bp in total, that is, 100 bp per bin. Since gene-body sizes vary genome-wide, the 20 bins covering the gene-body regions do not represent a fixed number of bp.
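The bin-width arithmetic in this example can be checked with a few lines of base R. The gene widths below are hypothetical and serve only to show that the gene-body bins have no fixed size.

```r
streamUp   <- 2000
streamDown <- 2000
nbins      <- 20L
nbinsUP    <- 20L
nbinsDown  <- 20L

# Flanking regions have a fixed length, so their bins have a fixed width
up_bin_width   <- streamUp / nbinsUP       # bp per upstream bin
down_bin_width <- streamDown / nbinsDown   # bp per downstream bin

# Hypothetical gene-body widths (bp): the bin width changes from gene to gene
gene_widths    <- c(geneA = 1500, geneB = 8000, geneC = 42000)
body_bin_width <- gene_widths / nbins

# Total number of bins in the returned profile under these settings
total_bins <- nbinsUP + nbins + nbinsDown
```

Here each flanking bin spans 100 bp, while a gene-body bin spans 75, 400 or 2100 bp depending on the gene.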
Robersy Sanchez. https://genomaths.com
A faster version: signals2bins.