signals2bins: Genomic Signals to Summarized Bins

View source: R/signals2bins.R

signals2bins    R Documentation


Description

This function summarizes genomic signals (variables) over bins (intervals) into which the given regions are split. The signals must be provided in the meta-columns of a GRanges-class object.

Usage

signals2bins(
  signal,
  regions,
  stat = "mean",
  nbins = 20L,
  nbinsUP = 20L,
  nbinsDown = 20L,
  streamUp = NULL,
  streamDown = NULL,
  absolute = FALSE,
  na.rm = TRUE,
  missings = 0,
  region.size = 300,
  scaling = 1000L,
  verbose = TRUE,
  ...
)

Arguments

signal

Preferably, a single GRanges object with genomic signals in its meta-columns (one signal per column), or a list of GRanges objects, each carrying a signal in its meta-column. A signal can be, for example, methylation levels or any other variable that regularly measures some genomic magnitude. Such a GRanges object can be created with function uniqueGRanges from the MethylIT R package.

regions

A GRanges object carrying the genomic regions where the summarized statistic will be computed, for example, annotated gene coordinates.

stat

Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are: 'mean' (default, as shown in Usage), geometric mean ('gmean'), 'median', 'density', 'count' and 'sum'. Here, 'density' is defined as the sum of the values of the variable of interest in the given region divided by the length/width of the region. The option 'count' computes the number of positions in the specified regions with values greater than zero in each signal column.
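The statistics listed above can be sketched in base R as follows. bin_stat is a hypothetical helper written only for illustration, not a function of MethylIT.utils; it summarizes the values observed inside a single bin. The geometric mean assumes strictly positive values (any zero value makes it 0).

```r
# Hypothetical helper (not part of MethylIT.utils): summarize the signal
# values 'x' observed inside one bin of width 'width' (bp).
bin_stat <- function(x, width, stat = "mean") {
  switch(stat,
    mean    = mean(x),
    gmean   = exp(mean(log(x))),  # assumes strictly positive values
    median  = stats::median(x),
    density = sum(x) / width,     # sum of values divided by the bin width
    count   = sum(x > 0),         # positions with values greater than zero
    sum     = sum(x),
    stop("unknown stat: ", stat)  # default branch of switch()
  )
}

# Example: four signal positions falling inside a 100-bp bin
x <- c(0, 0.2, 0.4, 0.6)
sapply(c("mean", "median", "density", "count", "sum"),
       function(s) bin_stat(x, width = 100, stat = s))
```

Here 'density' (0.012) differs from 'sum' (1.2) only by the division by the 100-bp bin width, which makes bins of different widths comparable.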

nbins, nbinsUP, nbinsDown

An integer denoting the number of bins used to split the regions, the area upstream of the regions and the area downstream of the regions, respectively.

streamUp, streamDown

An integer denoting how many base pairs upstream and downstream of the provided regions must be included in the computation. Default is NULL.
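A minimal sketch of how one region plus its flanks could be divided into nbinsUP + nbins + nbinsDown bins, assuming a '+'-strand region whose TSS is at its start (for '-'-strand genes, upstream and downstream would swap). The helpers tile_interval and region_bins are hypothetical; tile_interval places bin ends at floor(k * width / n), which is similar in spirit to IRanges::tile but is not claimed to reproduce it exactly.

```r
# Hypothetical helper: split [start, end] into n contiguous tiles whose
# widths differ by at most 1 bp.
tile_interval <- function(start, end, n) {
  width <- end - start + 1
  stopifnot(n >= 1, width >= n)
  ends <- start - 1 + floor(seq_len(n) * width / n)
  starts <- c(start, head(ends, -1) + 1)
  data.frame(start = starts, end = ends)
}

# Hypothetical sketch: bins for one '+'-strand region extended 'streamUp' bp
# upstream and 'streamDown' bp downstream.
region_bins <- function(start, end, nbins = 20L, nbinsUP = 20L,
                        nbinsDown = 20L, streamUp = 2000, streamDown = 2000) {
  up   <- tile_interval(start - streamUp, start - 1, nbinsUP)
  body <- tile_interval(start, end, nbins)
  down <- tile_interval(end + 1, end + streamDown, nbinsDown)
  rbind(data.frame(part = "up", up),
        data.frame(part = "body", body),
        data.frame(part = "down", down))
}

# A 3000-bp gene body starting at position 10001
b <- region_bins(10001, 13000)
head(b, 3)
```

With these settings each flanking bin covers 100 bp and each gene-body bin covers 150 bp (3000 / 20).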

absolute

Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) can take negative values, and we could be interested in the sum of abs(TV), i.e., the sum of the total variation distances.

na.rm

Logical value. If TRUE, NA values are removed before computing the statistic.

missings

Value to write ('0' or 'NA') in regions where there are no data to compute the statistic. Default: 0.
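The options absolute, na.rm and missings can be pictured as pre- and post-processing steps around the statistic. The sketch below uses the sum and a hypothetical helper, summarize_bin; it illustrates the documented behavior and is not the package's internal code.

```r
# Hypothetical sketch: how 'absolute', 'na.rm' and 'missings' could act on
# the values of one bin before the statistic (here, the sum) is computed.
summarize_bin <- function(x, absolute = FALSE, na.rm = TRUE, missings = 0) {
  if (absolute) x <- abs(x)             # e.g., sum of |TV| instead of TV
  if (na.rm) x <- x[!is.na(x)]          # drop NA values
  if (length(x) == 0) return(missings)  # no data: report 0 or NA
  sum(x)
}

tv <- c(-0.3, 0.5, NA, -0.2)              # hypothetical TV values in one bin
summarize_bin(tv)                         # signed sum, close to 0
summarize_bin(tv, absolute = TRUE)        # sum of absolute values
summarize_bin(numeric(0), missings = NA)  # empty bin reported as NA
```

Without absolute = TRUE, positive and negative differences cancel out, which is why the sum of abs(TV) is the more informative quantity for total variation.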

region.size

An integer. The minimum size of a region to be included in the computation. Default 300 (bp).
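A minimal sketch of the region.size filter, using a plain data.frame in place of a GRanges object (with GenomicRanges, the equivalent subset would be regions[width(regions) >= region.size]). The coordinates are hypothetical.

```r
# Hypothetical regions; width is end - start + 1 (1-based, closed intervals)
regions <- data.frame(start = c(1000, 5000, 9000),
                      end   = c(1250, 7000, 9300))
region.size <- 300

# Keep only regions at least 'region.size' bp long (widths: 251, 2001, 301)
keep <- (regions$end - regions$start + 1) >= region.size
regions[keep, ]
```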

verbose

Logical. If TRUE (default), the progress of the computational tasks is reported.

...

Additional arguments passed to findOverlaps-methods.
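The overlap step pairs each signal position with the bin that contains it. The base-R sketch below mimics that role with findInterval on sorted, contiguous bins; it is an illustration under that assumption, not a replacement for findOverlaps, and the coordinates and values are hypothetical.

```r
# Three contiguous 100-bp bins (hypothetical coordinates)
bin_starts <- c(8001, 8101, 8201)
bin_ends   <- c(8100, 8200, 8300)

# Signal positions (bp) and their values
pos  <- c(8000, 8001, 8150, 8300, 8301)
vals <- c(0.9, 0.1, 0.5, 0.7, 0.2)

# Index of the last bin start <= position; 0 means before the first bin
idx <- findInterval(pos, bin_starts)
# Positions before the first bin or after the bin end fall outside any bin
idx[idx == 0 | pos > bin_ends[pmax(idx, 1)]] <- NA

# Per-bin mean of the values that overlap each bin (NA indices are dropped)
means <- tapply(vals, factor(idx, levels = 1:3), mean)
means
```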

Details

This function is useful, for example, to get the profile of the methylation signal around gene regions: gene body plus 2 kb upstream of the TSS and 2 kb downstream of the TES. The intensity of the signal profile varies depending on the sample conditions. If a given treatment has an effect on methylation, then the intensity of the signal profile for the treatment would lie above or below that of the control samples.

This function does the same as function signal2bins, except that it is significantly faster and yields small variations in the signal profiles. These variations come from the way the regions are split into bins, for which there is no single exact algorithm: function signal2bins uses cut, while the current function uses function tile (IPosRanges-class).
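The toy comparison below shows why two reasonable binning rules can disagree at bin boundaries. It splits positions 1 to 10 into 3 bins with base::cut and with a tile-like rule (bin ends at floor(k * width / n), a hypothetical stand-in similar in spirit to IRanges::tile). It does not reproduce how signal2bins or signals2bins call these functions internally.

```r
pos <- 1:10
n <- 3

# (a) cut(): equal-width breaks over the range, right-closed intervals
by_cut <- as.integer(cut(pos, breaks = n))

# (b) tile-like split: bin k ends at floor(k * width / n)
ends <- floor(seq_len(n) * length(pos) / n)        # 3, 6, 10
by_tile <- rep(seq_len(n), times = diff(c(0, ends)))

table(by_cut)   # bin sizes 4, 3, 3
table(by_tile)  # bin sizes 3, 3, 4
```

Both splits are valid, but one boundary position is assigned to different bins, which is enough to shift the summarized values slightly.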

Value

A data.frame object carrying the bin coordinates, binCoord, and, for each sample, the signal summarized with the requested statistic, statSumary. Notice that the bin coordinates are relative to the original coordinates given in the GR object. For example, if the GR object carries genome-wide methylation signals (from several samples) and we are interested in getting the methylation signal profile around gene regions, then we must provide the annotated gene coordinates in argument regions and set the number of bp upstream of the TSS and downstream of the TES, say, streamUp = 2000 and streamDown = 2000, respectively. Next, if we set nbins = 20L, nbinsUP = 20L and nbinsDown = 20L, then the first 20 bins and the last 20 bins of the returned signal profile each cover 2000 bp in total (100 bp per bin). Since gene-body sizes vary genome-wide, there is no fixed number of bp represented by the 20 bins covering the gene-body regions.
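The arithmetic behind this example can be checked directly. The gene widths below are hypothetical and only illustrate how the bp per gene-body bin changes from gene to gene while the flanking bins stay fixed.

```r
streamUp <- 2000; streamDown <- 2000
nbins <- 20L; nbinsUP <- 20L; nbinsDown <- 20L

streamUp / nbinsUP      # bp per upstream bin: 100
streamDown / nbinsDown  # bp per downstream bin: 100

# Gene-body bins depend on each gene's width (hypothetical widths, in bp)
gene_widths <- c(1500, 4000, 12000)
gene_widths / nbins     # bp per gene-body bin: 75, 200, 600
```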

Author(s)

Robersy Sanchez. https://genomaths.com

See Also

signal2bins.


genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.