getPausingIndices: Calculate pausing indices from user-supplied promoters &...

View source: R/signal_counting.R

getPausingIndicesR Documentation

Calculate pausing indices from user-supplied promoters & genebodies

Description

Pausing index (PI) is calculated for each gene (within matched promoters.gr and genebodies.gr) as promoter-proximal (or pause region) signal counts divided by genebody signal counts. If length.normalize = TRUE (recommended), the signal counts within each range in promoters.gr and genebodies.gr are divided by their respective range widths (region lengths) before pausing indices are calculated.

Usage

getPausingIndices(
  dataset.gr,
  promoters.gr,
  genebodies.gr,
  field = "score",
  length.normalize = TRUE,
  remove.empty = FALSE,
  blacklist = NULL,
  melt = FALSE,
  region_names = NULL,
  expand_ranges = FALSE,
  ncores = getOption("mc.cores", 2L)
)

Arguments

dataset.gr

A GRanges object in which signal is contained in metadata (typically in the "score" field), or a named list of such GRanges objects.

promoters.gr

A GRanges object containing promoter-proximal regions of interest.

genebodies.gr

A GRanges object containing genebody regions of interest.

field

The metadata field of dataset.gr to be counted. If length(field) > 1, a dataframe is returned containing the pausing indices for each region in each field. If field not found in names(mcols(dataset.gr)), will default to using all fields found in dataset.gr. If dataset.gr is a list, a single field should be given, or length(field) should be the equal to the number of datasets in dataset.gr.

length.normalize

A logical indicating if signal counts within regions of interest should be length normalized. The default is TRUE, which is recommended, especially if input regions don't all have the same width.

remove.empty

A logical indicating if genes without any signal in promoters.gr should be removed. No genes are filtered by default. If dataset.gr is a list of datasets, or if length(field) > 1, regions are filtered unless they have promoter signal in all datasets.

blacklist

An optional GRanges object containing regions that should be excluded from signal counting. If length.normalize = TRUE, blacklisted positions will be excluded from length calculations. Users should take care to note if regions of interest substantially overlap blacklisted positions.

melt

If melt = TRUE, a dataframe is returned containing a column for regions and another column for pausing indices. If multiple datasets are given (if dataset.gr is a list or if length(field) > 1), the output dataframe is melted to contain a third column indicating the sample names. (See section on return values below).

region_names

If melt = TRUE, an optional vector of names for the regions in regions.gr. If left as NULL, indices of regions.gr are used instead.

expand_ranges

Logical indicating if ranges in dataset.gr should be treated as descriptions of single molecules (FALSE), or if ranges should be treated as representing multiple adjacent positions with the same signal (TRUE). See getCountsByRegions.

ncores

Multiple cores will only be used if dataset.gr is a list of multiple datasets, or if length(field) > 1.

Value

A vector parallel to the input genelist, unless remove.empty = TRUE, in which case the vector may be shorter. If dataset.gr is a list, or if length(field) > 1, a dataframe is returned, containing a column for each field. However, if melt = TRUE, dataframes contain one column to indicate regions (either by their indices, or by region_names, if given), another column to indicate signal, and a third column containing the sample name (unless dataset.gr is a single GRanges object).

Author(s)

Mike DeBerardine

See Also

getCountsByRegions

Examples

data("PROseq") # load included PROseq data
data("txs_dm6_chr4") # load included transcripts

#--------------------------------------------------#
# Get promoter-proximal and genebody regions
#--------------------------------------------------#

# genebodies from +300 to 300 bp before the poly-A site
gb <- genebodies(txs_dm6_chr4, 300, -300, min.window = 400)

# get the transcripts that are large enough (>1kb in size)
txs <- subset(txs_dm6_chr4, tx_name %in% gb$tx_name)

# for the same transcripts, promoter-proximal region from 0 to +100
pr <- promoters(txs, 0, 100)

#--------------------------------------------------#
# Calculate pausing indices
#--------------------------------------------------#

pidx <- getPausingIndices(PROseq, pr, gb)

length(txs)
length(pidx)
head(pidx)

#--------------------------------------------------#
# Without length normalization
#--------------------------------------------------#

head( getPausingIndices(PROseq, pr, gb, length.normalize = FALSE) )

#--------------------------------------------------#
# Removing empty means the values no longer match the genelist
#--------------------------------------------------#

pidx_signal <- getPausingIndices(PROseq, pr, gb, remove.empty = TRUE)

length(pidx_signal)

mdeber/BRGenomics documentation built on Aug. 3, 2024, 3:43 a.m.