propTest: Beta Regression for methylation levels and rates

View source: R/propTest.R

propTestR Documentation

Beta Regression for methylation levels and rates

Description

Beta Regression analysis for treatment versus control group comparison of methylation levels, appends three new metacolumns 'beta', 'log2FC', 'pvalue' to the provided GRanges argument

Usage

propTest(
  GR,
  control.names,
  treatment.names,
  link = "logit",
  type = "ML",
  tv.cut = NULL,
  indvPerGrp = 0,
  FilterLog2FC = TRUE,
  pAdjustMethod = "BH",
  pvalCutOff = 0.05,
  Minlog2FC = 0.5,
  saveAll = FALSE,
  num.cores = 1,
  tasks = 0L,
  verbose = TRUE
)

Arguments

GR

GRanges objects including control and treatment samples containing the methylation levels. The name for each column must coincide with the names given for parameters: 'control.names' and 'treatment.names'.

control.names

Names/IDs of the control samples, which must be include in the variable GR at the metacolumn.

treatment.names

Names/IDs of the treatment samples, which must be included in the variable GR at the metacolumn.

link

Parameter to be passed to function 'betareg' from package 'betareg'. character specification of the link function in the mean model (mu). Currently, 'logit', 'probit', 'cloglog', 'cauchit', 'log', 'loglog' are supported. Alternatively, an object of class 'link-glm' can be supplied (see betareg).

type

Parameter to be passed to function 'betareg' from package 'betareg'. A character specification of the type of estimator. Currently, maximum likelihood ('ML'), ML with bias correction ('BC'), and ML with bias reduction ('BR') are supported.

tv.cut

A cutoff for the total variation distance (TVD; absolute value of methylation levels differences) estimated at each site/range as the difference of the group means of methylation levels. If tv.cut is provided, then sites/ranges k with abs(TV_k) < tv.cut are removed before to perform the regression analysis. Its value must be NULL or a number 0 < tv.cut < 1.

indvPerGrp

An integer number giving the minimum number of individuals per group at each site/region. Default 2.

FilterLog2FC

if TRUE, the results are filtered using the minimun absolute value of log2FoldChanges observed to accept that a gene in the treatment is differentially expressed in respect to the control.

pAdjustMethod

method used to adjust the results; default: BH

pvalCutOff

cutoff used, then a p-value adjustment is performed. If NULL all the reported p-values are for testing.

Minlog2FC

minimum logarithm base 2 of fold changes.

saveAll

if TRUE all the temporal results are returned.

num.cores

The number of cores to use, i.e. at most how many child processes will be run simultaneously (see bpapply function from BiocParallel).

tasks

integer(1). The number of tasks per job. value must be a scalar integer >= 0L. In this documentation a job is defined as a single call to a function, such as bplapply, bpmapply etc. A task is the division of the X argument into chunks. When tasks == 0 (default), X is divided as evenly as possible over the number of workers (see MulticoreParam from BiocParallel package).

verbose

if TRUE, prints the function log to stdout

Details

Beta Regression analysis for group comparison of methylation levels is performed using the function betareg.

Value

The original GRanges object with the columns 'beta', 'log2FC', 'pvalue', and TV added.

Examples

num.cyt <- 11001 # Number of cytosine position with methylation call
max.cyt = 14000
## Gene annotation
genes <- GRanges(seqnames = '1',
                 ranges = IRanges(start = c(3631, 6788, 11649),
                                  end = c(5899, 9130, 13714)),
                 strand = c('+', '-', '-'))
mcols(genes) <- data.frame(gene_id = c('AT1G01010', 'AT1G01020',
                                       'AT1G01030'))

set.seed(123) #'#' To set a seed for random number generation
## GRanges object of the reference with methylation levels in
## its meta-column
Ref <- makeGRangesFromDataFrame(
  data.frame(chr = '1',
             start = 3000:max.cyt,
             end = 3000:max.cyt,
             strand = '*',
             p1 = rbeta(num.cyt, shape1 = 1, shape2 = 1.5)),
  keep.extra.columns = TRUE)

## List of Granges objects of individuals methylation levels
Indiv <- GRangesList(
  sample11 = makeGRangesFromDataFrame(
    data.frame(chr = '1',
               start = 3000:max.cyt,
               end = 3000:max.cyt,
               strand = '*',
               p2 = rbeta(num.cyt, shape1 = 1.5, shape2 = 2)),
    keep.extra.columns = TRUE),
  sample12 = makeGRangesFromDataFrame(
    data.frame(chr = '1',
               start = 3000:max.cyt,
               end = 3000:max.cyt,
               strand = '*',
               p2 = rbeta(num.cyt, shape1 = 1.6, shape2 = 2.1)),
    keep.extra.columns = TRUE),
  sample21 = makeGRangesFromDataFrame(
    data.frame(chr = '1',
               start = 3000:max.cyt,
               end = 3000:max.cyt,
               strand = '*',
               p2 = rbeta(num.cyt, shape1 = 10, shape2 = 4)),
    keep.extra.columns = TRUE),
  sample22 = makeGRangesFromDataFrame(
    data.frame(chr = '1',
               start = 3000:max.cyt,
               end = 3000:max.cyt,
               strand = '*',
               p2 = rbeta(num.cyt, shape1 = 11, shape2 = 4)),
    keep.extra.columns = TRUE))
## To estimate Hellinger divergence using only the methylation levels.
HD <- estimateDivergence(ref = Ref, indiv = Indiv, meth.level = TRUE,
                         columns = 1)
## To perform the nonlinear regression analysis
nlms <- nonlinearFitDist(HD, column = 4, verbose = FALSE)

## Next, the potential signal can be estimated
PS <- getPotentialDIMP(LR = HD, nlms = nlms, div.col = 4, alpha = 0.05)

## The cutpoint estimation used to discriminate the signal from the noise
cutpoints <- estimateCutPoint(PS, control.names = c('sample11', 'sample12'),
                              treatment.names = c('sample21', 'sample22'),
                              div.col = 4, verbose = TRUE)
## DIMPs are selected using the cupoints
DIMPs <- selectDIMP(PS, div.col = 4, cutpoint = min(cutpoints$cutpoint))

## Finally DIMPs statistics genes
p_DIMPs = getGRegionsStat(GR = DIMPs, grfeatures = genes, stat = 'mean',
                          prob = TRUE, column = 2L)

GR_p_DIMP = uniqueGRanges(p_DIMPs, type = 'equal', chromosomes = '1')
colnames(mcols(GR_p_DIMP)) <-  c('sample11', 'sample12', 'sample21',
                                'sample22')
names(GR_p_DIMP) <- genes$gene_id

## Group differences between methylation levels
propTest(GR = GR_p_DIMP, control.names = c('sample11', 'sample12'),
         treatment.names = c('sample21', 'sample22'))

genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.