dmrseq: Main function for detecting and evaluating significance of...
In kdkorthauer/dmrseq: Detection and inference of differentially methylated regions from Whole Genome Bisulfite Sequencing

dmrseq

R Documentation

Main function for detecting and evaluating significance of DMRs.

Description

Performs a two-step approach that (1) detects candidate regions and (2) scores each candidate region with a statistic that is exchangeable across the genome, then evaluates statistical significance using a permutation test on the pooled null distribution of scores.
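A minimal base-R sketch of the significance step is shown below. It assumes that each observed region statistic is compared in absolute value with the pooled null statistics from all permutations, with an add-one correction, and that q-values come from the Benjamini-Hochberg adjustment in `stats::p.adjust()`. The helper name `pooledPermPvals` is hypothetical and the statistics are simulated; dmrseq's own implementation may differ in these details.

```r
# Hypothetical helper (not exported by dmrseq): empirical p-values for observed
# region statistics against a null distribution pooled over all permutations.
pooledPermPvals <- function(obsStat, nullStat) {
  # Compare absolute values so that effects in either direction count.
  nullAbs <- sort(abs(nullStat))
  obsAbs <- abs(obsStat)
  # findInterval(..., left.open = TRUE) counts null values strictly below each
  # observed value, so the difference counts null values at least as extreme.
  nExtreme <- length(nullAbs) - findInterval(obsAbs, nullAbs, left.open = TRUE)
  # Add-one correction (an assumption of this sketch) keeps p-values above 0.
  (1 + nExtreme) / (1 + length(nullAbs))
}

# Simulated example: 100 observed regions, 5 with a strong effect, and a
# pooled null of 10 permutations x 100 regions.
set.seed(1)
obsStat <- c(rnorm(95), rnorm(5, mean = 6))
nullStat <- rnorm(10 * 100)

pval <- pooledPermPvals(obsStat, nullStat)
# Benjamini-Hochberg adjustment to control the false discovery rate.
qval <- p.adjust(pval, method = "BH")
head(sort(qval))
```

Pooling the null statistics across regions is what makes a small number of permutations usable: 10 permutations over many regions already yield a large reference distribution, which only works if the region statistic is comparable across the genome.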

Usage

dmrseq(
  bs,
  testCovariate,
  adjustCovariate = NULL,
  cutoff = 0.1,
  minNumRegion = 5,
  smooth = TRUE,
  bpSpan = 1000,
  minInSpan = 30,
  maxGapSmooth = 2500,
  maxGap = 1000,
  verbose = TRUE,
  maxPerms = 10,
  matchCovariate = NULL,
  BPPARAM = bpparam(),
  stat = "stat",
  block = FALSE,
  blockSize = 5000,
  chrsPerChunk = 1
)

Arguments

`bs`	bsseq object containing the methylation values as well as the phenotype matrix that contains sample level covariates
`testCovariate`	Character value indicating which variable (column name) in `pData(bs)` to test for association with methylation levels. Alternatively, an integer value indicating which column of `pData(bs)` to use. This is used to construct the design matrix for the test statistic calculation. To run with a continuous covariate, or a categorical covariate with more than two groups, simply pass in the name of the column in `pData(bs)` that contains this covariate. The covariate is treated as continuous if its data type is continuous, unless it has only two unique values, in which case a two-group comparison is carried out.
`adjustCovariate`	an (optional) character value or vector indicating which variables (column names) in `pData(bs)` will be adjusted for when testing for the association of methylation levels with the `testCovariate`. Alternatively, an integer value or vector indicating which columns of `pData(bs)` to adjust for. If not NULL (the default is NULL), this is also used to construct the design matrix for the test statistic calculation.
`cutoff`	scalar value that represents the absolute value (or a vector of two numbers representing a lower and upper bound) for the cutoff of the single CpG coefficient that is used to discover candidate regions. Default value is 0.10.
`minNumRegion`	positive integer that represents the minimum number of CpGs to consider for a candidate region. Default value is 5. Minimum value is 3.
`smooth`	logical value that indicates whether or not to smooth the CpG level signal when discovering candidate regions. Defaults to TRUE.
`bpSpan`	a positive integer that represents the length in basepairs of the smoothing span window if `smooth` is TRUE. Default value is 1000.
`minInSpan`	positive integer that represents the minimum number of CpGs in a smoothing span window if `smooth` is TRUE. Default value is 30.
`maxGapSmooth`	integer value representing the maximum number of basepairs between neighboring CpGs for them to be included in the same cluster when performing smoothing (should generally be larger than `maxGap`).
`maxGap`	integer value representing the maximum number of basepairs between neighboring CpGs for them to be included in the same DMR.
`verbose`	logical value that indicates whether progress messages should be printed to stdout. Default value is TRUE.
`maxPerms`	a positive integer that represents the maximum number of permutations that will be used to generate the global null distribution of test statistics. Default value is 10.
`matchCovariate`	An (optional) character value indicating which variable (column name) of `pData(bs)` will be blocked for when constructing the permutations used to test for the association of methylation levels with the `testCovariate`. Only use this when `testCovariate` is a two-group factor and the number of possible permutations is less than 500000. Alternatively, you can specify an integer value indicating which column of `pData(bs)` to block for. Blocking means that only permutations with a balanced composition of `testCovariate` values will be used. For example, if you have samples of different genders and gender is not your covariate of interest, it is recommended to use gender as a matching covariate so that none of the permutations tests entirely males versus females; such a permutation violates the null hypothesis and will decrease power. If NULL (default), no blocking is performed.
`BPPARAM`	a `BiocParallelParam` object specifying the parallel backend. The default option is `BiocParallel::bpparam()`, which automatically creates a cluster appropriate for the operating system.
`stat`	a character value indicating the name of the column of the output to use as the region-level test statistic. Default value is 'stat', which is the region-level statistic designed to be comparable across the genome. It is not recommended to change this argument, but it can be done for experimental purposes. Possible values are: 'L' - the number of loci in the region, 'area' - the sum of the smoothed loci statistics, 'beta' - the effect size of the region, 'stat' - the test statistic for the region, or 'avg' - the average smoothed loci statistic.
`block`	logical indicating whether to search for large-scale (low resolution) blocks of differential methylation (default is FALSE, which means that local DMRs are desired). If TRUE, the parameters `bpSpan`, `minInSpan`, and `maxGapSmooth` should be adjusted (increased) accordingly. This setting will also merge candidate regions that (1) are in the same direction and (2) are less than 1 kb apart with no covered CpGs separating them. The region-level model is also slightly modified: instead of a loci-specific intercept for each CpG in the region, the intercept term is modeled as a natural spline with one interior knot per 10 kb of length (up to 10 interior knots).
`blockSize`	numeric value indicating the minimum number of basepairs to be considered a block (only used if `block`=TRUE). Default is 5000 basepairs.
`chrsPerChunk`	a positive integer value indicating the number of chromosomes per chunk. The default is 1, meaning that the data will be looped through one chromosome at a time. When pairing up multiple chromosomes per chunk, sizes (in terms of numbers of CpGs) will be taken into consideration to balance the sizes of each chunk.
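The rule for interpreting `testCovariate`, and its use together with `adjustCovariate` to build the design matrix, can be sketched in base R as below. The helpers `classifyCovariate` and `buildDesign` are hypothetical illustrations of the documented behavior, not functions exported by dmrseq, and the sketch assumes `pd` is a plain `data.frame` standing in for `pData(bs)`.

```r
# Hypothetical helper: classify a test covariate following the documented rule.
classifyCovariate <- function(x) {
  nValues <- length(unique(x))
  if (nValues == 2) {
    "two-group"        # exactly two unique values: two-group comparison
  } else if (is.numeric(x)) {
    "continuous"       # numeric with more than two unique values
  } else {
    "multi-group"      # categorical with more than two groups
  }
}

# Hypothetical helper: design matrix from the test and adjustment covariates.
buildDesign <- function(pd, testCovariate, adjustCovariate = NULL) {
  # Non-continuous test covariates are coded as factors (group indicators).
  if (classifyCovariate(pd[[testCovariate]]) != "continuous") {
    pd[[testCovariate]] <- factor(pd[[testCovariate]])
  }
  # Character adjustment covariates are turned into factors by model.matrix().
  model.matrix(reformulate(c(testCovariate, adjustCovariate)), data = pd)
}

pd <- data.frame(CellType = c("A", "A", "B", "B"),
                 Age = c(30, 45, 50, 62),
                 Sex = c("F", "M", "F", "M"))
buildDesign(pd, "CellType", "Sex")
buildDesign(pd, "Age")
```

Here `buildDesign(pd, "CellType", "Sex")` returns an intercept column plus one indicator column each for `CellType` and `Sex`, while `buildDesign(pd, "Age")` keeps `Age` as a single continuous column.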
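To make the roles of `cutoff`, `maxGap`, and `minNumRegion` concrete, the sketch below groups consecutive CpGs into candidate regions when their coefficients exceed the cutoff in the same direction and no gap between neighbors exceeds `maxGap`. It is a simplified illustration under stated assumptions: the function `findCandidates` is hypothetical, it works on already computed (optionally smoothed) per-CpG coefficients for one chromosome, a scalar `cutoff` is taken as the symmetric bounds `c(-cutoff, cutoff)`, and smoothing, `maxGapSmooth`, and block mode are ignored.

```r
# Hypothetical sketch (not dmrseq's internal code) of candidate-region detection
# from per-CpG coefficients, positions sorted within a single chromosome.
findCandidates <- function(pos, coef, cutoff = 0.1, maxGap = 1000,
                           minNumRegion = 5) {
  # A scalar cutoff is treated as symmetric lower/upper bounds (assumption).
  if (length(cutoff) == 1) cutoff <- c(-abs(cutoff), abs(cutoff))
  # +1 above the upper bound, -1 below the lower bound, 0 otherwise.
  direction <- ifelse(coef > cutoff[2], 1L, ifelse(coef < cutoff[1], -1L, 0L))
  # Start a new cluster wherever neighboring CpGs are more than maxGap bp apart.
  cluster <- cumsum(c(1L, diff(pos) > maxGap))
  # Start a new run wherever the cluster or the direction changes.
  runId <- cumsum(c(TRUE, diff(cluster) != 0 | diff(direction) != 0))
  runs <- split(seq_along(pos), runId)
  # Keep runs that pass the cutoff and contain at least minNumRegion CpGs.
  keep <- vapply(runs, function(i) direction[i[1]] != 0 &&
                   length(i) >= minNumRegion, logical(1))
  if (!any(keep)) return(NULL)
  res <- do.call(rbind, lapply(runs[keep], function(i)
    data.frame(start = pos[min(i)], end = pos[max(i)], L = length(i),
               direction = direction[i[1]])))
  rownames(res) <- NULL
  res
}

# Toy data: two runs of CpGs separated by a gap larger than maxGap.
pos  <- c(100, 150, 200, 250, 300, 350, 5000, 5050, 5100)
coef <- c(0.20, 0.30, 0.25, 0.40, 0.15, 0.01, -0.30, -0.20, -0.50)
findCandidates(pos, coef, cutoff = 0.1, maxGap = 1000, minNumRegion = 3)
```

With these toy values the call returns two candidates: CpGs 1 to 5 (positive direction, L = 5) and CpGs 7 to 9 (negative direction, L = 3). With the default `minNumRegion = 5`, only the first would be kept.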
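For `block = TRUE`, the knot rule for the region-level intercept can be illustrated with `splines::ns()`. This sketch assumes that the spline is fit in genomic position over the CpGs of a region and that partial 10 kb spans are rounded down; both the rounding and the helper names `nInteriorKnots` and `interceptBasis` are assumptions for illustration, not dmrseq internals.

```r
library(splines)

# Hypothetical helper: number of interior knots for a region of a given width,
# one per 10 kb and at most 10. Rounding down is an assumption of this sketch.
nInteriorKnots <- function(widthBp) {
  min(10L, as.integer(floor(widthBp / 10000)))
}

# Hypothetical helper: natural spline basis in genomic position for the
# intercept term of a block. ns() with intercept = FALSE places df - 1 interior
# knots at quantiles of pos, so df = k + 1 yields k interior knots.
interceptBasis <- function(pos) {
  k <- nInteriorKnots(max(pos) - min(pos))
  ns(pos, df = k + 1)
}

# Example: CpGs spread over a 25 kb block give 2 interior knots.
pos <- seq(1, 25001, by = 100)
basis <- interceptBasis(pos)
dim(basis)
attr(basis, "knots")
```

Replacing one intercept per CpG with a handful of spline coefficients keeps the number of parameters small even when a block spans many CpGs.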

Value

a GRanges object that contains the results of the inference. The object contains one row for each candidate region, sorted by q-value and then chromosome. The standard GRanges chr, start, and end are included, along with at least 7 metadata columns, in the following order: 1. L = the number of CpGs contained in the region, 2. area = the sum of the smoothed beta values, 3. beta = the coefficient value for the condition difference (there will be more than one column here if a multi-group comparison was performed), 4. stat = the test statistic for the condition difference, 5. pval = the permutation p-value for the significance of the test statistic, 6. qval = the q-value for the test statistic (adjustment for multiple comparisons to control the false discovery rate), and 7. index = an IRanges containing the indices of the region's first CpG to last CpG.

Examples


# load the package and its example data
library(dmrseq)
data(BS.chr21)

# the covariate of interest is the 'CellType' column of pData(BS.chr21)
testCovariate <- 'CellType'

# run dmrseq on a subset of the chromosome (10K CpGs)
regions <- dmrseq(bs = BS.chr21[240001:250000, ],
                  cutoff = 0.05,
                  testCovariate = testCovariate)

kdkorthauer/dmrseq documentation built on Sept. 26, 2024, 9:32 p.m.
