methSeg: Segment methylation or differential methylation profile

View source: R/methSeg.R

methSeg    R Documentation

Segment methylation or differential methylation profile


The function uses a segmentation algorithm (fastseg) to segment the methylation profiles. It then uses Gaussian mixture modelling to cluster the segments into k components, based on the mean methylation value of each segment. Each component ideally corresponds to a quantitative class of segments, such as highly or lowly methylated regions.


  diagnostic.plot = TRUE,
  join.neighbours = FALSE,
  initialize.on.subset = 1,



A GRanges, methylDiff, methylDiffDB, methylRaw or methylRawDB object. If the object is a GRanges, it should have one meta column with methylation scores and must be sorted by position, i.e. ignoring the strand information.
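As an illustration of the GRanges requirement, the sketch below builds a position-sorted GRanges with a single meta column. It assumes the GenomicRanges package is installed; the object name `gr`, the column name `meth` and the simulated scores are made up for this example.

```r
library(GenomicRanges)

set.seed(1)
n   <- 6000                       # comfortably above the suggested 5000 records
pos <- sample.int(1e6, n)         # hypothetical CpG positions, unsorted on purpose

# One meta column ("meth", a made-up name) holding the methylation scores
gr <- GRanges(seqnames = "chr1",
              ranges   = IRanges(start = pos, width = 1),
              strand   = sample(c("+", "-"), n, replace = TRUE),
              meth     = runif(n))

# Sort by position while ignoring the strand information
gr <- sort(gr, ignore.strand = TRUE)

ncol(mcols(gr))           # number of meta columns
is.unsorted(start(gr))    # FALSE when positions are in increasing order
```

An object prepared this way could then be passed as the first argument of methSeg.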


if TRUE, a diagnostic plot is drawn. The plot shows methylation and length statistics per segment group. In addition, it shows diagnostics from mixture modeling: the estimated density function and the BIC values used to decide the optimal number of components in mixture modeling.


if TRUE, neighbouring segments that cluster to the same group are joined by extending the ranges, summing up num.marks and averaging over seg.means.
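The joining rule can be sketched in base R on a small, made-up segment table. This is not methylKit's implementation: the column names and values are hypothetical, and an unweighted average of the segment means is assumed.

```r
# Hypothetical segments, already ordered by position
segs <- data.frame(
  start    = c(1, 101, 201, 301, 401),
  end      = c(100, 200, 300, 400, 500),
  num.mark = c(10, 20, 15, 5, 30),
  seg.mean = c(0.10, 0.20, 0.80, 0.90, 0.15),
  group    = c(1, 1, 2, 2, 1)
)

# Give every run of consecutive segments with the same group one id
runs   <- rle(segs$group)
run.id <- rep(seq_along(runs$lengths), runs$lengths)

# Extend the ranges, sum the marks, average the means within each run
joined <- data.frame(
  start    = as.vector(tapply(segs$start,    run.id, min)),
  end      = as.vector(tapply(segs$end,      run.id, max)),
  num.mark = as.vector(tapply(segs$num.mark, run.id, sum)),
  seg.mean = as.vector(tapply(segs$seg.mean, run.id, mean)),
  group    = runs$values
)

joined
```

Only adjacent segments are merged here: the last segment belongs to group 1 but stays separate because a group 2 run lies between it and the first two.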


a numeric value indicating either the fraction or the absolute number of regions to subsample from the segments before performing the mixture modeling. The value can be either between 0 and 1, e.g. 0.1 means that 10% of the data is used, or an integer higher than 1 that defines the number of regions to sample. By default the whole dataset is used, which can be time consuming on large datasets. (Default: 1)
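The two ways of reading this value can be sketched with a small helper. `subset_size` is a hypothetical function written for this illustration, not part of methylKit, and the rounding and capping choices are assumptions.

```r
# Hypothetical helper: translate the argument into a number of segments
subset_size <- function(initialize.on.subset, n.segments) {
  stopifnot(initialize.on.subset > 0)
  if (initialize.on.subset <= 1) {
    # Values up to 1 are read as a fraction of all segments
    size <- round(initialize.on.subset * n.segments)
  } else {
    # Values above 1 are read as an absolute number of segments
    size <- as.integer(initialize.on.subset)
  }
  # Never ask for more segments than are available (assumption)
  min(size, n.segments)
}

subset_size(0.1, 2000)   # 10% of 2000 segments
subset_size(500, 2000)   # 500 segments
subset_size(1, 2000)     # the default: all segments
```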


arguments passed to the fastseg function in the fastseg package, or to densityMclust in the mclust package; these can be used to fine-tune the segmentation algorithm. E.g. increasing "alpha" will give more segments. Increasing "cyberWeight" will also give more segments. "maxInt" controls the segment extension around a breakpoint. "minSeg" controls the minimum segment length. The "G" argument denotes the number of components used in BIC selection in mixture modeling. For more details see the fastseg and mclust documentation.


To be sure that the algorithm will work on your data, the object should have at least 5000 records.

After initial segmentation with fastseg(), segments are clustered into self-similar groups based on their mean methylation profile using mixture modeling. By default, mixture modeling estimates the initial parameters of the distributions using the whole dataset. If the "initialize.on.subset" argument is set as described, the function will use a subset of the data to initialize the parameters of the mixture model prior to the expectation-maximization (EM) algorithm used by the mclust package.
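The clustering step described above can be sketched directly with the mclust package, outside of methSeg. The segment means below are simulated and the 50% subset is an arbitrary choice; the sketch only illustrates initialization on a subset followed by EM on all values, and is not a copy of methSeg's internals.

```r
library(mclust)

set.seed(42)
# Simulated segment means: one lowly and one highly methylated population
seg.mean <- c(rnorm(300, mean = 0.1, sd = 0.03),
              rnorm(300, mean = 0.8, sd = 0.05))

# Indices of a random subset used only to initialize the model parameters
idx <- sample(seq_along(seg.mean), size = 300)

# Fit mixtures with 1 to 4 components; BIC picks among them,
# and EM is then run on the whole vector of segment means
dens <- densityMclust(seg.mean, G = 1:4,
                      initialization = list(subset = idx))

dens$G                        # number of components selected by BIC
table(dens$classification)    # segments assigned to each component
```

Restricting "G" in this way mirrors how the "G" argument of methSeg narrows the range of components considered during BIC selection.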


A GRanges object with segment classification and information. The 'seg.mean' column shows the mean methylation per segment. The '' column shows the segment groups obtained by mixture modeling.


Altuna Akalin, contributions by Arsene Wabo and Katarzyna Wreczycka

See Also

methSeg2bed, joinSegmentNeighbours




 # it finds the optimal number of components as 6

 # however the BIC stabilizes after 4, we can also try 4 components

 # get segments to BED file


al2na/methylKit documentation built on Feb. 14, 2025, 7:53 p.m.