knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(methylSig)
The purpose of this vignette is to show users how to retrofit their methylSig
< 0.99.0 code to work with the refactor in version 0.99.0 and later.
In versions < 0.99.0 of methylSig
, the methylSigReadData()
function read Bismark coverage files, Bismark genome-wide CpG reports, or MethylDackel bedGraphs. Additionally, users could destrand the data, filter by coverage, and filter SNPs.
meth = methylSigReadData( fileList = files, pData = pData, assembly = 'hg19', destranded = TRUE, maxCount = 500, minCount = 10, filterSNPs = TRUE, num.cores = 1, fileType = 'cytosineReport')
In versions >= 0.99.0 of methylSig
, the user should read data with bsseq::read.bismark()
and then apply functions that were once bundled within methylSigReadData()
.
files = c( system.file('extdata', 'bis_cov1.cov', package='methylSig'), system.file('extdata', 'bis_cov2.cov', package='methylSig') ) bsseq_stranded = bsseq::read.bismark( files = files, colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE )
After reading data, filter by coverage. Note, we are changing our dataset to something we can use with the downstream functions.
# Load data for use in the rest of the vignette data(BS.cancer.ex, package = 'bsseqData') bs = BS.cancer.ex[1:10000] bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
If the locations of C-to-T and G-to-A SNPs are known, or some other set of location should be removed:
# Construct GRanges object remove_gr = GenomicRanges::GRanges( seqnames = c('chr21', 'chr21', 'chr21'), ranges = IRanges::IRanges( start = c(9411552, 9411784, 9412099), end = c(9411552, 9411784, 9412099) ) ) bs = filter_loci_by_location(bs = bs, gr = remove_gr)
In versions < 0.99.0 of methylSig
, the methylSigTile()
function combined aggregating CpG data over pre-defined tiles and genomic windows.
# For genomic windows, tiles = NULL windowed_meth = methylSigTile(meth, tiles = NULL, win.size = 10000) # For pre-defined tiles, tiles should be a GRanges object.
In versions >= 0.99.0 of methylSig
, tiling is separated into two functions, tile_by_regions()
and tile_by_windows()
. Users should chooose one or the other.
windowed_bs = tile_by_windows(bs = bs, win_size = 10000)
# Collapsed promoters on chr21 and chr22 data(promoters_gr, package = 'methylSig') promoters_bs = tile_by_regions(bs = bs, gr = promoters_gr)
In versions < 0.99.0 of methylSig
, the methylSigCalc
function had a min.per.group
parameter to determine how many samples per group had to have coverage in order to be tested.
result = methylSigCalc( meth = meth, comparison = 'DR_vs_DS', dispersion = 'both', local.info = FALSE, local.winsize = 200, min.per.group = c(3,3), weightFunc = methylSig_weightFunc, T.approx = TRUE, num.cores = 1)
In versions >= 0.99.0 of methylSig
, the min.per.group
functionality is performed by a separate function filter_loci_by_group_coverage()
. Also note the change in form to define dispersion calculations, and the use of local information.
# Look a the phenotype data for bs bsseq::pData(bs) # Require at least two samples from cancer and two samples from normal bs = filter_loci_by_group_coverage( bs = bs, group_column = 'Type', c('cancer' = 2, 'normal' = 2))
After removing loci with insufficient information, we can now use the diff_methylsig()
test.
# Test cancer versus normal with dispersion from both groups diff_gr = diff_methylsig( bs = bs, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1)
In versions < 0.99.0 of methylSig
, the methylSigDSS
function also had a min.per.group
parameter to determine how many samples per group had to have coverage. Users also couldn't specify which methylation groups to recover. The form of design
, formula
, and contrast
, remain the same in versions >= 0.99.0.
contrast = matrix(c(0,1), ncol = 1) result_dss = methylSigDSS( meth = meth, design = design1, formula = '~ group', contrast = contrast, group.term = 'group', min.per.group=c(3,3))
In versions >= 0.99.0 of methylSig
, the single methylSigDSS()
function is replaced by a fit function diff_dss_fit()
and a test functiotn diff_dss_test()
. As with diff_methylsig()
, users should ensure enough samples have sufficient coverage with the filter_loci_by_group_coverage()
function. The design
and formula
are unchanged in their forms.
If a continuous covariate is to be tested, filter_loci_by_group_coverage()
should be skipped, as there are no groups. In prior versions of methylSigDSS()
, this was not possible, and the group constraints were incorrectly applied prior to testing on a continuous covariate.
# IF NOT DONE PREVIOUSLY # Require at least two samples from cancer and two samples from normal bs = filter_loci_by_group_coverage( bs = bs, group_column = 'Type', c('cancer' = 2, 'normal' = 2))
# Test the simplest model with an intercept and Type diff_fit_simple = diff_dss_fit( bs = bs, design = bsseq::pData(bs), formula = as.formula('~ Type'))
The contrast
parameter is also changed in its form. Note the, additional parameters to specify how to recover group methylation. methylation_group_column
and methylation_groups
should be specified for group versus group comparisons. For continuous covariates, methylation_group_column
is sufficient, and the samples will be grouped into top/bottom 25 percentile based on the continuous covariate column name given in methylation_group_column
.
# Test the simplest model for cancer vs normal # Note, 2 rows corresponds to 2 columns in diff_fit_simple$X simple_contrast = matrix(c(0,1), ncol = 1) diff_simple_gr = diff_dss_test( bs = bs, diff_fit = diff_fit_simple, contrast = simple_contrast, methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal'))
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.