The r Biocpkg("wateRmelon")
provides a set of tools for importing, quality control and normalizing Illumina DNA methylation microarray data. All of the functions described in this vignette are fully compatible with the r Biocpkg("minfi")
package, including any workflows or pipelines that involves it. Additionally wateRmelon extends the r Biocpkg("methylumi")
package to allow for MethyLumiSet representations of EPIC array data. If you are using many hundreds of arrays, you could consider using the related r Biocpkg("wateRmelon")
.
In addition to our own functions we provide implementations for a variety of age-prediction and cell-type estimations which work for both 450k and EPIC data.
wateRmelon is designed to extend the r Biocpkg("methylumi")
and r Biocpkg("minfi")
R packages so there is quite the number of dependencies which need to be installed. This can be handled by simply running:
install.packages('BiocManager') BiocManager::install('wateRmelon') library(wateRmelon)
library(wateRmelon) data(melon)
Alternatively, wateRmelon can be installed directly from our github
install.packages('devtools') devtools::install_github('schalkwyk/wateRmelon')
The package contains a small subset of 450k array data that can be used to explore all the funtions quickly. The melon
data set is a MethyLumiSet
containing 12 samples (columns) and 3364 features (rows)
data(melon) dim(melon) # Quality filter using default thresholds melon.pf <- pfilter(melon) # Normalize using one of the many available methods melon.dasen.pf <- dasen(melon.pf) # Extract Betas for downstream analysis norm_betas <- betas(melon.dasen.pf)
The IDAT reading provided in r Biocpkg("methylumi")
has been updated to handle EPIC arrays, this is provided within the readEPIC
function. Ideally, you can then read in a phenotype information that correspond to the idat files you have loaded in.
Alternatively you can read in data using r Biocpkg("minfi")
and work from an RGChannelSet
or MethylSet
object, using the read_metharray
functions.
mlumi <- readEPIC('path/to/idats')
Quality control is a vital part of performing an EWAS, it is important that we are able to easily identify samples that are outliers and could lead to false positive results.
There are many ways to identify outliers. Our preferred method is to use the outlyx
function which provides a robust and reproducible multivariate method that we feel works well for most data-types.
outliers <- outlyx(melon, plot=TRUE) print(outliers) # remove outliers with melon[,!outliers$out]
Similarly it is important to check the quality control probes to measure how well the experiment performs. For this we provide bscon
to quickly check the bisulfite conversion probes for each sample to estimate the percentage of DNA that has been successfully converted. Values above 90% are ideally but values above 80% are also acceptable.
bsc <- bscon(melon) hist(bsc, xlab = c(0, 100))
Poorly performing probes can be identified from the bead counts and detection p-values using the pfilter
function
melon.pf <- pfilter(melon)
wateRmelon now provides functions for three popular methods for generate phenotypic variables. Whether these are needed for quality control or identifying sample mismatch. Or could even been provided as a missing covariate for analyses
wateRmelon provides 5 epigenetic age predictors: Horvath, Horvath Skin & Blood, Hannum, PhenoAge and Lin! As an extra measure we provide the number of missing probes for each clock and this can be used to supply your own list of coefficients to calculate different linear age predictions. !!However!! any normalisation methods that the original age predictors use to predict ages so there will be slight differences compared to the original methods.
agep(melon, method='all') agep(melon, method='horvath')
Sexes of samples can be determined like so:
estimateSex(betas(melon), do_plot=TRUE)
wateRmelon is able to compute cell counts for bulk blood (currently only bulk blood).
estimateCellCounts.wateRmelon(melon, referencePlatform = "IlluminaHumanMethylation450k") estimateCellCounts.wateRmelon(melon, referencePlatform = "IlluminaHumanMethylationEPIC") # change reference
Norm Method | Short Description | Extra Details
------- | ----- | --------
dasen | Best performing normalisation method according to Pidsley 2013 |
nasen | Simpler implemenation of dasen that does not correct for sentrix positions | Can be used on EPIC data without any consequences.
adjusted_dasen | dasen with our interpolatedXY method | offset_fit = FALSE
performs adjusted_nasen instead
adjusted_funnorm |funnorm with our interpolatedXY method | Only Available to RGChannelSet
SWAN | Popular method for normalisation|
As such they can be used as such:
dasen.melon <- dasen(melon) # Use whichever method you would like to use.
Normalization aims to align the data across samples to make it ready for analysis. The degree of normalization across samples is variable, where samples undergo a more dramatic transformation to resemble the rest of the data. This can be an indicator that a sample has addition-al noise or is an outlier. The qual function provides an estimate of vio-lence that has been introduced through preprocessing and normalization.
das <- dasen(melon) qu <- qual(betas(melon), betas(das)) plot(qu[,1], qu[,2])
TODO...
dmrse_row(melon.pf) dmrse_row(melon.dasen.pf) # Slightly better standard errores
Not available for minfi objects
genki(melon.pf) genki(melon.dasen.pf)
seabi(melon.pf, sex=pData(melon.pf)$sex, X=fData(melon.pf)$CHR=='X') seabi(melon.dasen.pf, sex=pData(melon.dasen.pf)$sex, X=fData(melon.dasen.pf)$CHR=='X')
Although we cannot predict what type of analysis you are expecting to perform we have a couple of recommendations that should be considered before you perform statistical testing. Firstly we recommend a final sweep of the normalized data using pwod
bet <- betas(melon) pwod_bet <- pwod(bet) # Statistical Analysis using pwod_bet
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.