WGBSage | R Documentation |
See Horvath, Genome Biology, 2013 for more information
WGBSage(
bsseq,
model = c("horvath", "horvathshrunk", "hannum", "skinandblood"),
padding = 15,
useENSR = FALSE,
useHMMI = FALSE,
minCovg = 5,
impute = FALSE,
minSamp = 5,
genome = NULL,
dropBad = FALSE,
...
)
bsseq |
A bsseq object (must have assays named |
model |
Which model ("horvath", "horvathshrunk", "hannum", "skinandblood") |
padding |
How many bases +/- to pad the target CpG by (DEFAULT: 15) |
useENSR |
Use ENSEMBL regulatory region bounds instead of CpGs (DEFAULT: FALSE) |
useHMMI |
Use HMM CpG island boundaries instead of padded CpGs (DEFAULT: FALSE) |
minCovg |
Minimum regional read coverage desired to estimate 5mC (DEFAULT: 5) |
impute |
Use k-NN imputation to fill in low-coverage regions? (DEFAULT: FALSE) |
minSamp |
Minimum number of non-NA samples to perform imputation (DEFAULT: 5) |
genome |
Genome to use as reference, if no genome(bsseq) is set (DEFAULT: NULL) |
dropBad |
Drop rows/cols with > half missing pre-imputation? (DEFAULT: FALSE) |
... |
Arguments to be passed to impute.knn, such as rng.seed |
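As a rough illustration of what minCovg controls, the base-R sketch below pools toy read counts into regions and masks any regional estimate whose total coverage falls below the threshold. This is an assumed reading of "minimum regional read coverage", not the package's actual code. The counts, the region labels, and the names M, Cov, region, and frac are all hypothetical.

```r
# Toy per-CpG read counts for two samples (all numbers are made up).
M <- matrix(c(3, 1, 0, 2, 4,      # methylated reads, sample s1
              1, 0, 0, 5, 5),     # methylated reads, sample s2
            ncol = 2, dimnames = list(NULL, c("s1", "s2")))
Cov <- matrix(c(4, 2, 1, 2, 5,    # total reads, sample s1
                2, 1, 0, 6, 6),   # total reads, sample s2
              ncol = 2, dimnames = list(NULL, c("s1", "s2")))

# Hypothetical assignment of each CpG to a (padded) target region.
region <- c("r1", "r1", "r2", "r3", "r3")
minCovg <- 5

# Pool methylated and total reads within each region.
regM <- rowsum(M, region)
regCov <- rowsum(Cov, region)

# Regional methylation fraction; mask estimates backed by too few reads.
frac <- regM / regCov
frac[regCov < minCovg] <- NA

# Non-NA samples per region, the quantity minSamp is compared against
# (assumption: imputation needs at least minSamp usable samples).
usable <- rowSums(!is.na(frac))
```

Under this reading, widening padding pulls more CpGs (and so more reads) into each region, which is one reason the two parameters tend to be tuned together.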
Note: the accuracy of the prediction will increase or decrease depending on
how various hyper-parameters are set by the user. This is NOT a hands-off
procedure, and the defaults are only a starting point for exploration. It
will not be uncommon to tune padding, minCovg, and minSamp for each
WGBS or RRBS experiment (and the latter may be impacted by whether dupes are
removed prior to importing data). Consider yourself forewarned. In the near
future we may add support for arbitrary region-coefficient inputs and result
transformation functions, which of course will just make the problems worse.
Also, please cite the appropriate papers for the Epigenetic Clock(s) you use:
For the 'horvath' or 'horvathshrunk' clocks, cite Horvath, Genome Biology 2013. For the 'hannum' clock, cite Hannum et al, Molecular Cell 2013. For the 'skinandblood' clock, cite Horvath et al, Aging 2018.
Last but not least, keep track of the parameters YOU used for YOUR estimates.
The call element in the returned list of results is for this exact purpose.
If you need to recover the GRanges object used to average (or impute) DNAme
values for the model, try granges(result$methcoefs) on a result. The
methylation fraction and coefficients for each region can be found in the
GRanges object, result$methcoefs, where each sample has a corresponding
column with the methylation fraction and the coefficients have their own
column titled "coefs". Additionally, the age estimates are stored in
result$age (named, in case dropBad == TRUE).
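To make the relationship between the per-region methylation fractions, the "coefs" column, and the age estimates more concrete, here is a minimal base-R sketch of a Horvath-style clock calculation. It is not the package's implementation. The methylation values, coefficients, and intercept are invented, and applying the inverse age transformation from Horvath (2013) is an assumption that would only be relevant to clocks trained on transformed age.

```r
# Made-up regional methylation fractions (rows = regions, cols = samples).
meth <- matrix(c(0.80, 0.10, 0.55,
                 0.60, 0.30, 0.40),
               ncol = 2,
               dimnames = list(c("r1", "r2", "r3"), c("s1", "s2")))

# Hypothetical model coefficients and intercept (not from any real clock).
coefs <- c(r1 = 1.2, r2 = -0.8, r3 = 0.5)
intercept <- 0.7

# Linear predictor: intercept plus coefficient-weighted methylation.
raw <- intercept + drop(t(meth) %*% coefs)

# Inverse of the age transformation described in Horvath (2013):
# exponential below the adult-age breakpoint, linear above it.
anti_trafo <- function(x, adult_age = 20) {
  ifelse(x < 0,
         (1 + adult_age) * exp(x) - 1,
         (1 + adult_age) * x + adult_age)
}

age <- anti_trafo(raw)  # named by sample, like result$age
```

Because the result is named by sample, it stays aligned with the input even when some samples have been dropped.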
A list with call, methylation estimates, coefs, age estimates
library(biscuiteer)

# Locate the example BED/VCF files shipped with biscuiteer
shuf_bed <- system.file("extdata", "MCF7_Cunha_chr11p15_shuffled.bed.gz",
                        package="biscuiteer")
orig_bed <- system.file("extdata", "MCF7_Cunha_chr11p15.bed.gz",
                        package="biscuiteer")
shuf_vcf <- system.file("extdata",
                        "MCF7_Cunha_shuffled_header_only.vcf.gz",
                        package="biscuiteer")
orig_vcf <- system.file("extdata",
                        "MCF7_Cunha_header_only.vcf.gz",
                        package="biscuiteer")

# Read each sample into its own bsseq object
bisc1 <- readBiscuit(BEDfile = shuf_bed, VCFfile = shuf_vcf,
                     merged = FALSE)
bisc2 <- readBiscuit(BEDfile = orig_bed, VCFfile = orig_vcf,
                     merged = FALSE)

# Combine the two samples, then estimate ages with the Horvath clock
comb <- unionize(bisc1, bisc2)
ages <- WGBSage(comb, model = "horvath")