View source: R/simulateCounts.R
simulateCounts | R Documentation |
Auxiliary function to simulate read counts of methylated and unmethylated cytosines
simulateCounts(
num.samples,
sites = NULL,
alpha = NULL,
beta = NULL,
size = NULL,
theta,
sample.ids = NULL,
chromosome = "1",
start = NULL,
end = NULL,
strand = "*",
type = c("on_sites", "on_regions"),
regions = 10,
min_width = 1000,
max_width = 5000,
minCountPerIndv = 8,
maxCountPerIndv = 300,
mu = NULL,
noise = 0,
seed = 123
)
num.samples |
Number of samples to generate. |
sites |
Number of cytosine sites for each sample. |
alpha |
Alpha parameter of beta distribution. Parameter shape1 from
|
beta |
Beta parameter of beta distribution. Parameter shape2 from
|
size |
number of trials (11 or more). Expected cytosine coverage. |
theta |
Parameter theta from |
sample.ids |
Names for the samples. |
chromosome |
A character string naming the chromosome. Default '1'. |
start |
An nteger vector with the start positions for each cytosine site. Default start = NULL. |
end |
An integer vector with the end position for each cytosine site. Default end = start. Notice that if end > start, each counts will cover a region. |
strand |
One of the characters '*', '+', or '-' denoting the DNA strand. Default is '*'. |
type |
One of the string 'on_sites' or 'on_regions'. Default is 'on_sites'. If type == 'on_sites', then the counts are intended on single bases, otherwise the counts are covering regions. |
regions, min_width, max_width, minCountPerIndv, minCountPerIndv, mu |
Arguments to provide when type == 'on_regions':
|
noise |
A single number from the interval [0, 1] or a numeric vector of lengh(sites) with all its elements from the interval [0, 1]. Adds noise to the read counts. The noise is added to the methylation levels 'p', which are used to compute the coverage. If for some site 'p' + noise > 1', then the noise is not added to the site. Default is zero. |
seed |
seed a single value, interpreted as an integer, or NULL, to set a seed for Random Number Generation. Default is seed = 123. |
Methylation coverages (minimum 10) are generated from a Negative
Binomial distribution with function rnegbin
from R
package MASS. This function uses the representation of the Negative
Binomial distribution as a continuous mixture of Poisson distributions
with Gamma distributed means. Prior methylation levels are randomly
generated with beta distribution using Beta
function from R package “stats” and posterior methylation levels are
generated according Bayes' theorem. The read of methylation counts are
obtained as the product of coverage by the posterior methylation level.
Counts on regions are generated from a Negative Binomial distribution
with function rnegbin
with mean mu and variance:
mu + mu^2/theta.
A list of GRanges objects with the methylated and unmethylated counts in its metacolumn.
Robersy Sanchez (https://github.com/genomaths).
## *** Simulate samples with expected average of difference of methylation
## levels equal to 0.0427.
## === Expected mean of methylation levels ===
bmean <- function(alpha, beta) alpha/(alpha + beta)
bmean(0.03, 0.5) - bmean(0.007, 0.5) #' Expected difference = 0.04279707
## === The number of cytosine sitesto generate ===
sites = 5000
## === Simulate samples ===
ref = simulateCounts(num.samples = 1, sites = sites, alpha = 0.007,
beta = 0.5, size = 50, theta = 4.5, sample.ids = 'C1')
treat = simulateCounts(num.samples = 2, sites = sites, alpha = 0.03,
beta = 0.5, size = 50, theta = 4.5,
sample.ids = c('T1', 'T2'))
### === Simulate counts on regions ====
simulateCounts(num.samples = 7,
sample.ids = c(paste0('C',1:4), paste0('T', 1:3)),
type = 'on_regions', theta = 2.5, regions = 10)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.