makeSimDataSd | R Documentation |
This function creates simulated RNA-Seq gene expression datasets using the method presented in (Soneson and Delorenzi, BMC Bioinformatics, 2013). For the time being, it creates only simulated datasets with two conditions.
makeSimDataSd(N, param, samples = c(5, 5),
ndeg = rep(round(0.1*N), 2), fcBasis = 1.5,
libsizeRange = c(0.7, 1.4), libsizeMag = 1e+7,
modelOrg = NULL, simLengthBias = FALSE)
N |
the number of genes to produce. |
param |
a named list with negative binomial
parameter sets to sample from. The first member is
the mean parameter to sample from ( |
samples |
a vector with 2 integers, which are the number of samples for each condition (two conditions currently supported). |
ndeg |
a vector with 2 integers, which are the number of differentially expressed genes to be produced. The first element is the number of up-regulated genes while the second is the number of down-regulated genes. |
fcBasis |
the minimum fold-change for deregulation. |
libsizeRange |
a vector with 2 numbers
(generally small, see the default), as they
are multiplied with |
libsizeMag |
a (big) number to multiply
the |
modelOrg |
the organism from which the
real data are derived from. It must be one
of the supported organisms (see the main
|
simLengthBias |
a boolean to instruct the simulator to create genes whose read counts is proportional to their length. This is achieved by sorting in increasing order the mean parameter of the negative binomial distribution (and the dispersion according to the mean) which will cause an increasing gene count length with the sampling. The sampled lengths are also sorted so that in the final gene list, shorter genes have less counts as compared to the longer ones. The default is FALSE. |
The simulated data generation involves a lot of
random sampling. For guaranteed reproducibility,
be sure to use set.seed
prior to any
calculations. By default, when the metaseqR2 package
is loaded, the seed is set to 42
.
A named list with two members. The first
member (simdata
) contains the
synthetic dataset
Panagiotis Moulos
dataMatrix <- metaseqR2:::exampleCountData(2000)
## File "bottomly_read_counts.txt" from the ReCount database
#download.file(paste("http://bowtie-bio.sourceforge.net/recount/",
# "countTables/bottomly_count_table.txt",sep=""),
# destfile="~/bottomly_count_table.txt")
N <- 2000
#parList <- estimateSimParams("~/bottomly_read_counts.txt")
parList <- estimateSimParams(dataMatrix,libsizeGt=3e+4)
sim <- makeSimDataSd(N,parList)
synthData <- sim$simdata
trueDeg <- which(sim$truedeg!=0)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.