This function acts as a driver for the simulation. It takes all required arguments and passes them on to the functions for the different stages of the simulation. The current defaults simulate a nucleosome positioning experiment.
simChIP(nreads, genome, file, functions = defaultFunctions(),
        control = defaultControl(), verbose = TRUE, load = FALSE)
nreads |
Number of reads to generate. |
genome |
An object of class 'DNAStringSet' or the name of a fasta file containing the genome sequence. |
file |
Base of output file names (see Details). |
functions |
Named list of functions to use for the various stages of the simulation. Expected names are: ‘features’, ‘bindDens’, ‘readDens’, ‘sampleReads’, ‘readNames’, ‘readSequence’. |
control |
Named list of arguments to be passed to simulation functions (one list per function). |
verbose |
Logical indicating whether progress messages should be printed. |
load |
Logical indicating whether an attempt should be made to load intermediate results from a previous run. |
The simulation consists of six stages:
generate feature sequence (for each chromosome): chromosome length -> feature sequence (list)
compute binding site density: feature sequence -> binding site density (vector)
compute read density: binding site density -> read density (two column matrix, one column for each strand)
sample read start sites: read density -> read positions (list)
create read names: number of reads -> unique names
obtain read sequence and quality: read positions, genome sequence, [qualities] -> output file
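The data flow through the first five stages can be illustrated with a toy example in base R. This is only a sketch under simplifying assumptions: the feature layout, the uniform density over features, and all variable names are hypothetical and are not the functions that simChIP uses by default. The sixth stage is omitted because it needs a genome sequence.

```r
set.seed(1)
chrLen <- 1000  # hypothetical chromosome length

# Stage 1: feature sequence (toy layout, a list of features)
features <- list(list(start = 101, length = 200),
                 list(start = 601, length = 200))

# Stage 2: binding site density, one value per position (vector)
bindDens <- numeric(chrLen)
for (f in features) {
  idx <- seq(f$start, length.out = f$length)
  bindDens[idx] <- bindDens[idx] + 1
}

# Stage 3: read density, a two column matrix with one column per strand
# (here both strands simply reuse the normalised binding density)
dens <- bindDens / sum(bindDens)
readDens <- cbind(forward = dens, reverse = dens)

# Stage 4: sample read start sites from the read density (list)
nPerStrand <- 10
readPos <- list(
  forward = sample(chrLen, nPerStrand, replace = TRUE, prob = readDens[, "forward"]),
  reverse = sample(chrLen, nPerStrand, replace = TRUE, prob = readDens[, "reverse"])
)

# Stage 5: unique read names, one per sampled read
nreads <- length(unlist(readPos))
readNames <- paste0("read_", seq_len(nreads))
```

Each stage consumes only the output of the previous one, which is why intermediate results can be stored and reused.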
After each of the first three stages the results of the stage are written to a file and can be reused later.
File names are created by appending ‘_features.rdata’, ‘_bindDensity.rdata’ and ‘_readDensity.rdata’ to file, respectively. Previous results will be loaded for reuse if load is TRUE and files with matching names are found. This is useful to sample repeatedly from the same read density or to recover partial results from an interrupted run.
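As a rough illustration of the naming scheme, the intermediate file names for a given output base can be built in base R as shown below. The base name is hypothetical, and the existence check only sketches the idea behind load = TRUE; it is not the package's own code.

```r
# Hypothetical output base, as it might be passed to the `file` argument
base <- "output/sim_10M"

# Suffixes named in the text for the first three stages
suffixes <- c(features    = "_features.rdata",
              bindDensity = "_bindDensity.rdata",
              readDensity = "_readDensity.rdata")

# Intermediate file names are formed by appending each suffix to the base
stageFiles <- setNames(paste0(base, suffixes), names(suffixes))
print(stageFiles)

# A stage could be skipped when its file already exists (illustration only)
reusable <- file.exists(stageFiles)
print(reusable)
```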
The creation of files can be prevented by setting file = "". In this case all results will be returned in a list at the end. Note that this will require more memory since all intermediate results have to be held until the end.
The behaviour of the simulation is mainly controlled through the functions and control arguments. They are expected to be lists of the same length with matching names. The names indicate the stage of the simulation for which the function should be used; elements of control will be used as arguments for the corresponding functions.
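The pairing of functions and control by name can be sketched with do.call in base R. The two stage functions, their arguments, and the helper runStage are hypothetical stand-ins chosen for illustration; they are not the defaults returned by defaultFunctions() or defaultControl(), and the way simChIP dispatches internally may differ.

```r
# Toy stand-ins for two stages; these are NOT the package defaults
functions <- list(
  features = function(chrLen, meanLength) {
    # split a chromosome into features of equal length
    n <- max(1, round(chrLen / meanLength))
    rep(chrLen / n, n)
  },
  readNames = function(n, prefix) paste0(prefix, "_", seq_len(n))
)

# One list of extra arguments per function, matched by name
control <- list(
  features  = list(meanLength = 200),
  readNames = list(prefix = "read")
)

# Both lists are expected to have the same names
stopifnot(identical(names(functions), names(control)))

# Hypothetical helper: call a stage with its stage input plus its control list
runStage <- function(stage, ...) {
  do.call(functions[[stage]], c(list(...), control[[stage]]))
}

featureLengths <- runStage("features", chrLen = 1000)
readIds <- runStage("readNames", n = 3)
```

Keeping the extra arguments in a separate list means a stage function can be swapped without changing how the driver is called.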
A list. The components are typically either lists (with one component per chromosome) or file names, but note that this may depend on the return value of the functions listed in functions.
The components of the returned list are:
features |
Either a list of generated features or the name of a file containing that list; |
bindDensity |
Either a list with binding site densities or the name of a file containing that list; |
readDensity |
Either a list of read densities or the name of a file containing that list; |
readPosition |
Either a list of read start sites or the name of a file containing that list; |
readSequence |
The return value of the function listed as ‘ |
readNames |
Either a list of read names or the name of a file containing that list. |
Peter Humburg
defaultFunctions, defaultControl
## Not run:
## To run the default nucleosome positioning simulation
## we can simply run something like the line below.
## This will result in 10 million reads sampled from the genome.
## Of course the file names have to be changed as appropriate.
simChIP(1e7, genome = "reference.fasta", file = "output/sim_10M")
## End(Not run)