View source: R/preprocessCoverage.R
preprocessCoverage: R Documentation
This function takes the coverage data from loadCoverage, scales it, applies a log2 transformation, and splits it into chunks suitable for calculateStats.
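The scale-then-log2 step can be pictured in base R as below. This is a minimal sketch, assuming that "scaling" means adding the constant scalefac (the example on this page uses scalefac = 32) before taking log2, so that zero coverage maps to a finite value. It uses a plain data.frame instead of a DataFrame, and the helper log2_scale is hypothetical, not part of derfinder.

```r
# Hypothetical helper (not a derfinder function): add a pseudo-count, then log2.
# Assumption: "scaling" means adding `scalefac` to every coverage value.
# coverage: a data.frame with one numeric column per sample.
log2_scale <- function(coverage, scalefac = 32) {
  as.data.frame(lapply(coverage, function(col) log2(col + scalefac)))
}

# Toy coverage for two samples at three bases
cov <- data.frame(sampleA = c(0, 32, 96), sampleB = c(224, 0, 480))
log2_scale(cov)  # zero coverage becomes log2(32) = 5
```

Adding a pseudo-count before the log keeps zero-coverage bases finite, which is why a value such as 32 is passed rather than logging raw counts.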
preprocessCoverage(
coverageInfo,
groupInfo = NULL,
cutoff = 5,
colsubset = NULL,
lowMemDir = NULL,
...
)
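coverageInfo: A list containing a DataFrame, coverageInfo$coverage, with one column of coverage per sample, as produced by loadCoverage.
groupInfo: A factor specifying the group membership of each sample. If NULL, no group means are computed and the returned list of group means has length 0.
cutoff: The base-pair level cutoff to use. Its behavior is controlled by …
colsubset: Optional vector of column indices of coverageInfo$coverage selecting the samples to include.
lowMemDir: If specified, each chunk is saved into a separate Rdata file under lowMemDir.
...: Arguments passed to other methods and/or advanced arguments. Advanced arguments referenced on this page are chunksize, mc.cores, scalefac and verbose.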
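The lowMemDir idea, one file per chunk with only the chunk identifiers kept in memory, can be illustrated with base R's save() and load(). The helpers save_chunks and load_chunk, and the "chunk_<id>.Rdata" naming scheme, are hypothetical; this sketches the technique under those assumptions and is not derfinder's implementation.

```r
# Hypothetical sketch of the lowMemDir technique (not derfinder's code).
# Each chunk of rows is written to its own .Rdata file; only ids stay in memory.
save_chunks <- function(coverage, index, lowMemDir) {
  dir.create(lowMemDir, showWarnings = FALSE, recursive = TRUE)
  for (id in names(index)) {
    chunk <- coverage[index[[id]], , drop = FALSE]
    # Assumed file naming scheme: chunk_<id>.Rdata
    save(chunk, file = file.path(lowMemDir, paste0("chunk_", id, ".Rdata")))
  }
  names(index)  # return just the chunk identifiers
}

# Load a single chunk back into a private environment and return it
load_chunk <- function(id, lowMemDir) {
  env <- new.env()
  load(file.path(lowMemDir, paste0("chunk_", id, ".Rdata")), envir = env)
  env$chunk
}

cov <- matrix(as.numeric(1:12), ncol = 2)   # 6 bases x 2 samples
index <- list(`1` = 1:3, `2` = 4:6)         # two chunks of rows
chunk_dir <- file.path(tempdir(), "lowMemDemo")
ids <- save_chunks(cov, index, chunk_dir)
load_chunk("2", chunk_dir)                  # rows 4 to 6 only
```

The trade-off is extra disk input/output in exchange for each worker holding only its own chunk in memory.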
If chunksize is NULL, then mc.cores is used to determine the chunksize. This is useful if you want to split the data so that each core gets the same amount of data (up to rounding). Computing the indexes and using those for mclapply reduces memory copying, as described by Ryan Thompson and illustrated in approach #4 at http://lcolladotor.github.io/2013/11/14/Reducing-memory-overhead-when-using-mclapply

If lowMemDir is specified, then $coverageProcessed is NULL and $mclapplyIndex is a vector with the chunk identifiers.
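The chunking strategy described above (derive chunksize from mc.cores when it is NULL, then give each worker only an index into the data) can be sketched with parallel::mclapply. Assumptions: chunksize is taken as ceiling(n / mc.cores), the chunks are plain integer index vectors rather than the logical Rle objects the function returns, and make_chunk_index is a hypothetical helper.

```r
library(parallel)

# Hypothetical helper: split row indices 1..n into consecutive chunks.
# Assumption: a NULL chunksize becomes ceiling(n / mc.cores), so every
# core gets about the same number of rows (the last chunk may be shorter).
make_chunk_index <- function(n, chunksize = NULL, mc.cores = 1) {
  if (is.null(chunksize)) chunksize <- ceiling(n / mc.cores)
  chunk_id <- ceiling(seq_len(n) / chunksize)  # 1,1,...,2,2,...
  split(seq_len(n), chunk_id)
}

cov <- matrix(as.numeric(1:20), ncol = 2)  # 10 bases x 2 samples
idx <- make_chunk_index(nrow(cov), chunksize = NULL, mc.cores = 3)
lengths(idx)  # chunk sizes 4, 4 and 2

# Only the small index vectors are iterated over; each worker subsets `cov`
# itself. mc.cores = 1 keeps this portable (forking needs a Unix-alike).
chunk_sums <- mclapply(idx, function(i) colSums(cov[i, , drop = FALSE]),
                       mc.cores = 1)
```

Iterating over indexes instead of pre-split copies of the data is the point of the approach: with forked workers the full matrix is shared, and each worker materializes only its own rows.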
A list with five components:
coverageProcessed: the processed coverage information in a DataFrame object. Each column represents a sample, and the coverage information is scaled and log2 transformed. Note that if colsubset is not NULL, there will be fewer columns than in coverageInfo$coverage. The total number of rows depends on the number of base pairs that passed the cutoff, and the information stored is the coverage at each such base. Further note that filterData is re-applied if colsubset is not NULL, which can lead to fewer rows than in coverageInfo$coverage.
mclapplyIndex: a list of logical Rle objects containing the partitioning information according to chunksize.
position: a logical Rle with the positions of the chromosome that passed the cutoff.
meanCoverage: a numeric Rle with the mean coverage at each filtered base.
groupMeans: a list of Rle objects containing the mean coverage at each filtered base, calculated by group. This list has length 0 if groupInfo = NULL.
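The per-base mean coverage and the per-group means described above can be mimicked on a toy matrix. This is a sketch under stated assumptions: coverage is a plain numeric matrix (rows are filtered bases, columns are samples) instead of a DataFrame of Rle columns, results are plain numeric vectors rather than Rle objects, and summarize_coverage is a hypothetical helper.

```r
# Hypothetical helper: overall and per-group mean coverage at each base.
# Assumption: plain matrix input and numeric-vector output instead of Rle.
summarize_coverage <- function(coverage, groupInfo = NULL) {
  meanCoverage <- rowMeans(coverage)
  groupMeans <- list()  # stays empty (length 0) when groupInfo is NULL
  if (!is.null(groupInfo)) {
    groupInfo <- as.factor(groupInfo)
    # For each group level, average only that group's sample columns
    groupMeans <- lapply(split(seq_len(ncol(coverage)), groupInfo),
                         function(cols) rowMeans(coverage[, cols, drop = FALSE]))
  }
  list(meanCoverage = meanCoverage, groupMeans = groupMeans)
}

cov <- cbind(s1 = c(2, 4), s2 = c(4, 8), s3 = c(10, 0))
res <- summarize_coverage(cov, groupInfo = factor(c("A", "A", "B")))
res$meanCoverage  # about 5.33 and 4
res$groupMeans$A  # 3 6
```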
The arguments in ... are passed to filterData when colsubset is specified.
Author: Leonardo Collado-Torres
See also: filterData, loadCoverage, calculateStats
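library("derfinder") # provides preprocessCoverage() and the genomeData example data
## Split the data and transform appropriately before using calculateStats()
dataReady <- preprocessCoverage(genomeData,
    cutoff = 0, scalefac = 32,
    chunksize = 1e3, colsubset = NULL, verbose = TRUE
)
## Inspect the component names, then the full object
names(dataReady)
dataReady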