This function obtains dispersion estimates for a count data set. For each condition (or collectively for all conditions, see 'method' argument below) it first computes for each gene an empirical dispersion value (a.k.a. a raw SCV value), then fits by regression a dispersion-mean relationship and finally chooses for each gene a dispersion parameter that will be used in subsequent tests from the empirical and the fitted value according to the 'sharingMode' argument.
## S4 method for signature 'CountDataSet'
estimateDispersions( object,
   method = c( "pooled", "pooled-CR", "per-condition", "blind" ),
   sharingMode = c( "maximum", "fit-only", "gene-est-only" ),
   fitType = c( "parametric", "local" ),
   locfit_extra_args = list(), lp_extra_args = list(),
   modelFrame = NULL, modelFormula = count ~ condition, ... )
object |
a |
method |
There are three ways in which the empirical dispersion can be computed:
|
sharingMode |
After the empirical dispersion values have been computed for each
gene, a dispersion-mean relationship is fitted for sharing
information across genes in order to reduce variability of the
dispersion estimates. After that, for each gene, we have two values: the
empirical value (derived only from this gene's data), and the
fitted value (i.e., the dispersion value typical for genes with an
average expression similar to that of this gene). The
|
fitType |
|
locfit_extra_args, lp_extra_args |
(only for |
modelFrame |
By default, the information in |
modelFormula |
For |
... |
extra arguments are ignored |
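The choice between the empirical and the fitted value can be pictured with a few lines of base R. This is only a sketch, not DESeq's internal code: the helper `chooseDispersion` and the numbers are hypothetical, and it assumes the three mode names mean what they suggest ("maximum" takes the larger of the two values, "fit-only" the fitted one, "gene-est-only" the empirical one).

```r
# Hypothetical per-gene values; in a real analysis these come from the data.
empirical  <- c(geneA = 0.02, geneB = 0.30, geneC = 0.10)
fittedDisp <- c(geneA = 0.05, geneB = 0.12, geneC = 0.10)

# Illustrative helper (not part of DESeq): pick the dispersion used for testing.
chooseDispersion <- function(empirical, fitted,
                             sharingMode = c("maximum", "fit-only", "gene-est-only")) {
  sharingMode <- match.arg(sharingMode)
  switch(sharingMode,
         "maximum"       = pmax(empirical, fitted),  # larger of the two, per gene
         "fit-only"      = fitted,                   # ignore the per-gene estimate
         "gene-est-only" = empirical)                # ignore the fit
}

dispMax  <- chooseDispersion(empirical, fittedDisp, "maximum")
dispFit  <- chooseDispersion(empirical, fittedDisp, "fit-only")
dispGene <- chooseDispersion(empirical, fittedDisp, "gene-est-only")
print(dispMax)
```

With "maximum", a gene whose empirical value lies below the fitted curve is raised to the curve, while a gene above it keeps its own estimate.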
Behaviour for method="per-condition": For each replicated condition, a list, named with the condition's name, is placed in the environment object@fitInfo. This list has five named elements: The vector perGeneDispEsts contains the empirical dispersions. The function dispFunc is the fitted function, i.e., it takes as its argument a normalized mean expression value and returns the corresponding fitted dispersion. The values fitted according to this function are in the third element fittedDispEst, a vector of the same length as perGeneDispEsts. The fourth element, df, is an integer, indicating the number of degrees of freedom of the per-gene estimation. The fifth element, sharingMode, stores the value of the sharingMode argument to estimateDispersions.
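The per-condition lists can be inspected directly in the fitInfo environment. The following is a minimal sketch, assuming the DESeq package is installed (the guard skips the code otherwise); the variable names `condNames` and `fit` are illustrative only.

```r
if (requireNamespace("DESeq", quietly = TRUE)) {
  library(DESeq)
  cds <- makeExampleCountDataSet()
  cds <- estimateSizeFactors(cds)
  cds <- estimateDispersions(cds, method = "per-condition")

  # One list per replicated condition, keyed by the condition's name.
  condNames <- ls(cds@fitInfo)
  print(condNames)

  # Take the list for the first condition and show its element names.
  fit <- get(condNames[1], envir = cds@fitInfo)
  print(names(fit))

  # dispFunc maps a normalized mean expression value to a fitted dispersion.
  print(fit$dispFunc(c(10, 100, 1000)))

  # The empirical (per-gene) dispersions.
  print(head(fit$perGeneDispEsts))
}
```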
Behaviour for method="blind" and method="pooled": Only one list is produced, named "blind" or "pooled" and placed in object@fitInfo.
For each list in the fitInfo environment, the dispersion values that are intended to be used in subsequent testing are computed according to the value of sharingMode and are placed in the featureData data frame, in a column named with the same name, prefixed with "disp_". Then, the dispTable (see there) is filled to assign to each condition the appropriate dispersion column in the phenoData frame.
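To see where the final dispersion values end up, one can look at the featureData columns and the dispTable after a default run. This is a sketch that assumes DESeq is installed (the guard skips it otherwise); `dispCols` is an illustrative variable name.

```r
if (requireNamespace("DESeq", quietly = TRUE)) {
  library(DESeq)
  cds <- makeExampleCountDataSet()
  cds <- estimateSizeFactors(cds)
  cds <- estimateDispersions(cds)   # method defaults to "pooled"

  # featureData columns whose names carry the "disp_" prefix.
  dispCols <- grep("^disp_", colnames(fData(cds)), value = TRUE)
  print(dispCols)

  # Which dispersion column is assigned to each condition.
  print(dispTable(cds))
}
```

With method="per-condition", one would expect one such column per replicated condition instead of a single shared one.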
Note: Up to DESeq version 1.4.x (Bioconductor release 2.8), this function was called estimateVarianceFunctions, stored its result differently and did not have the arguments sharingMode and fitType. The behaviour of estimateVarianceFunctions corresponded to the settings sharingMode="fit-only" and fitType="local". Note that these are not the defaults, because the new defaults sharingMode="maximum" and fitType="parametric" are more robust and tend to give better results.
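For comparison with older analyses, the two settings named in the note can be passed explicitly. This is a sketch under the assumption that DESeq is installed; it only reproduces the choice of sharingMode and fitType, not the way the old function stored its results, and `cdsLegacy` is an illustrative name.

```r
if (requireNamespace("DESeq", quietly = TRUE)) {
  library(DESeq)
  cds <- makeExampleCountDataSet()
  cds <- estimateSizeFactors(cds)

  # Settings corresponding to the old behaviour, as described in the note.
  cdsLegacy <- estimateDispersions(cds,
                                   sharingMode = "fit-only",
                                   fitType     = "local")

  # The sharingMode used is recorded in the fit information.
  print(fitInfo(cdsLegacy)$sharingMode)
}
```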
The CountDataSet cds, with the slots fitInfo and featureData updated as described in Details.
Simon Anders, sanders@fs.tum.de
cds <- makeExampleCountDataSet()
cds <- estimateSizeFactors( cds )
cds <- estimateDispersions( cds )
str( fitInfo( cds ) )
head( fData( cds ) )