This vignette explores the design of the transformation workflow and functions in hermes.
normalize()
function works for the normalization of the data our HermesData
class. However, the raw counts data after regular normalization procedure (e.g. CPM, TPM, ...) may not be proper for certain analyses (e.g. differential expression analysis, clustering analysis, and discriminant analysis) due to Heteroscedasticity problems (i.e. different variance levels across genes). variance stabilizing transformation
(VST), see article for background. The transformation to the log2 scale values is implemented by a 'regularized log' transformation (rlog), see article for more details.HermesData
functions. The outputs with transformed values are only for visualization or other analyses such as clustering or discriminant analysis. vst
and rlog
transformation methods are available in DESeq2
package, which are varianceStabilizingTransformation
and rlog
functions, respectively. Our functions will call the DESeq2
package and its functions. vst
and rlog
into the normalize()
function. varianceStabilizingTransformation
and rlog
functions only require a matrix of counts for computation. We could pass the count matrix obtained from counts()
. HermesData
object as other normalization method.Here we add parameters that are used for the 2 new methods.
control_normalize <- function(log = TRUE, lib_sizes = NULL, prior_count = 1, fit_type = "parametric") { assert_that( is.flag(log), is.null(lib_sizes) || is_counts_vector(lib_sizes), is.number(prior_count) && prior_count >= 0, is.string(fit_type) ) list( log = log, lib_sizes = lib_sizes, prior_count = prior_count, fit_type = fit_type ) }
vst
and rlog
into the normalize()
function. normalize_update = function(object, methods = c("vst", "rlog"), control = control_normalize(), ...) { method_choices <- c( "vst", "rlog") assert_that(all(methods %in% method_choices)) methods <- match.arg(methods, choices = method_choices, several.ok = TRUE) for (method in methods) { fun_name <- paste0("h_", method) method_result <- eval(utils.nest::call_with_colon( fun_name, object = object, control = control )) assay(object, method) <- method_result } metadata(object) <- c(metadata(object), list(control_normalize = control)) object }
h_vst <- function(object, control = control_normalize()) { assert_that( is_hermes_data(object), is_list_with(control, "fit_type") ) DESeq2::varianceStabilizingTransformation( counts(object), fitType = control$fit_type ) } h_rlog <- function(object, control = control_normalize()) { assert_that( is_hermes_data(object), is_list_with(control, "fit_type") ) DESeq2::rlog( counts(object), fitType = control$fit_type ) }
library(hermes) object <- hermes_data h_vst(object) h_rlog(object) aa <- normalize_update(object) assay(aa, "vst") assay(aa, "rlog")
The results look good but a little bit slow.
See also the help page for rlog
:
If rlog is run on data with number of samples in [30-49] it will print a message that it may take a few minutes, if the number of samples is 50 or larger, it will print a message that it may take a "long time", and in both cases, it will mention that the vst is a much faster transformation.
Therefore this rlog
should not be in the default for normalize()
, only vst
.
In addition, there are quite a few -Inf
values for rlog
. But it seems they are only shown in printing, but are not there:
any(!is.finite(assay(aa, "rlog")))
does not show any infinite values. So seems ok.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.