In insightsengineering/hermes: Preprocessing, analyzing, and reporting of RNA-seq data

This vignette explores the design of the transformation workflow and functions in hermes.

Problems to solve

Original normalize() function works for the normalization of the data our HermesData class. However, the raw counts data after regular normalization procedure (e.g. CPM, TPM, ...) may not be proper for certain analyses (e.g. differential expression analysis, clustering analysis, and discriminant analysis) due to Heteroscedasticity problems (i.e. different variance levels across genes).
To address this issue, one approach is to transform the normalized count data for all genes into approximately homoskedastic values (having constant variance).
We will introduce two transformations to the count data. The linear transformation approach is implemented by a variance stabilizing transformation (VST), see article for background. The transformation to the log2 scale values is implemented by a 'regularized log' transformation (rlog), see article for more details.
The returned outputs is NOT for differential expression analysis, since the differential expression analysis will be implemented in other HermesData functions. The outputs with transformed values are only for visualization or other analyses such as clustering or discriminant analysis.

Plan

Both vst and rlog transformation methods are available in DESeq2 package, which are varianceStabilizingTransformation and rlog functions, respectively. Our functions will call the DESeq2 package and its functions.
We will add two options vst and rlog into the normalize() function.
The varianceStabilizingTransformation and rlog functions only require a matrix of counts for computation. We could pass the count matrix obtained from counts().
Results should be returned as an assay in HermesData object as other normalization method.

Control function

Here we add parameters that are used for the 2 new methods.

control_normalize <- function(log = TRUE,
                              lib_sizes = NULL,
                              prior_count = 1,
                              fit_type = "parametric") {
  assert_that(
    is.flag(log),
    is.null(lib_sizes) || is_counts_vector(lib_sizes),
    is.number(prior_count) && prior_count >= 0,
    is.string(fit_type)
  )
  list(
    log = log,
    lib_sizes = lib_sizes,
    prior_count = prior_count,
    fit_type = fit_type
  )
}

Method

Add vst and rlog into the normalize() function.

  normalize_update = function(object,
                        methods = c("vst", "rlog"),
                        control = control_normalize(),
                        ...) {
    method_choices <- c( "vst", "rlog")
    assert_that(all(methods %in% method_choices))
    methods <- match.arg(methods, choices = method_choices, several.ok = TRUE)
    for (method in methods) {
      fun_name <- paste0("h_", method)
      method_result <- eval(utils.nest::call_with_colon(
        fun_name,
        object = object,
        control = control
      ))
      assay(object, method) <- method_result
    }
    metadata(object) <- c(metadata(object), list(control_normalize = control))
    object
  }

Helper functions

h_vst <- function(object, 
                  control = control_normalize()) {
  assert_that(
    is_hermes_data(object),
    is_list_with(control, "fit_type")
  )
  DESeq2::varianceStabilizingTransformation(
    counts(object), 
    fitType = control$fit_type
  )
} 

h_rlog <- function(object, 
                   control = control_normalize()) {
  assert_that(
    is_hermes_data(object),
    is_list_with(control, "fit_type")
  )
  DESeq2::rlog(
    counts(object),
    fitType = control$fit_type
  )
}

Test

library(hermes)
object <- hermes_data
h_vst(object)
h_rlog(object)
aa <- normalize_update(object)

assay(aa, "vst")
assay(aa, "rlog")

The results look good but a little bit slow.

See also the help page for rlog:

If rlog is run on data with number of samples in [30-49] it will print a message that it may take a few minutes, if the number of samples is 50 or larger, it will print a message that it may take a "long time", and in both cases, it will mention that the vst is a much faster transformation.

Therefore this rlog should not be in the default for normalize(), only vst.

In addition, there are quite a few -Inf values for rlog. But it seems they are only shown in printing, but are not there:

any(!is.finite(assay(aa, "rlog")))

does not show any infinite values. So seems ok.

insightsengineering/hermes documentation built on Dec. 15, 2024, 8:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

insightsengineering/hermes
Preprocessing, analyzing, and reporting of RNA-seq data

In insightsengineering/hermes: Preprocessing, analyzing, and reporting of RNA-seq data

Problems to solve

Plan

Control function

Method

Helper functions

Test

R Package Documentation

Browse R Packages

We want your feedback!

insightsengineering/hermes Preprocessing, analyzing, and reporting of RNA-seq data

In insightsengineering/hermes: Preprocessing, analyzing, and reporting of RNA-seq data

Problems to solve

Plan

Control function

Method

Helper functions

Test

R Package Documentation

Browse R Packages

We want your feedback!

insightsengineering/hermes
Preprocessing, analyzing, and reporting of RNA-seq data