This vignette explores the PCA analysis and plot.

Idea

Workflow

The workflow is expected as follows: 1. HermesData object created 2. QC flags added 3. Filtering 4. Normalization 5. PCA on the assay of interest (this can be original counts, CPM, log CPM, etc. which come out of the normalization) 6. Plot

Calculation Function

According to the workflow above, the PCA functions no longer need to calculate the CPM, log of it, or any subsetting or filtering, as this has been done beforehand already.

However the user needs to be able to specify the assay to use. We fix the scale and centering settings of the PCA analysis as they are advised. Since we scale, we need to omit any genes that are constant across all samples.

calc_pca <- function(object,
                     assay_name = "counts") {
  assert_that(
    is_hermes_data(object),
    is.string(assay_name)
  )

  # Obtain a matrix where each column is a gene, and keep only non-constant genes.
  x_samples <- assay(object, assay_name)
  x_genes <- t(x_samples)
  gene_is_constant <- apply(x_genes, MARGIN = 2L, FUN = isConstant)
  x_genes_remaining <- x_genes[, !gene_is_constant]

  stats::prcomp(
    x = x_genes_remaining,
    center = TRUE,
    scale = TRUE, 
    tol = sqrt(.Machine$double.eps)  # Omit essentially constant principal components.
  )
}

object <- hermes_data
pca <- calc_pca(object)
summary(pca)

Plot Function

Now we can do the plot based on the PCA result.

Evaluation of ggfortify

We could add this new package (which is already e.g. on BEE) to our docker containers.

install.packages("ggfortify")
library(ggfortify)
autoplot(pca)
help(autoplot.prcomp)

So it seems this has quite a few options. See here for more background. For example:

# Different components.
autoplot(pca, x = 2, y = 4)

# Draw eigenvectors.
autoplot(pca, loadings = TRUE, loadings.label = TRUE)

# This fails, seems like a bug:
# autoplot(pca, loadings.data = pca$rotation[1:3, ], loadings = TRUE) 

Only issue here is that I have not found a good way so far to only plot selected eigenvectors here.

# Sample labels.
autoplot(pca, label = TRUE, label.size = 2, label.hjust = 1, label.vjust = 1)

# Color by a factor in colData.
dta <- as.data.frame(colData(object))
autoplot(pca, data = dta, colour = "SEX")

So that works pretty nicely.

Alternatives

Users could also go independently to PCAtools starting from the normalized HermesData assays.

Conclusion

Based on the easy use of ggfortify's autoplot, we don't actually need to include dedicated plot function ourselves in hermes. We can document in the examples how to use that, as well as in the Biomarker Analysis Catalog (BAC).



insightsengineering/hermes documentation built on Dec. 15, 2024, 8:07 a.m.