This vignette explores the PCA analysis and plot.
calc_pca()
should do calculate the PCAdraw_pca()
takes those results and create the plot (but see below...)The workflow is expected as follows:
1. HermesData
object created
2. QC flags added
3. Filtering
4. Normalization
5. PCA on the assay of interest (this can be original counts, CPM, log CPM, etc. which come
out of the normalization)
6. Plot
According to the workflow above, the PCA functions no longer need to calculate the CPM, log of it, or any subsetting or filtering, as this has been done beforehand already.
However the user needs to be able to specify the assay to use. We fix the scale and centering settings of the PCA analysis as they are advised. Since we scale, we need to omit any genes that are constant across all samples.
calc_pca <- function(object, assay_name = "counts") { assert_that( is_hermes_data(object), is.string(assay_name) ) # Obtain a matrix where each column is a gene, and keep only non-constant genes. x_samples <- assay(object, assay_name) x_genes <- t(x_samples) gene_is_constant <- apply(x_genes, MARGIN = 2L, FUN = isConstant) x_genes_remaining <- x_genes[, !gene_is_constant] stats::prcomp( x = x_genes_remaining, center = TRUE, scale = TRUE, tol = sqrt(.Machine$double.eps) # Omit essentially constant principal components. ) } object <- hermes_data pca <- calc_pca(object) summary(pca)
Now we can do the plot based on the PCA result.
ggfortify
We could add this new package (which is already e.g. on BEE) to our docker containers.
install.packages("ggfortify") library(ggfortify) autoplot(pca) help(autoplot.prcomp)
So it seems this has quite a few options. See here for more background. For example:
# Different components. autoplot(pca, x = 2, y = 4) # Draw eigenvectors. autoplot(pca, loadings = TRUE, loadings.label = TRUE) # This fails, seems like a bug: # autoplot(pca, loadings.data = pca$rotation[1:3, ], loadings = TRUE)
Only issue here is that I have not found a good way so far to only plot selected eigenvectors here.
# Sample labels. autoplot(pca, label = TRUE, label.size = 2, label.hjust = 1, label.vjust = 1) # Color by a factor in colData. dta <- as.data.frame(colData(object)) autoplot(pca, data = dta, colour = "SEX")
So that works pretty nicely.
Users could also go independently to PCAtools starting from the
normalized HermesData
assays.
Based on the easy use of ggfortify
's autoplot
, we don't actually need to include dedicated
plot function ourselves in hermes
. We can document in the examples how to use that, as well as in
the Biomarker Analysis Catalog (BAC).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.