## ---- SCP Component Analysis ----
##' @name ScpModel-ComponentAnalysis
##'
##' @title Component analysis for single cell proteomics
##'
##' @description
##'
##' Component analysis is a powerful tool for exploring data. The
##' package implements the ANOVA-principal component analysis
##' extended to linear models (APCA+) and derivatives (suggested by
##' Thiel at al. 2017). This framework is based on principal component
##' analysis (PCA) and allows exploring the data captured by each
##' model variable individually.
##'
##' @section PCA - notation and algorithms:
##'
##' Given \eqn{A} a m x n matrix, PCA can be summarized as the
##' following decomposition:
##'
##' \deqn{AA^T / (n - 1) = VLV^T}
##'
##' Where \eqn{V} is a m x k orthogonal matrix, that is \eqn{VV^T = I},
##' with k the number of components. \eqn{V} is called the matrix of
##' eigenvectors. \eqn{L} is the k x k diagonal matrix of eigenvalues
##' that contains the variance associated to each component, ordered
##' from highest to lowest variance. The unscaled PC scores are given
##' by \eqn{S = A^TV}.
##'
##' There are 2 available algorithm to perform PCA:
##'
##' - `nipals`: The non-linear iterative partial least squares
##' (NIPALS) algorithm **can handle missing values** and
##' approximates classical PCA, although it does not explicitly
##' maximize the variance. This is implemented in [nipals::nipals()].
##' - `svd`: The singular value decomposition (SVD) is used to perform
##' an exact PCA, but it **cannot handle missing values**. This is
##' implemented in [base::svd()].
##'
##' Which algorithm to use is controlled by the `pcaFUN` argument, by
##' default (`"auto"`), the function automatically uses `svd` when
##' there is no missing values and `nipals` when there is at least
##' one missing value.
##'
##' @section Component analysis methods:
##'
##' `scpComponentAnalysis()` performs a PCA on the modelling output.
##' What modelling output the function will use depends on the
##' `method`. The are 3 PCA approaches:
##'
##' - `ASCA` performs a PCA on the effect matrix, that is
##' \eqn{A = \hat{M_f}} where \eqn{f} is one of the effects in the
##' model. This PCA is useful to explore the modelled effects and
##' the relationship between different levels of a factor.
##' - `ASCA.E`: perform PCA on the effect matrix, just like ASCA. The
##' scores are then updated by projecting the effect matrix added to
##' the residuals using the eigenvectors, that is
##' \eqn{scores = (\hat{M_f} + \epsilon)^TV}. This PCA is useful
##' to explore the modelled effects while blurring these effects
##' with the unmodelled variability. Note however that for this
##' approach, the scores are no longer guaranteed to be orthogonal
##' and the eigenvalues are no longer meaningful. The percentage of
##' variation should not be interpreted.
##' - `APCA` (default) performs PCA on the effect matrix plus the
##' residuals, that is \eqn{A = \hat{M_f} + \epsilon}. This PCA
##' is useful to explore the modelled effects in relation with the
##' unmodelled variability that is remaining in the residuals.
##'
##' Available methods are listed in `scpModelComponentMethods`.
##' Note that for all three methods, a PCA on the residual matrix is
##' also performed when `residuals = TRUE`, that is
##' \eqn{A = \epsilon = Y - \hat{\beta}X^T}. A PCA on the residuals is
##' useful to explore residual effects that are not captured by any
##' effect in the model. Similarly, a PCA on the input data matrix,
##' that is on the data before modelling is also performed when
##' `unmodelled = TRUE`, that is \eqn{A = Y}.
##'
##' `scpComponentAnalysis()` always returns a list with 2 elements.
##' The first element, `bySample` is a list where each element
##' contains the PC scores for the desired model variable(s). The
##' second element, `byFeature` is a list where each element
##' contains the eigenvectors for the desired model variable(s).
##'
##' @section Exploring component analysis results:
##'
##' [scpAnnotateResults()] adds annotations to the component
##' analysis results. The annotations are added to all elements of the
##' list returned by `scpComponentAnalysis()`. See the associated man
##' page for more information.
##'
##' `scpComponentPlot()` takes one of the two elements of the list
##' generated by `scpComponentAnalysis()` and returns a list of
##' `ggplot2` scatter plots. Commonly, the first two components,
##' that bear most of the variance, are explored for visualization,
##' but other components can be explored as well thanks to the `comp`
##' argument. Each point represents either a sample or a feature,
##' depending on the provided component analysis results
##' (see examples). Change the point aesthetics by providing ggplot
##' arguments in a list (see examples).
##'
##' `scpComponentBiplot()` simultaneously explores the PC scores
##' (sample-space) and the eigenvectors (feature-space). Scores are
##' shown as points while eigenvectors are shown as arrows. Point
##' aesthetics and arrow aesthetics can be controlled with the
##' `pointParams` and the `arrowParams` arguments, respectively.
##' Moreover, arrows are also labelled and label aesthetics can be
##' controlled using `labelParams` and `textBy`. Plotting all
##' eigenvectors as arrows leads to overcrowded plots. You can limit the plotting to
##' the top longest arrows (default to the top 10) as defined by the
##' distance on the two selected PCs.
##'
##' `scpComponentAggregate()` offers functionality to aggregate the
##' results from multiple features. This can be used to obtain, for
##' example, component analysis results for proteins when modelling at
##' the peptide level. The approach is inspired from
##' [scuttle::aggregateAcrossCells()](https://bioconductor.org/packages/release/bioc/html/scuttle.html)
##' and combines, for each group, multiple values for each component
##' using [QFeatures::aggregateFeatures()]. By default, values are
##' aggregated using the median, but `QFeatures` offers other methods
##' as well. The annotation of the component results are automatically
##' aggregated as well. See the `aggregateFeatures()` man page for
##' more information on available methods and expected behavior.
##'
##' @seealso
##'
##' - [ScpModel-Workflow] to run a model on SCP data upstream of
##' component analysis.
##' - The [nipals::nipals()] function and package for detailed
##' information about the algorithm and associated parameters.
##' - The [ggplot2::ggplot()] functions and associated tutorials to
##' manipulate and save the visualization output
##' - [scpAnnotateResults()] to annotate component analysis results.
##'
##' @references
##'
##' Thiel, Michel, Baptiste FĂ©raud, and Bernadette Govaerts. 2017.
##' "ASCA+ and APCA+: Extensions of ASCA and APCA in the Analysis of
##' Unbalanced Multifactorial Designs." Journal of Chemometrics 31
##' (6): e2895.
##'
##' @author Christophe Vanderaa, Laurent Gatto
##'
##' @example inst/examples/examples_ScpModel-ComponentAnalysis.R
##'
NULL
## ---- Analysis functions ----
##' @name ScpModel-ComponentAnalysis
##'
##' @param object An object that inherits from the
##' `SingleCellExperiment` class. It must contain an estimated
##' `ScpModel` in its metadata.
##'
##' @param method A `character()` indicating which approach(es) to use
##' for principal component analysis (PCA). Are allowed:
##' `"APCA"` (default), `"ASCA"` and/or `"ASCA.E"` (multiple
##' values are allowed). `"ASCA"`, `"APCA"`, `"ASCA.E"` are
##' iterated through each desired effects.
##'
##' @param effects A `character()` indicating on which model variables
##' the component analysis should be performed. Default to all
##' modelled variables.
##'
##' @param pcaFUN A `character(1)` indicating which function to use to
##' perform PCA. "nipals" will use [nipals::nipals()] while "svd"
##' will use [base::svd()]. If "auto", the function uses "nipals"
##' if the data contain missing values and "svd" otherwise.
##'
##' @param residuals A `logical(1)`, if `TRUE`, PCA is performed on
##' the residual matrix as well.
##'
##' @param unmodelled A `logical(1)`, if `TRUE`, PCA is performed on
##' the input matrix as well.
##'
##' @param name A `character(1)` providing the name to use to retrieve
##' the model results. When retrieving a model and `name` is
##' missing, the name of the first model found in `object` is used.
##'
##' @param ... For `scpComponentAnalysis()`, further arguments passed
##' to the PCA function. For `scpComponentAggregate()`, further
##' arguments passed to [QFeatures::aggregateFeatures()].
##'
##' @export
scpComponentAnalysis <- function(object,
method = NULL,
effects = NULL,
pcaFUN = "auto",
residuals = TRUE,
unmodelled = TRUE,
name, ...) {
if (is.null(effects)) effects <- scpModelEffectNames(object, name)
if (any(mis <- !effects %in% scpModelEffectNames(object, name)))
stop("'", paste0(effects[mis], collapse = "', '"),
"' is/are not modelled effects.")
if (any(mis <- !method %in% scpModelComponentMethods))
stop("Allowed values for 'method' are '",
paste0(scpModelComponentMethods, collapse = "', '"), "'.")
pcaFUN <- .getPcaFunction(pcaFUN, object, name)
out <- List()
if (unmodelled) {
pcaUnmod <- pcaFUN(scpModelInput(object, name), ...)
out <- .addPcaToList(pcaUnmod, out, "unmodelled_")
}
if (residuals) {
pcaRes <- pcaFUN(scpModelResiduals(object, name), ...)
out <- .addPcaToList(pcaRes, out, "residuals_")
}
for (m in method) {
runMethod <- get(paste0(".run", m))
effectMats <- scpModelEffects(object, name)
for (effect in effects) {
pcaTables <- runMethod(
effectMats[[effect]],
scpModelResiduals(object, name), pcaFUN, ...
)
out <- .addPcaToList(
pcaTables, out, paste0(m, "_", effect, "_")
)
}
}
if (!length(out))
message("No components were computed. Make sure to provide ",
"'method' or set 'unmodelled = TRUE' or 'residuals = ",
"TRUE'.")
.formatPcaList(out)
}
## Internal function that returns the desired PCA algorithm function
## or that automatically selects a suitable algorithm function.
##
## @param pcaFUN A `character` indicating which function to use to
## perform PCA. "nipals" will use [nipals::nipals()] while "svd"
## will use [base::svd()]. If "auto", the function uses "nipals"
## if the data contain missing values and "svd" otherwise.
## @param object An object that inherits from the
## `SingleCellExperiment` class. It must contain an estimated
## `ScpModel` in its metadata. Ignored when pcaFun != "auto".
## @param name A `character(1)` providing the name to use to retrieve
## the model results. When retrieving a model and `name` is
## missing, the name of the first model found in `object` is used.
## Ignored when pcaFun != "auto".
##
.getPcaFunction <- function(pcaFun, object, name) {
if (pcaFun == "auto") {
pcaFun <- ifelse(anyNA(scpModelInput(object, name)), "nipals", "svd")
}
if (pcaFun == "nipals") {
.nipalsWrapper
} else if (pcaFun == "svd") {
.svdWrapper
} else {
stop("Available PCA functions are: 'nipals' or 'svd'")
}
}
## Internal function that takes PCA results provided as a list,
## converts these results into tables and appends the tables to a
## list-like object. A prefix can be provided to facilitate naming the
## newly appended elements in the returned lists.
##
## @param pcaResults A list containing PCA results. The elements
## should at least contain the following elements: "scores",
## "eigenvectors", "eigenvalues", "proportionVariance". Any
## additional element in the list is ignored.
## @param pcaList A list to which the PCA result tables should be
## appended.
## @param prefix A character(1) providing a prefix to add to the new
## element names. By default, the new elements are named
## "bySample" and "byFeatures". Providing another prefix will lead
## to naming elements `paste0(prefix, "bySample")` and
## `paste0(prefix, "byFeature")`
##
.addPcaToList <- function(pcaResults, pcaList, prefix = "") {
newElements <- .componentsToTable(pcaResults)
names(newElements) <- paste0(prefix, names(newElements))
c(pcaList, newElements)
}
## Internal function to convert a PCA result list into a DataFrameList.
## The function extracts the scores and the eigenvectors, converts
## them to a DataFrame, and adds the proportion of explained variance
## in the metadata so it can be easily retrieved for plotting. The
## two tables are returned in a List.
##
## @param x A list containing PCA results. The elements
## should at least contain the following elements: "scores",
## "eigenvectors", "eigenvalues", "proportionVariance". Any
## additional element in the list is ignored.
##
##' @importFrom S4Vectors metadata<- List
##' @importFrom methods as
.componentsToTable <- function(x) {
.checkPcaResults(x)
scores <- as(x$scores, "DataFrame")
eigenvectors <- as(x$eigenvectors, "DataFrame")
propVariance <- x$proportionVariance
names(propVariance) <- colnames(scores)
scores$cell <- rownames(scores)
eigenvectors$feature <- rownames(eigenvectors)
metad <- list(proportionVariance = propVariance)
metadata(scores) <- metadata(eigenvectors) <- metad
List(bySample = scores, byFeature = eigenvectors)
}
## Internal function that checks that PCA results comply to the
## expected data representation. We expect the PCA results to be a
## list with at least the following elements:
## 1. `scores`: a matrix with computed scores with rows corresponding
## to columns (samples) in the input data and the columns
## corresponding to the latent components.
## 2. `eigenvectors`: a matrix with eigenvectors with rows corresponding
## to rows (features) in the input data and the columns
## corresponding to the latent components.
## 3. `eigenvalues`: a vector containing the eigenvalues, that is the
## variance associated to each principal component.
## 4. `proportionVariance`: a vector containing the proportion of the
## variance explained by each component.
##
## @param x A list containing PCA results. The elements
## should at least contain the following elements: "scores",
## "eigenvectors", "eigenvalues", "proportionVariance". Any
## additional element in the list is ignored.
##
.checkPcaResults <- function(x) {
if (!is.list(x)) stop("PCA results must be a list.")
expNames <- c("scores", "eigenvectors", "eigenvalues", "proportionVariance")
if (!all(expNames %in% names(x)))
stop(
"PCA results must at least contain the following ",
"elements: ", paste(expNames, collapse = ", "), "."
)
if (is.null(names(x$eigenvalues)))
stop("Components must be named.")
if (!identical(colnames(x$scores), names(x$eigenvalues)))
stop("Components in scores do not match the eigenvalues.")
if (!identical(colnames(x$eigenvectors), names(x$eigenvalues)))
stop("Components in eigenvectors do not match the eigenvalues.")
if (!identical(names(x$proportionVariance), names(x$eigenvalues)))
stop("The proportions of variance explained do not match the eigenvalues.")
NULL
}
## Internal functions to perform APCA. APCA takes the pure effect
## matrix augmented with the residuals.
.runAPCA <- function(M, residuals, fun, ...) {
fun(M + residuals, ...)
}
## Internal functions to perform ASCA. ASCA takes the pure effect
## matrix. the 'residuals' argument is not used but included to match
## the function signature of .runAPCA() and .runASCA.E
.runASCA <- function(M, residuals, fun, ...) {
fun(M, ...)
}
## Internal functions to perform ASCA-E. ASCA-E takes the pure effect
## matrices and projects the eigenvectors on the augmented effect
## matrix. Note that scores are no longer orthogonal, hence the
## eigenvalues are no longer meaningful.
.runASCA.E <- function(M, residuals, fun, ...) {
out <- fun(M, ...)
M <- M + residuals
M[is.na(M)] <- 0
out$scores <- crossprod(M, out$eigenvectors)
out$eigenvalues <- rep(NA, length(out$eigenvalues))
out$proportionVariance <- rep(NA, length(out$proportionVariance))
names(out$eigenvalues) <- names(out$proportionVariance) <- colnames(out$scores)
out
}
## Internal wrapper around the nipals::nipals() function. The wrapper
## is necessary to format the output consistently, that is it makes
## sure the output is a list with 4 elements (detailed below) and that
## each element has expected dimension names.
##
## @param mat A data matrix where rows represent features and columns
## represent samples. `NA`s are allowed.
## @param ncomp An `integer(1)` providing the number of principal
## components to fit. Cannot exceed the smallest dimension of mat.
## @param center If `TRUE`, subtract the mean from each row of
## `mat` before PCA. Declared to overwrite default values.
## @param startcol Same argument as in `nipals::nipals()` to control
## the initial values at the first iteration for each component.
## Declared to overwrite default values. By default,
## `nipals` uses the column of mat that has maximum absolute sum,
## but we noticed that it is better to with the column that has
## the most observed values.
## @param scale if `TRUE`, divide the standard deviation from each
## row of `mat` before PCA. Declared to overwrite default values.
## @param ... Further arguments passed to `nipals::nipals`.
##
## Returns a list with 4 elements:
##
## 1. `scores`: a matrix with computed scores with rows corresponding
## to columns (samples) in the input data and the columns
## corresponding to the latent components.
## 2. `eigenvectors`: a matrix with eigenvectors with rows corresponding
## to rows (features) in the input data and the columns
## corresponding to the latent components.
## 3. `eigenvalues`: a vector containing the eigenvalues, that is the
## variance associated to each principal component.
## 4. `proportionVariance`: a vector containing the proportion of the
## variance explained by each component.
##
## There are some inconsistencies in the terminology used by NIPALS:
## - The 'scores' output is in fact the standardized scores, we scale
## those with the eigenvalues to obtain the scores.
## - The 'eig' output seems to refer to eigenvalues (ie variance
## assiociated to each components), but in fact those are the
## square root of the sums of squares.
## - The 'loadings' output does not contain the PC loadings but the
## unscaled eigenvectors. loadings = eigenvectors * sqrt(eigenvalue)
##
## This is inspired from
## https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
##
##' @importFrom nipals nipals
.nipalsWrapper <- function(mat, ncomp = 2L, center = TRUE,
startcol = function(x) sum(!is.na(x)),
scale = FALSE, ...) {
if (ncomp > min(nrow(mat), ncol(mat)))
stop("'ncomp' cannot exceeded number of features or samples, ",
"whichever is smallest.")
mat <- mat[rowSums(!is.na(mat)) > 0, ]
res <- nipals(t(mat), ncomp = ncomp, center = center,
startcol = startcol,
scale = scale, ...)
scores <- res$scores %*% diag(res$eig)
colnames(scores) <- paste0("PC", 1:ncomp)
names(res$R2) <- paste0("PC", 1:ncomp)
names(res$eig) <- paste0("PC", 1:ncomp)
list(
scores = scores,
eigenvectors = res$loadings,
eigenvalues = res$eig^2 / (nrow(mat) - 1),
proportionVariance = res$R2
)
}
## Internal wrapper around base::svd(). Same as .nipalsWrapper().
##
## @param mat A data matrix where rows represent features and columns
## represent samples. `NA`s are allowed.
## @param ncomp An `integer(1)` providing the number of principal
## components to fit. Cannot exceed the smallest dimension of mat.
## @param center If `TRUE`, subtract the mean from each row of
## `mat` before PCA. Declared to overwrite default values.
## @param scale if `TRUE`, divide the standard deviation from each
## row of `mat` before PCA. Declared to overwrite default values.
## @param ... Further arguments passed to `base::svd()`.
##
## Note: eigenvalues = singular values^2 / (n - 1)
##
## Useful resource:
## https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
##
.svdWrapper <- function(mat, ncomp = 2, center = TRUE,
scale = FALSE, ...) {
if (any(is.na(mat)))
stop("svd cannot deal with missing values. Use 'algorithm = ",
"\"nipals\"' instead.")
if (ncomp > min(nrow(mat), ncol(mat)))
stop("'ncomp' cannot exceeded number of features or samples, ",
"whichever is smallest.")
res <- svd(scale(t(mat), center = center, scale = scale))
scores <- (res$u %*% diag(res$d))[, 1:ncomp]
eigenvectors <- res$v[, 1:ncomp]
dimnames(scores) <- list(colnames(mat), paste0("PC", 1:ncomp))
dimnames(eigenvectors) <- list(rownames(mat), paste0("PC", 1:ncomp))
names(res$d) <- paste0("PC", 1:ncomp)
eigenvalues <- res$d^2 / (nrow(mat) - 1)
proportionVariance <- eigenvalues / sum(eigenvalues)
list(
scores = scores,
eigenvectors = eigenvectors,
eigenvalues = eigenvalues[1:ncomp],
proportionVariance = proportionVariance[1:ncomp]
)
}
## Internal function that take a named list and formats it in a nested
## list. The element names must end either with "_bySample" or
## "_byFeature". The functions then nests the list so that the output
## list contains 2 elements named "bySample" and "byFeatures", each
## containing the corresponding elements from the input list.
##
## @param pcaList A List
##
.formatPcaList <- function(pcaList) {
if (!length(pcaList)) return(List())
if (any(!grepl("_bySample$|_byFeature$", names(pcaList))))
stop("Unexpected names in PCA result list.")
bySample <- pcaList[grepl("bySample", names(pcaList))]
names(bySample) <- sub("_bySample$", "", names(bySample))
byFeature <- pcaList[grepl("byFeature", names(pcaList))]
names(byFeature) <- sub("_byFeature$", "", names(byFeature))
List(bySample = bySample, byFeature = byFeature)
}
## ---- Aggregation functions ----
##' @name ScpModel-ComponentAnalysis
##'
##' @param componentList A list of components analysis results. This
##' is typically the `bySample` or `byFeature` element of the list
##' returned by `scpComponentAnalysis()`.
##'
##' @param fcol A `character(1)` providing the name of the column
##' to use to group features.
##'
##' @param fun A `function` that summarizes the values for each group.
##' See [QFeatures::aggregateFeatures()] for a list of available
##' functions.
##'
##' @importFrom matrixStats colMedians
##' @importFrom QFeatures readSummarizedExperiment aggregateFeatures
##' @importFrom SummarizedExperiment assays rowData
##' @importFrom S4Vectors metadata
##'
##' @export
scpComponentAggregate <- function(componentList, fcol,
fun = colMedians, ...) {
if (any(mis <- sapply(componentList, function(x) !fcol %in% colnames(x))))
stop("'", fcol, "' not found in list element(s): ",
paste0(names(componentList)[mis], collapse = ", "), ".")
message("Components may no longer be orthogonal after aggregation.")
endoapply(componentList, function(x) {
md <- metadata(x)
out <- readSummarizedExperiment(
data.frame(x), grepl("^PC\\d*$", colnames(x))
)
out <- suppressMessages(aggregateFeatures(out, fcol, fun, ...))
out <- DataFrame(assays(out)[[1]], rowData(out))
metadata(out) <- md
out
})
}
## ---- Plotting functions ----
##' @name ScpModel-ComponentAnalysis
##'
##' @param componentList A list of components analysis results. This
##' is typically the `bySample` or `byFeature` element of the list
##' returned by `scpComponentAnalysis()`.
##'
##' @param comp An `integer(2)` pointing to which components to fit.
##' The values of `comp` are not allowed to exceed the number of
##' computed components in `componentList`.
##'
##' @param pointParams A `list` where each element is an argument that
##' is provided to [ggplot2::geom_point()]. This is useful to
##' change point size, transparency, or assign colour based on an
##' annotation (see [ggplot2::aes()]).
##'
##' @param maxLevels An `integer(1)` indicating how many colour levels
##' should be shown on the legend when colours are derived from a
##' discrete factor. If `maxLevels = NULL`, all levels are shown.
##' This parameters is useful to colour points based on a factor
##' with many levels that would otherwise overcrowd the legend.
##'
##' @export
scpComponentPlot <- function(componentList,
comp = 1:2,
pointParams = list(),
maxLevels = NULL) {
pl <- lapply(names(componentList), function(i) {
propVar <- metadata(componentList[[i]])$proportionVariance
if (any(comp > length(propVar))) stop("'comp' is out of bounds.")
.plotPCA(componentList[[i]], comp, propVar, pointParams) +
ggtitle(sub("_", " on ", i))
})
names(pl) <- names(componentList)
if (!is.null(maxLevels)) {
pl <- lapply(pl, .trimPlotLevels, maxLevels)
}
pl
}
## Internal function that generates a ggplot from PCA results as
## generated by scpComponentAnalysis()
##
## @param pcs A DataFrame table as generated by scpComponentAnalysis()
## @param comp A numeric(2) indicating which 2 components to show, it
## cannot exceed the number of available components
## @param proportionVariance A numeric(2) providing the associated
## proportion of explained by each of the 2 components. NA allowed.
## Used for generating PC axis labels.
## @pointParams A list of geom_point() parameters.
##
.plotPCA <- function(pcs, comp, proportionVariance, pointParams) {
stopifnot(length(comp) == 2)
axislabels <- .pcaAxisLabels(proportionVariance, comp)
ggplot(data.frame(pcs)) +
aes(x = .data[[paste0("PC", comp[[1]])]],
y = .data[[paste0("PC", comp[[2]])]]) +
do.call(geom_point, pointParams) +
labs(x = axislabels$x, y = axislabels$y) +
theme_minimal()
}
## Internal function that generates the PCA axis labels, namely it formats
## the percentage of variance explained.
##
## @param proportionVariance A numeric(2) providing the associated
## proportion of explained by each of the 2 components. NA allowed.
## @param comp A numeric(2) indicating which 2 components to show, it
## cannot exceed the number of available components
#
.pcaAxisLabels <- function(proportionVariance, comp) {
xlabel <- paste0("PC", comp[[1]])
ylabel <- paste0("PC", comp[[2]])
if (all(!is.na(proportionVariance))) {
percentVar <- round(proportionVariance[comp] * 100, 1)
xlabel <- paste0(xlabel, " (", percentVar[[1]],"%)")
ylabel <- paste0(ylabel, " (", percentVar[[2]],"%)")
}
list(x = xlabel, y = ylabel)
}
## Internal function that reduces the number of colours shown on a
## ggplot legend when the colour is determined by a categorical
## variable. The colour variable is automatically retrieve from the
## ggplot object. The function only affects the legend.
##
## @param pl A ggplot object from which to trim the colour levels.
## @param maxLevels A numeric(1) indicating up to how many levels are
## allowed on the colour legend.
.trimPlotLevels <- function(pl, maxLevels) {
what <- pl$labels$colour
if (is.null(what)) return(pl)
col <- pl$data[[what]]
if ((is.factor(col) | is.character(col)) &
length(unique(col)) > maxLevels) {
levels <- unique(col)
sel <- round(seq(1, length(levels), length.out = maxLevels))
breaks <- as.character(levels[sel])
pl <- pl + scale_color_discrete(breaks = breaks)
}
pl
}
##' @name ScpModel-ComponentAnalysis
##'
##' @param scoreList A list of components analysis results. This
##' is typically the `bySample` element in the list returned by
##' `scpComponentAnalysis()`.
##'
##' @param eigenvectorList A list of components analysis results. This
##' is typically the `byFeature` element in the list returned by
##' `scpComponentAnalysis()`.
##'
##' @param arrowParams A `list` where each element is an argument that
##' is provided to [ggplot2::geom_segment()]. This is useful to
##' change arrow head style, line width, transparency, or assign
##' colour based on an annotation (see [ggplot2::aes()]). Note
##' that changing the 'x', 'y', 'xend', and 'yend' aesthetics is
##' not allowed.
##'
##' @param labelParams A `list` where each element is an argument that
##' is provided to [ggrepel::geom_label_repel()]. This is useful
##' to change label size, transparency, or assign
##' colour based on an annotation (see [ggplot2::aes()]). Note
##' that changing the 'x', 'y', 'xend', and 'yend' aesthetics is
##' not allowed.
##'
##' @param textBy A `character(1)` indicating the name of the column
##' to use to label arrow heads.
##'
##' @param top An `integer(1)` indicating how many arrows should be
##' drawn. The arrows are sorted based on their size as determined
##' by the euclidean distance in the principal component space.
##'
##' @export
scpComponentBiplot <- function(scoreList,
eigenvectorList,
comp = 1:2,
pointParams = list(),
arrowParams = list(
arrow = arrow(length = unit(0.2, "cm"))
),
labelParams = list(
size = 2,
max.overlaps = 10
),
textBy = "feature",
top = 10,
maxLevels = NULL) {
pcn <- paste0("PC", comp)
scoreList <- .scaleComponentsToUnity(scoreList, pcn)
eigenvectorList <- .scaleComponentsToUnity(eigenvectorList, pcn)
pl <- scpComponentPlot(
scoreList, comp = comp, pointParams = pointParams,
maxLevels = maxLevels
)
for (i in names(pl)) {
pl[[i]] <- .addEigenArrows(
pl[[i]], eigenvectorList[[i]], comp, textBy, top,
arrowParams, labelParams
)
}
pl
}
## Internal function that makes sure that the principal components for features
## and samples are on the same scale. This is performed by scaling the
## data so that the largest observation has length one.
##
## @param compList A List(), typically the bySample or byFeature table
## list generated by scpComponentAnalysis()
## @param compNames A character() indicating the column names of the
## principal components to scale.
##
##' @importFrom S4Vectors endoapply
.scaleComponentsToUnity <- function(compList, compNames) {
endoapply(compList, function(components) {
x <- as.matrix(components[, compNames])
maxLength <- max(sqrt(rowSums(x^2)))
scales <- 1 / rep(maxLength, ncol(x))
components[, compNames] <- x %*% diag(scales)
components
})
}
## Internal function that takes a score plot and adds eigenvector information as
## labelled arrows.
##
## @param pl A ggplot object to add eigenvectors as arrows
## @param eigenvectors A DataFrame table of eigenvector data. This is typically
## one of the `byFeature` tables in the list returned by
## `scpComponentAnalysis()`.
## @param comp A numeric(2) indicating which 2 components to show, it
## cannot exceed the number of available components
## @param textBy A `character(1)` indicating the name of the column
## to use to label arrow heads.
## @param top An `integer(1)` indicating how many arrows should be
## drawn. The arrows are sorted based on their size as determined
## by the euclidean distance in the principal component space.
## @param arrowParams A `list` where each element is an argument that
## is provided to [ggplot2::geom_segment()]. This is useful to
## change arrow head style, line width, transparency, or assign
## colour based on an annotation (see [ggplot2::aes()]).
## @param labelParams A `list` where each element is an argument that
## is provided to [ggrepel::geom_label_repel()]. This is useful
## to change label size, transparency, or assign
## colour based on an annotation (see [ggplot2::aes()]).
##
##' @importFrom ggrepel geom_label_repel
.addEigenArrows <- function(pl, eigenvectors, comp, textBy, top,
arrowParams, labelParams) {
evData <- .formatEigenVectors(
eigenvectors, comp, textBy, top
)
if (!nrow(evData)) return(pl)
arrowParams <- .formatParamsMapping(arrowParams, comp, isArrow = TRUE)
labelParams <- .formatParamsMapping(labelParams, comp, isArrow = FALSE)
labelParams$data <- arrowParams$data <- evData
pl + ylim(-1, 1) + xlim(-1, 1) +
do.call(geom_segment, arrowParams) +
do.call(geom_label_repel, labelParams)
}
## Internal function that takes the eigenvector data, sorts the rows by the
## arrow length and takes the top rows
.formatEigenVectors <- function(eigenvectors, comp, textBy, top) {
if (!textBy %in% colnames(eigenvectors))
stop("'", textBy, "' not found in component tables. Use ",
"'scpAnnotateResults()' to add custom annotations.")
if (top == 0) return(data.frame(eigenvectors[0, , drop = FALSE]))
if (top >= nrow(eigenvectors))
return(data.frame(eigenvectors, label = eigenvectors[[textBy]]))
eigenvectors$label <- eigenvectors[[textBy]]
x <- as.matrix(eigenvectors[, paste0("PC", comp)])
vectorLength <- sqrt(rowSums(x^2))
sel <- order(vectorLength, decreasing = TRUE)[1:top]
data.frame(eigenvectors[sel, , drop = FALSE])
}
## Internal function that combines mapping information (= aes())
## needed for plotting eigenvector arrows and labels with the
## information provided by the user (eg color, size, linewidth, ...).
## If the list is not named or if some elements do not have a name,
## the function assumes that the mapping is the first unnnamed
## element.
##
## The function return a list where the mapping parameters are formatted
## so to contained the imposed 'x', 'y', 'xend', and 'yend' aesthetics.
## These aesthetics cannot be used provided.
##
## @param aesParams A list of ggplot aesthetic parameters.
## @param comp A numeric(2) indicating which 2 components to show, it
## cannot exceed the number of available components. It is used to
## determine the 'xend' and 'yend' aesthetics.
## @param isArrow A logical(1) indicating wheter aesParams are related
## to arrow parameters (TRUE) or labelling parameters (FALSE).
##
.formatParamsMapping <- function(aesParams, comp, isArrow) {
if (!inherits(aesParams, "list"))
stop("'labelParams' and 'arrowParams' must be a list.")
if (is.null(names(aesParams)))
names(aesParams) <- rep("", length(aesParams))
names(aesParams)[which(names(aesParams) == "")[1]] <- "mapping"
notAllowed <- c("x", "xend", "y", "yend")
if (any(sel <- notAllowed %in% names(aesParams) |
notAllowed %in% names(aesParams$mapping)))
stop("'", paste(notAllowed[sel], collapse = "', '"), "' cannot be ",
"user-provided.")
posX <- quo(.data[[paste0("PC", comp[[1]])]])
posY <- quo(.data[[paste0("PC", comp[[2]])]])
if (isArrow) {
imposedMapping <- list(x = 0, y = 0, xend = posX, yend = posY)
} else {
imposedMapping <- list(x = posX, y = posY, label = quo(.data$label))
}
aesParams$mapping <- do.call(
aes, c(imposedMapping, aesParams$mapping)
)
aesParams
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.