knitr::opts_chunk$set( collapse = TRUE, eval = TRUE, comment = "#>" )
library(dreval)
The primary aim of the dreval
package is to evaluate and compare one or more
reduced dimension representations, in terms of how well they retain the
structure from a reference representation. Here, we illustrate its application
to a single-cell RNA-seq data set of peripheral blood mononuclear cells (PBMCs),
representing a subset of the pbmc3k
data set from the TENxPBMCData
package.
The set of genes has been subset to only highly variable ones, and several
dimension reduction methods (PCA, tSNE and UMAP) have been applied.
data(pbmc3ksub)
pbmc3ksub
The dreval()
function is the main function in the package, and calculates
several metrics comparing each of the reduced dimension representations in the
SingleCellExperiment
object to the designed reference representation. Here, we
use the underlying normalized and log-transformed data contained in the
logcounts
assay as the reference representation. For a list of the calculated
metrics, see the help page for dreval()
.
dre <- dreval( sce = pbmc3ksub, refType = "assay", refAssay = "logcounts", dimReds = NULL, nSamples = 500, kTM = c(10, 75), verbose = FALSE, distNorm = "l2" )
The output of dreval()
is a list with two elements. The plots
entry contains
several diagnostic plot objects, while the scores
entry contains the
calculated scores for each of the evaluated reduced dimension representations.
These can be summarized visually with the plotRankSummary()
function.
dre$scores suppressPackageStartupMessages({ library(ggplot2) }) cowplot::plot_grid(plotlist = lapply(dre$plots$disthex, function(w) w + theme(axis.title = element_text(size = 7))), ncol = 2)
plotRankSummary(dre$scores) + theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
In the example above, we used the original feature space (the logcounts
assay)
as the reference space, to which all reduced dimension representations were
compared. It is also possible to use one of the reduced dimension
representations as the reference space, as we will illustrate here.
dre <- dreval( sce = pbmc3ksub, refType = "dimred", refDimRed = "PCA", dimReds = NULL, nSamples = 500, kTM = c(10, 75), verbose = FALSE, distNorm = "l2" )
dre$scores cowplot::plot_grid(plotlist = lapply(dre$plots$disthex, function(w) w + theme(axis.title = element_text(size = 7))), ncol = 2)
plotRankSummary(dre$scores) + theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
dreval
relies heavily on the calculation of pairwise distances among samples,
and is consequently computationally demanding for large data sets. The
dreval()
function allows for subsampling a smaller number of samples, as well
as selecting a subset of the variables, to speed up calculations and reduce the
memory footprint. Here we illustrate that for the example data set, the
calculated metrics are stable under subsampling of cells.
suppressPackageStartupMessages({ library(dplyr) library(tidyr) library(ggplot2) }) set.seed(1) nS <- c(100, 400, 1000, 1500, 2700) v <- lapply(nS, function(n) { if (n == 2700) nR <- 1 else nR <- 2 lapply(seq_len(nR), function(i) { message(n, " samples, replicate ", i) dreval(sce = pbmc3ksub, refAssay = "logcounts", dimReds = NULL, nSamples = n, kTM = 50, verbose = FALSE)$scores %>% dplyr::mutate(nSamples = n, replicate = i) }) }) vv <- do.call(dplyr::bind_rows, v) vv <- vv %>% tidyr::gather( key = "measure", value = "value", -Method, -dimensionality, -nSamples, -replicate ) suppressWarnings( ggplot(vv, aes(x = nSamples, y = value, color = Method)) + geom_point(size = 3) + facet_wrap(~ measure, scales = "free_y") + geom_smooth(se = FALSE) + theme_bw() )
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.