knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(
  rmarkdown.html_vignette.check_title = FALSE
)
library(tidytof)
library(dplyr)
library(ggplot2)
A useful tool for visualizing the phenotypic relationships between single cells and clusters of cells is dimensionality reduction, a form of unsupervised machine learning used to represent high-dimensional datasets in a smaller number of dimensions. {tidytof} includes several dimensionality reduction algorithms commonly used by biologists: principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). To apply these to a dataset, use tof_reduce_dimensions().
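To make the idea of "fewer dimensions" concrete, here is a minimal base-R sketch that does not use {tidytof} at all. It runs stats::prcomp() on simulated data; the matrix toy_cells, its size, and the marker names are made up for illustration and are not part of any {tidytof} dataset.

```r
# Simulated data: 100 "cells" measured on 10 "markers" (all values are made up)
set.seed(2023)
toy_cells <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
colnames(toy_cells) <- paste0("marker_", 1:10)

# PCA with base R; each marker is centered and scaled before rotation
toy_pca <- prcomp(toy_cells, center = TRUE, scale. = TRUE)

# Keep the first two principal components as a 2-dimensional representation
toy_embedding <- toy_pca$x[, 1:2]

dim(toy_cells)      # dimensions before reduction: 100 rows, 10 columns
dim(toy_embedding)  # dimensions after reduction: 100 rows, 2 columns
head(toy_embedding)
```

Each cell keeps its row, but it is now described by two coordinates instead of ten measurements, which is what makes a scatterplot of the cells possible.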
Dimensionality reduction with tof_reduce_dimensions()
Here is an example call to tof_reduce_dimensions() in which we use tSNE to visualize data in {tidytof}'s built-in phenograph_data dataset.
data(phenograph_data)

# perform the dimensionality reduction
phenograph_tsne <-
  phenograph_data |>
  tof_preprocess() |>
  tof_reduce_dimensions(method = "tsne")

# select only the tsne embedding columns
phenograph_tsne |>
  select(contains("tsne")) |>
  head()
By default, tof_reduce_dimensions() adds the reduced-dimension feature embeddings to the input tof_tbl and returns the augmented tof_tbl (that is, a tof_tbl with new columns for each embedding dimension) as its result. To return only the feature embeddings themselves, set augment to FALSE (as in tof_cluster()).
phenograph_data |>
  tof_preprocess() |>
  tof_reduce_dimensions(method = "tsne", augment = FALSE)
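The difference between an augmented result and an embeddings-only result can be illustrated on toy data. This is a sketch under stated assumptions, not {tidytof}'s implementation: the table toy_cells is simulated, the embedding comes from stats::prcomp(), and the column names .pc1 and .pc2 are chosen here for illustration only.

```r
library(dplyr)

# Simulated input table: 5 "cells" with 3 made-up marker columns
set.seed(1)
toy_cells <- tibble(
  cd45 = rnorm(5),
  cd11b = rnorm(5),
  cd34 = rnorm(5)
)

# Compute a 2-dimensional embedding with base-R PCA
toy_pca <- prcomp(toy_cells, center = TRUE, scale. = TRUE)

# Embeddings only (conceptually similar to augment = FALSE);
# the names .pc1 and .pc2 are illustrative assumptions
toy_embedding <-
  as_tibble(toy_pca$x[, 1:2]) |>
  rename(.pc1 = PC1, .pc2 = PC2)

# Input columns plus embedding columns (conceptually similar to the default)
toy_augmented <- bind_cols(toy_cells, toy_embedding)

colnames(toy_embedding)
colnames(toy_augmented)
```

The augmented form is convenient when later steps need both the marker measurements and the embedding, for example to color a plot by marker expression.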
Changing the method argument results in different low-dimensional embeddings:
phenograph_data |>
  tof_reduce_dimensions(method = "umap", augment = FALSE)

phenograph_data |>
  tof_reduce_dimensions(method = "pca", augment = FALSE)
Method specifications for tof_reduce_*() functions
tof_reduce_dimensions() provides a high-level API for three lower-level functions: tof_reduce_pca(), tof_reduce_umap(), and tof_reduce_tsne(). The help file for each function describes the algorithm-specific arguments of the corresponding dimensionality reduction approach. For example, tof_reduce_pca() takes the num_comp argument to determine how many principal components should be returned:
# 2 principal components
phenograph_data |>
  tof_reduce_pca(num_comp = 2)

# 3 principal components
phenograph_data |>
  tof_reduce_pca(num_comp = 3)
See ?tof_reduce_pca, ?tof_reduce_umap, and ?tof_reduce_tsne for additional details.
Visualizing embeddings with tof_plot_cells_embedding()
Regardless of the method used, reduced-dimension feature embeddings can be visualized using {ggplot2} (or any graphics package). {tidytof} also provides some helper functions for easily generating dimensionality reduction plots from a tof_tbl or tibble with columns representing embedding dimensions:
# plot the tsne embeddings using color to distinguish between clusters
phenograph_tsne |>
  tof_plot_cells_embedding(
    embedding_cols = contains(".tsne"),
    color_col = phenograph_cluster
  )

# plot the tsne embeddings using color to represent CD11b expression
phenograph_tsne |>
  tof_plot_cells_embedding(
    embedding_cols = contains(".tsne"),
    color_col = cd11b
  ) +
  ggplot2::scale_fill_viridis_c()
Such visualizations can be helpful in qualitatively describing the phenotypic differences between the clusters in a dataset. For instance, in the plots above, we can see that one of the clusters has high CD11b expression, whereas the others have lower CD11b expression.
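Because the embeddings are ordinary columns, they can also be plotted directly with {ggplot2}. The sketch below uses a simulated data frame rather than real {tidytof} output; the column names .tsne1, .tsne2, and cluster, as well as the two simulated groups, are assumptions made for illustration.

```r
library(ggplot2)

# Simulated 2-D embedding with two made-up groups of 50 "cells" each
set.seed(42)
toy_embedding <- data.frame(
  .tsne1 = c(rnorm(50, mean = -2), rnorm(50, mean = 2)),
  .tsne2 = c(rnorm(50, mean = -2), rnorm(50, mean = 2)),
  cluster = rep(c("cluster1", "cluster2"), each = 50)
)

# Scatterplot of the two embedding dimensions, colored by cluster label
embedding_plot <-
  ggplot(toy_embedding, aes(x = .tsne1, y = .tsne2, color = cluster)) +
  geom_point(alpha = 0.7) +
  theme_bw() +
  labs(x = "tSNE 1", y = "tSNE 2", color = "Cluster")

embedding_plot
```

Building the plot by hand like this gives full control over geoms, scales, and themes, at the cost of a few more lines than the helper function.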
sessionInfo()