knitr::opts_chunk$set(
  error = FALSE,
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)
library("BiocStyle")
snifter provides an R wrapper for the openTSNE implementation of fast interpolated t-SNE (FI-tSNE). It is based on basilisk and reticulate.
This vignette gives a brief overview of typical use on single-cell RNA-seq (scRNA-seq) data. It is not a comprehensive guide to the options available in the package, so we strongly recommend reviewing the documentation in snifter and the openTSNE documentation to gain a full understanding of them.
We illustrate snifter using data from scRNAseq together with single-cell utility functions provided by scuttle, scater and scran. First, we load these libraries and set a random seed so that the analysis is reproducible (note: it is good practice to check that a t-SNE embedding is robust by running the algorithm multiple times).
library("snifter")
library("scRNAseq")
library("scran")
library("scuttle")
library("scater")
library("ggplot2")
theme_set(theme_bw())
set.seed(42)
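One way to act on the robustness note is to fit t-SNE more than once with different `random_state` values and check how well local neighbourhoods agree between runs. The sketch below is a minimal base-R illustration, assuming that the proportion of shared k-nearest neighbours is a reasonable agreement measure; `knn_overlap` is a hypothetical helper, not part of snifter. It computes all pairwise distances, so its memory use grows quadratically with the number of cells.

```r
# Hypothetical helper (not part of snifter): average fraction of each point's
# k nearest neighbours that are shared between two embeddings of the same cells.
knn_overlap <- function(emb1, emb2, k = 10) {
  stopifnot(nrow(emb1) == nrow(emb2), k < nrow(emb1))
  # List of neighbour indices per row, excluding the row itself
  knn_list <- function(emb) {
    d <- as.matrix(dist(emb))
    diag(d) <- Inf
    lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  }
  nn1 <- knn_list(emb1)
  nn2 <- knn_list(emb2)
  # Per-cell overlap in [0, 1], averaged over all cells
  mean(mapply(function(a, b) length(intersect(a, b)) / k, nn1, nn2))
}

# Toy check: rescaling an embedding does not change its neighbourhoods
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
knn_overlap(x, x * 3)

# With the objects in this vignette, one might compare two runs, e.g.:
# fit_a <- fitsne(reducedDim(data, "PCA"), random_state = 1L)
# fit_b <- fitsne(reducedDim(data, "PCA"), random_state = 2L)
# knn_overlap(fit_a, fit_b, k = 15)
```

Values close to 1 indicate that the two runs preserve largely the same local structure. Note that global arrangement (for example, where clusters sit relative to one another) is not captured by this measure.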
Before running t-SNE, we load the data generated by Zeisel et al. from scRNAseq. We remove genes detected (non-zero count) in 5% of cells or fewer, estimate normalisation factors using scran, compute log-normalised counts with scuttle and compute the first 20 principal components with scater. We will use these principal components to generate the t-SNE embedding later.
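data <- ZeiselBrainData()
## Keep genes with a non-zero count in more than 5% of cells
data <- data[rowMeans(counts(data) != 0) > 0.05, ]
## Pooling-based size factors (scran), computed within quick clusters
data <- computeSumFactors(data, cluster = quickCluster(data))
## Log-normalised expression values (scuttle)
data <- logNormCounts(data)
## First 20 principal components (scater)
data <- runPCA(data, ncomponents = 20)
## Convert this to a factor to use as colouring variable later
data$level1class <- factor(data$level1class)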
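To make the filtering and normalisation steps concrete, the base-R sketch below applies the same 5% detection filter to a small made-up count matrix and then log-normalises it. It is an approximation, not what scran and scuttle do internally: as a simplifying assumption it uses library-size factors instead of scran's pooling-based (deconvolution) factors, while mimicking the logNormCounts defaults of centring size factors to a mean of 1 and computing log2(count / size factor + 1).

```r
# Made-up counts: 4 genes (rows) x 5 cells (columns)
toy_counts <- matrix(
  c(0, 0, 0, 0, 1,
    5, 3, 0, 8, 2,
    0, 0, 0, 0, 0,
    2, 4, 6, 1, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Same rule as above: keep genes with a non-zero count in > 5% of cells
kept <- toy_counts[rowMeans(toy_counts != 0) > 0.05, , drop = FALSE]

# Simplified stand-in for computeSumFactors(): library-size factors,
# centred so that their mean is 1
lib_size <- colSums(kept)
size_factors <- lib_size / mean(lib_size)

# Log-normalised values: log2(count / size factor + 1), per cell (column)
log_counts <- log2(sweep(kept, 2, size_factors, "/") + 1)
round(log_counts, 2)
```

Here `gene3` is dropped because it is never detected, while `gene1` survives because it is detected in one of five cells (20%).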
The main functionality of the package lies in the fitsne function, which returns a matrix of t-SNE coordinates. Here, we pass in the 20 principal components computed from the log-normalised counts, and colour points by the discrete cell types identified by the authors.
## Use the PCA coordinates stored by runPCA()
mat <- reducedDim(data, "PCA")
fit <- fitsne(mat, random_state = 42L)
ggplot() +
  aes(fit[, 1], fit[, 2], colour = data$level1class) +
  geom_point(pch = 19) +
  scale_colour_discrete(name = "Cell type") +
  labs(x = "t-SNE 1", y = "t-SNE 2")
The openTSNE package, and by extension snifter, also allows new data to be embedded into an existing t-SNE embedding. Here, we split the data into "training" and "test" sets, generate a t-SNE embedding from the training data, and then project the test data into this embedding.
## Randomly assign half of the cells to a held-out test set
test_ind <- sample(nrow(mat), floor(nrow(mat) / 2))
train_ind <- setdiff(seq_len(nrow(mat)), test_ind)
train_mat <- mat[train_ind, ]
test_mat <- mat[test_ind, ]
train_label <- data$level1class[train_ind]
test_label <- data$level1class[test_ind]
## Fit the reference embedding on the training cells only
embedding <- fitsne(train_mat, random_state = 42L)
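The random split above does not guarantee that rare cell types appear in the training set, and a test cell whose type is absent from the training data is likely to be projected among other cell types. A stratified split is one possible alternative. The sketch below is a hypothetical base-R helper (`stratified_split` is not part of snifter) that sends roughly half of each cell type to the test set while keeping at least one cell of every type in training.

```r
# Hypothetical helper (not part of snifter): stratified train/test split.
# Within each label group, floor(prop * group size) cells go to the test set,
# so with prop < 1 every non-empty group keeps at least one training cell.
stratified_split <- function(labels, prop = 0.5) {
  stopifnot(prop >= 0, prop < 1)
  idx_by_group <- split(seq_along(labels), labels)
  test_ind <- unlist(lapply(idx_by_group, function(idx) {
    # Index into idx via sample.int to avoid sample()'s behaviour on length-1 input
    idx[sample.int(length(idx), floor(length(idx) * prop))]
  }), use.names = FALSE)
  test_ind <- sort(test_ind)
  list(train = setdiff(seq_along(labels), test_ind), test = test_ind)
}

# Toy usage: three groups of very different sizes
set.seed(42)
labels <- factor(rep(c("a", "b", "c"), times = c(10, 4, 1)))
s <- stratified_split(labels)
table(labels[s$train])

# In this vignette, one could instead write, e.g.:
# s <- stratified_split(data$level1class)
# train_ind <- s$train
# test_ind <- s$test
```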
Once we have generated the embedding, we can project the unseen test data into it with the project function, which takes the embedding together with the new (test) and old (training) data.
new_coords <- project(embedding, new = test_mat, old = train_mat)
ggplot() +
  geom_point(
    aes(embedding[, 1], embedding[, 2], colour = train_label, shape = "Train")
  ) +
  geom_point(
    aes(new_coords[, 1], new_coords[, 2], colour = test_label, shape = "Test")
  ) +
  scale_colour_discrete(name = "Cell type") +
  scale_shape_discrete(name = NULL) +
  labs(x = "t-SNE 1", y = "t-SNE 2")
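As far as we understand openTSNE's approach, projecting new data happens in two stages. First, each new point is placed using the embedding positions of its nearest neighbours among the reference (old) points. Second, its position is refined by optimisation while the reference embedding stays fixed. The base-R sketch below illustrates only the first stage, assuming a median-of-neighbours placement with neighbours found in PCA space. `median_knn_project` is hypothetical and is not the code that snifter's project() runs; the default k = 25 is an illustrative assumption.

```r
# Hypothetical sketch (not snifter's or openTSNE's code): initial placement of
# new points at the coordinate-wise median of the embedding positions of their
# k nearest reference points, with neighbours found in the input space.
median_knn_project <- function(embedding, new, old, k = 25) {
  stopifnot(nrow(embedding) == nrow(old), ncol(new) == ncol(old))
  k <- min(k, nrow(old))
  placed <- apply(new, 1, function(x) {
    # Squared Euclidean distances from this new point to every reference point
    d2 <- colSums((t(old) - x)^2)
    nn <- order(d2)[seq_len(k)]
    apply(embedding[nn, , drop = FALSE], 2, median)
  })
  # apply() stacks one point per column; reshape to one row per new point
  matrix(placed, ncol = ncol(embedding), byrow = TRUE)
}

# Toy usage
set.seed(1)
old <- matrix(rnorm(100), ncol = 5)  # 20 reference points in 5 dimensions
emb <- old[, 1:2]                    # stand-in for a 2-D reference embedding
new <- old[1:3, ] + 1e-6             # new points almost on top of references 1-3
median_knn_project(emb, new, old, k = 1)

# With the objects in this vignette, e.g.:
# init <- median_knn_project(embedding, test_mat, train_mat)
```

With `k = 1`, each new point simply inherits the position of its nearest reference point. Larger `k` gives a more stable placement that the subsequent optimisation in openTSNE would then adjust.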
sessionInfo()