knitr::opts_chunk$set(
    error = FALSE,
    warning = FALSE,
    message = FALSE,
    collapse = TRUE,
    comment = "#>"
)
library("BiocStyle")
snifter provides an R wrapper for the openTSNE
implementation of fast interpolated t-SNE (FI-tSNE).
It is based on `r Biocpkg("basilisk")` and `r CRANpkg("reticulate")`.
This vignette aims to provide a brief overview of typical use
when applied to scRNAseq data, but it does not provide a comprehensive
guide to the available options in the package.
We strongly recommend reading the snifter documentation and the openTSNE documentation to gain a full understanding of the available options.
We will illustrate the use of snifter with simulated toy data. First, we load the required libraries and set a random seed so that the simulated data are reproducible. (Note: it is good practice to check that a t-SNE embedding is robust by running the algorithm several times.)
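library("snifter")
library("ggplot2")
theme_set(theme_bw())

set.seed(42)
n_obs <- 500
n_feats <- 200
# One vector of per-feature means for each simulated cluster
means_1 <- rnorm(n_feats)
means_2 <- rnorm(n_feats)
# replicate() returns features in rows and observations in columns
counts_a <- replicate(n_obs, rnorm(n_feats, means_1))
counts_b <- replicate(n_obs, rnorm(n_feats, means_2))
# Transpose so that observations are in rows and features in columns
counts <- t(cbind(counts_a, counts_b))
label <- rep(c("A", "B"), each = n_obs)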
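The transpose in the simulation code matters because replicate() returns one column per replicate. The sketch below uses deliberately tiny dimensions (the names n_feats_small and n_obs_small are hypothetical and not part of the vignette) to show the orientation before and after t():

```r
set.seed(1)
# Hypothetical toy sizes, chosen only to keep the result easy to inspect
n_feats_small <- 5
n_obs_small <- 3
small <- replicate(n_obs_small, rnorm(n_feats_small))
dim(small)     # features in rows, observations in columns
dim(t(small))  # observations in rows, features in columns
```

Applied to the simulated data above, the same steps give a matrix with 1000 rows (observations) and 200 columns (features).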
The main functionality of the package lies in the fitsne
function. This function returns a matrix of t-SNE co-ordinates. In this case,
we pass in the simulated matrix directly, with observations in rows and
features in columns. We colour points by their simulated
cluster label.
fit <- fitsne(counts, random_state = 42L)
ggplot() +
    aes(fit[, 1], fit[, 2], colour = label) +
    geom_point(pch = 19) +
    scale_colour_discrete(name = "Cluster") +
    labs(x = "t-SNE 1", y = "t-SNE 2")
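As noted earlier, it is good practice to check that an embedding is robust across runs. One possible way to do this, sketched here under stated assumptions, is to summarise each embedding with a single number and compare it across seeds. The helper separation_score() is hypothetical (it is not part of snifter or openTSNE), and the mock co-ordinates stand in for a real embedding so that the block runs without snifter installed.

```r
# Hypothetical helper: mean distance between cluster centroids divided by
# the mean distance of points to their own centroid (larger = better separated).
separation_score <- function(coords, label) {
    xy <- cbind(coords[, 1], coords[, 2])  # plain two-column matrix
    groups <- split(seq_len(nrow(xy)), label)
    centroids <- t(sapply(groups, function(i) colMeans(xy[i, , drop = FALSE])))
    between <- mean(as.numeric(dist(centroids)))
    within <- mean(sapply(groups, function(i) {
        pts <- xy[i, , drop = FALSE]
        centred <- sweep(pts, 2, colMeans(pts))
        mean(sqrt(rowSums(centred^2)))
    }))
    between / within
}

# Mock two-cluster co-ordinates standing in for a t-SNE embedding
set.seed(1)
mock_label <- rep(c("A", "B"), each = 50)
mock_coords <- cbind(
    rnorm(100, mean = rep(c(-5, 5), each = 50)),
    rnorm(100)
)
score <- separation_score(mock_coords, mock_label)

# With snifter installed, the score could be compared across seeds, e.g.:
# scores <- sapply(1:5, function(s) {
#     separation_score(fitsne(counts, random_state = s), label)
# })
```

Scores that stay similar across seeds suggest the cluster structure does not depend on one particular initialisation. This is only one of many possible summaries, and visual comparison of the plots remains useful.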
The openTSNE package, and by extension snifter, also allows new data to be embedded into an existing t-SNE embedding. Here, we split the data into "training" and "test" sets. We then generate a t-SNE embedding using the training data, and project the test data into this embedding.
# Randomly assign half of the observations to the test set
test_ind <- sample(nrow(counts), nrow(counts) / 2)
train_ind <- setdiff(seq_len(nrow(counts)), test_ind)
train_mat <- counts[train_ind, ]
test_mat <- counts[test_ind, ]
train_label <- label[train_ind]
test_label <- label[test_ind]
# Fit the embedding on the training observations only
embedding <- fitsne(train_mat, random_state = 42L)
Once we have generated the embedding, we can now project the unseen test data into this t-SNE embedding.
new_coords <- project(embedding, new = test_mat, old = train_mat)
ggplot() +
    geom_point(
        aes(embedding[, 1], embedding[, 2],
            colour = train_label,
            shape = "Train"
        )
    ) +
    geom_point(
        aes(new_coords[, 1], new_coords[, 2],
            colour = test_label,
            shape = "Test"
        )
    ) +
    scale_colour_discrete(name = "Cluster") +
    scale_shape_discrete(name = NULL) +
    labs(x = "t-SNE 1", y = "t-SNE 2")
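Beyond inspecting the plot, the projection can be checked numerically. A minimal sketch, assuming a simple nearest-centroid rule is an adequate check for well-separated clusters: assign each projected point to the closest training-cluster centroid and compare the result with the known test labels. The helper nearest_centroid() is hypothetical (not part of snifter), and mock co-ordinates are used so that the block runs on its own.

```r
# Hypothetical helper: label each new point with the training cluster
# whose centroid is closest in the two-dimensional embedding.
nearest_centroid <- function(train_coords, train_label, new_coords) {
    train_xy <- cbind(train_coords[, 1], train_coords[, 2])
    new_xy <- cbind(new_coords[, 1], new_coords[, 2])
    groups <- split(seq_len(nrow(train_xy)), train_label)
    centroids <- t(sapply(groups, function(i) {
        colMeans(train_xy[i, , drop = FALSE])
    }))
    # Squared distance from every new point to every centroid
    d2 <- sapply(seq_len(nrow(centroids)), function(k) {
        rowSums(sweep(new_xy, 2, centroids[k, ])^2)
    })
    d2 <- matrix(d2, nrow = nrow(new_xy))
    rownames(centroids)[max.col(-d2, ties.method = "first")]
}

# Mock co-ordinates standing in for the training embedding and projection
set.seed(1)
mock_train <- cbind(rnorm(100, mean = rep(c(-5, 5), each = 50)), rnorm(100))
mock_train_label <- rep(c("A", "B"), each = 50)
mock_new <- cbind(rnorm(20, mean = rep(c(-5, 5), each = 10)), rnorm(20))
mock_new_label <- rep(c("A", "B"), each = 10)

predicted <- nearest_centroid(mock_train, mock_train_label, mock_new)
agreement <- mean(predicted == mock_new_label)
```

With the objects from this vignette, the equivalent call would be mean(nearest_centroid(embedding, train_label, new_coords) == test_label). A value close to 1 indicates that the projected test points land near the training cluster they belong to.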
sessionInfo()