calculateTSNE | R Documentation |
Perform t-distributed stochastic neighbour embedding (t-SNE) for the cells, based on the data in a SingleCellExperiment object.
calculateTSNE(x, ...)
## S4 method for signature 'ANY'
calculateTSNE(
  x,
  ncomponents = 2,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  perplexity = NULL,
  normalize = TRUE,
  theta = 0.5,
  num_threads = NULL,
  ...,
  external_neighbors = FALSE,
  BNPARAM = KmknnParam(),
  BPPARAM = SerialParam(),
  use_fitsne = FALSE,
  use_densvis = FALSE,
  dens_frac = 0.3,
  dens_lambda = 0.1
)
## S4 method for signature 'SummarizedExperiment'
calculateTSNE(x, ..., exprs_values = "logcounts", assay.type = exprs_values)
## S4 method for signature 'SingleCellExperiment'
calculateTSNE(
  x,
  ...,
  pca = is.null(dimred),
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)
runTSNE(x, ..., altexp = NULL, name = "TSNE")
x |
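For calculateTSNE, a numeric matrix of log-expression values where rows are features and columns are cells, or a SummarizedExperiment or SingleCellExperiment containing such a matrix. For runTSNE, a SingleCellExperiment object. |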
... |
For the For |
ncomponents |
Numeric scalar indicating the number of t-SNE dimensions to obtain. |
ntop |
Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction. |
subset_row |
Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector. |
scale |
Logical scalar, should the expression values be standardized? |
transposed |
Logical scalar, is x transposed with cells in rows? |
perplexity |
Numeric scalar defining the perplexity parameter, see ?Rtsne for more details. |
normalize |
Logical scalar indicating if input values should be scaled for numerical precision, see ?Rtsne for more details. |
theta |
Numeric scalar specifying the approximation accuracy of the Barnes-Hut algorithm, see ?Rtsne for more details. |
num_threads |
Integer scalar specifying the number of threads to use in |
external_neighbors |
Logical scalar indicating whether a nearest neighbors search should be computed externally with |
BNPARAM |
A BiocNeighborParam object specifying the neighbor search algorithm to use when external_neighbors=TRUE. |
BPPARAM |
A BiocParallelParam object specifying how the neighbor search should be parallelized when external_neighbors=TRUE. |
use_fitsne |
Logical scalar indicating whether |
use_densvis |
Logical scalar indicating whether |
dens_frac , dens_lambda |
See |
exprs_values |
Alias to assay.type. |
assay.type |
Integer scalar or string indicating which assay of x contains the expression values. |
pca |
Logical scalar indicating whether a PCA step should be performed inside Rtsne. |
dimred |
String or integer scalar specifying the existing dimensionality reduction results to use. |
n_dimred |
Integer scalar or vector specifying the dimensions to use if dimred is specified. |
altexp |
String or integer scalar specifying an alternative experiment containing the input data. |
name |
String specifying the name to be used to store the result in the reducedDims of the output. |
The function Rtsne is used internally to compute the t-SNE.
Note that the algorithm is not deterministic, so different runs of the function will produce differing results.
Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.
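As a minimal sketch of the seeding advice, assuming the scater package is attached and using the mock dataset from the examples (the seed values 100 and 200 are arbitrary), setting the same seed immediately before each call is expected to give matching coordinates:

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)

# Fix the seed immediately before each call so that the random
# initialization of the t-SNE is the same both times.
set.seed(100)
first <- calculateTSNE(sce)
set.seed(100)
second <- calculateTSNE(sce)

# Same seed and same input: the two coordinate matrices should agree.
all.equal(first, second)

# A different seed will generally produce a different layout, which is
# why comparing a few seeds is a useful robustness check.
set.seed(200)
third <- calculateTSNE(sce)
```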
The value of the perplexity parameter can have a large effect on the results.
By default, the function will set a “reasonable” perplexity that scales with the number of cells in x.
(Specifically, it is the number of cells divided by 5, capped at a maximum of 50.)
However, it is often worthwhile to manually try multiple values to ensure that the conclusions are robust.
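The default rule can be written out as a short sketch. The helper name default_perplexity is hypothetical and not part of the package, and rounding down to a whole number is an assumption, since the text above only states "divided by 5, capped at 50":

```r
# Hypothetical helper mirroring the stated rule: cells / 5, capped at 50.
# Rounding down with floor() is an assumption made for this sketch.
default_perplexity <- function(ncells) {
    min(50, floor(ncells / 5))
}

default_perplexity(100)   # 20
default_perplexity(1000)  # 50, the cap has been reached
```

To check robustness, the same data can be embedded at several values by passing perplexity explicitly and storing each result under a different name.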
If external_neighbors=TRUE, the nearest neighbor search step will use a different algorithm to that in the Rtsne function.
This can be parallelized or approximate to achieve greater speed for large data sets.
The neighbor search results are then used for t-SNE via the Rtsne_neighbors function.
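A sketch of an external, approximate neighbor search follows, assuming the BiocNeighbors and BiocParallel packages are installed. AnnoyParam() is one possible choice of approximate algorithm and the result name "TSNE_annoy" is arbitrary:

```r
library(scater)
library(BiocNeighbors)
library(BiocParallel)

sce <- mockSCE()
sce <- logNormCounts(sce)

set.seed(100)
# AnnoyParam() requests an approximate search. SerialParam() keeps the
# search single-threaded here; a parallel backend could be swapped in.
sce <- runTSNE(sce,
    external_neighbors = TRUE,
    BNPARAM = AnnoyParam(),
    BPPARAM = SerialParam(),
    name = "TSNE_annoy")

dim(reducedDim(sce, "TSNE_annoy"))
```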
If dimred is specified, the PCA step of the Rtsne function is automatically turned off by default.
This presumes that the existing dimensionality reduction is sufficient such that an additional PCA is not required.
For calculateTSNE, a numeric matrix is returned containing the t-SNE coordinates for each cell (row) and dimension (column).
For runTSNE, a modified x is returned that contains the t-SNE coordinates in reducedDim(x, name).
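The difference between the two return values can be seen in a short sketch, assuming scater is attached and using the mock dataset:

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)

# calculateTSNE() returns a plain numeric matrix: one row per cell,
# one column per t-SNE dimension.
set.seed(100)
coords <- calculateTSNE(sce)
dim(coords)

# runTSNE() returns the SingleCellExperiment itself, with the same kind
# of matrix stored under the requested name.
set.seed(100)
sce <- runTSNE(sce, name = "TSNE")
reducedDimNames(sce)
```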
This section is relevant if x is a numeric matrix of (log-)expression values with features in rows and cells in columns;
or if x is a SingleCellExperiment and dimred=NULL.
In the latter, the expression values are obtained from the assay specified by assay.type.
The subset_row argument specifies the features to use for dimensionality reduction.
The aim is to allow users to specify highly variable features to improve the signal/noise ratio,
or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.
If subset_row=NULL, the ntop features with the largest variances are used instead.
The variances are computed directly from the expression values without modelling any mean-variance trend,
so a more considered choice of genes is often possible, e.g., with scran functions.
Note that the value of ntop is ignored if subset_row is specified.
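The variance-based default can be illustrated with a base R sketch. The helper select_top_features is hypothetical and only mirrors the description above; it is not the package's internal code:

```r
# Toy log-expression matrix: 4 features (rows) x 5 cells (columns).
mat <- rbind(
    gene1 = c(1, 2, 3, 4, 5),
    gene2 = c(2, 2, 2, 2, 2),
    gene3 = c(0, 10, 0, 10, 0),
    gene4 = c(1, 1, 2, 2, 1)
)

# Hypothetical helper: keep the `ntop` rows with the largest variance,
# computed directly from the values with no mean-variance modelling.
select_top_features <- function(x, ntop) {
    vars <- apply(x, 1, var)
    keep <- order(vars, decreasing = TRUE)[seq_len(min(ntop, nrow(x)))]
    x[keep, , drop = FALSE]
}

top <- select_top_features(mat, ntop = 2)
rownames(top)   # "gene3" "gene1"
```

Passing subset_row (for example, a character vector of gene names chosen by another method) bypasses this step entirely.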
If scale=TRUE, the expression values for each feature are standardized so that their variance is unity.
This will also remove features with standard deviations below 1e-8.
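A base R sketch of the standardization described above follows. The helper scale_features is hypothetical. It divides each feature by its standard deviation and does not center, because the text only states that the variance becomes one:

```r
mat <- rbind(
    gene1 = c(1, 2, 3, 4, 5),
    gene2 = c(2, 2, 2, 2, 2),      # constant: standard deviation is 0
    gene3 = c(0, 10, 0, 10, 0)
)

# Hypothetical helper: drop near-constant features, then divide each
# remaining feature by its standard deviation so that its variance is 1.
scale_features <- function(x, tol = 1e-8) {
    sds <- apply(x, 1, sd)
    keep <- sds >= tol
    # Dividing a matrix by a vector of length nrow() applies one value per row.
    x[keep, , drop = FALSE] / sds[keep]
}

scaled <- scale_features(mat)
rownames(scaled)        # gene2 has been removed
apply(scaled, 1, var)   # remaining variances are 1
```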
If x is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x by setting the dimred argument.
This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA).
We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.
The matrix of existing reduced dimensions is taken from reducedDim(x, dimred).
By default, all dimensions are used to compute the second set of reduced dimensions.
If n_dimred is also specified, only the first n_dimred columns are used.
Alternatively, n_dimred can be an integer vector specifying the column indices of the dimensions to use.
When dimred is specified, no additional feature selection or standardization is performed.
This means that any settings of ntop, subset_row and scale are ignored.
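A typical use is to run PCA first and then t-SNE on the leading components. The sketch below assumes scater is attached; the number of components (20, then 10) and the name "TSNE_subset" are arbitrary choices for illustration:

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)

# Fast linear step first; the result is stored as "PCA" by default.
set.seed(100)
sce <- runPCA(sce, ncomponents = 20)

# Slower non-linear step on the first 10 principal components only.
set.seed(100)
sce <- runTSNE(sce, dimred = "PCA", n_dimred = 10)

# An integer vector selects specific columns instead of the first n.
set.seed(100)
sce <- runTSNE(sce, dimred = "PCA", n_dimred = c(1, 2, 5),
    name = "TSNE_subset")

reducedDimNames(sce)
```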
If x is a numeric matrix, setting transposed=TRUE will treat the rows as cells and the columns as the variables/dimensions.
This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment.
As such, no feature selection or standardization is performed, i.e., ntop, subset_row and scale are ignored.
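As a sketch, a precomputed matrix with cells in rows can be passed directly. The random matrix here is only a stand-in for real reduced dimensions, and its size (100 cells by 10 dimensions) is arbitrary:

```r
library(scater)

# Stand-in for precomputed results: 100 cells (rows) x 10 dimensions (columns).
set.seed(100)
pcs <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# transposed = TRUE tells the function that rows are already cells.
set.seed(100)
coords <- calculateTSNE(pcs, transposed = TRUE)
dim(coords)
```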
This section is relevant if x is a SingleCellExperiment and altexp is not NULL.
In such cases, the method is run on data from an alternative SummarizedExperiment nested within x.
This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp), e.g., antibody tags.
Setting altexp with assay.type will use the specified assay from the alternative SummarizedExperiment.
If the alternative is a SingleCellExperiment, setting dimred will use the specified dimensionality reduction results from the alternative.
This option will also interact as expected with n_dimred.
Note that the output is still stored in the reducedDims of the output SingleCellExperiment.
It is advisable to use a different name to distinguish this output from the results generated from the main experiment's assay values.
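The following sketch uses the spike-in data that mockSCE() stores as an alternative experiment named "Spikes", which stands in here for other feature sets such as antibody tags. The result name "TSNE_spikes" is an arbitrary choice:

```r
library(scater)

sce <- mockSCE()
sce <- logNormCounts(sce)

# The alternative experiment needs its own "logcounts" assay, so it is
# normalized separately from the main experiment.
altExp(sce, "Spikes") <- logNormCounts(altExp(sce, "Spikes"))

set.seed(100)
sce <- runTSNE(sce)                                           # main assay, stored as "TSNE"
sce <- runTSNE(sce, altexp = "Spikes", name = "TSNE_spikes")  # alternative features

# Both results live in the reducedDims of the main object.
reducedDimNames(sce)
```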
Aaron Lun, based on code by Davis McCarthy
van der Maaten LJP, Hinton GE (2008). Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579-2605.
Rtsne, for the underlying calculations.
plotTSNE, to quickly visualize the results.
library(scater)

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

# Set a seed so that the t-SNE coordinates are reproducible.
set.seed(100)
example_sce <- runTSNE(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))