knitr::opts_chunk$set( message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>", fig.align = "center" )
clustifyr
provides several functions to plot tSNE or UMAP results. The plot_dims()
function will plot tSNE or UMAP data using a meta.data table and can color the cells based on cluster identity.
library(clustifyr) library(clustifyrdata) library(dplyr) library(tibble) # Matrix of normalized single-cell RNA-seq counts pbmc_matrix[1:3, 1:3] # meta.data table containing cluster assignments for each cell pbmc_meta[1:5, ] # Create tSNE and color cells based on cluster ID plot_dims( x = "UMAP_1", # name of column in the meta.data containing the data to plot on x-axis y = "UMAP_2", # name of column in the meta.data containing the data to plot on y-axis data = pbmc_meta, # meta.data table containing cluster assignments for each cell feature = "seurat_clusters" # name of column in meta.data to color cells by )
Cells can also be colored based on the expression level of a gene or list of genes using the plot_gene()
function.
# Create tSNE and color cells based on gene expression plot_gene( x = "UMAP_1", # name of column in the meta.data containing the data to plot on x-axis y = "UMAP_2", # name of column in the meta.data containing the data to plot on y-axis expr_mat = pbmc_matrix, # matrix of normalized single-cell RNA-seq counts metadata = pbmc_meta %>% rownames_to_column("rn"), # meta.data table containing cluster assignments for each cell genes = c("CD79B", "CD8A"), # vector of gene names to color cells cell_col = "rn" # name of column in meta.data containing the cell IDs )
clustifyr()
resultsThe clustifyr()
function outputs a matrix of correlation coefficients and clustify_lists()
and clustify_nudge()
output positive scores. clustifyr
provides built-in functions to help visualize these results.
Cell type assignments can be assessed by plotting the clustifyr()
correlation matrix as a heatmap using the plot_cor_heatmap()
function.
# Run clustifyr() res <- clustify( input = pbmc_matrix, # matrix of normalized single-cell RNA-seq counts metadata = pbmc_meta, # meta.data table containing cluster assignments for each cell ref_mat = cbmc_ref, # reference matrix containing bulk RNA-seq data for each cell type query_genes = pbmc_vargenes, # list of highly varible genes identified with Seurat cluster_col = "seurat_clusters" # name of column in meta.data containing cell clusters ) # Create heatmap using the clustifyr correlation matrix plot_cor_heatmap( cor_mat = res # matrix of correlation coefficients from clustifyr() )
The plot_cor()
function can also be used to create a tSNE for each cell type of interest and color the cells based on the correlation coefficients.
# Create a tSNE for each cell type of interest and color cells based on correlation coefficients plot_cor( x = "UMAP_1", # name of column in the meta.data containing the data to plot on x-axis y = "UMAP_2", # name of column in the meta.data containing the data to plot on y-axis cor_mat = res, # matrix of correlation coefficients from clustifyr() metadata = pbmc_meta, # meta.data table containing cluster assignments for each cell data_to_plot = colnames(res)[1:2], # name of cell type(s) to plot correlation coefficients cluster_col = "seurat_clusters" # name of column in meta.data containing cell clusters )
Cell clusters can also be labeled using the plot_best_call()
function, which takes the correlation matrix and labels cell clusters with the cell type that has the highest correlation coefficient.
# Create tSNE and label clusters with the cell type that has the highest correlation coefficient plot_best_call( cor_mat = res, # matrix of correlation coefficients from clustifyr() metadata = pbmc_meta, # meta.data table containing UMAP or tSNE data do_label = TRUE, # should the feature label be shown on each cluster? do_legend = FALSE, # should the legend be shown? cluster_col = "seurat_clusters" )
clustifyr()
accuracyThe clustifyr()
results can also be evaluated by over-clustering the data and comparing the cell type assignments before and after over-clustering. This is accomplished using the overcluster_test()
function. The cell type assignments should be similar with and without over-clustering.
# Overcluster cells and compare cell type assignments with and without over-clustering overcluster_test( expr = pbmc_matrix, # matrix of normalized single-cell RNA-seq counts metadata = pbmc_meta, # meta.data table containing UMAP or tSNE data ref_mat = cbmc_ref, # reference matrix containing bulk RNA-seq data for each cell type cluster_col = "seurat_clusters", # name of column in meta.data containing cell clusters n = 5 # expand cluster number n-fold for overclustering )
The cell types from the bulk RNA-seq reference matrix can also be mixed together using the make_comb_ref()
function to assess the specificity of the cell type assignments. If a cluster shows a higher correlation when using the mixed reference matrix, this suggests that the cluster contains multiple cell types.
# Create reference containing different combinations of the bulk RNA-seq data comb_ref <- make_comb_ref( ref_mat = cbmc_ref # reference matrix containing bulk RNA-seq data for each cell type ) # Peek at the new combined reference comb_ref[1:5, 1:5] # Run clustifyr() using the combined reference comb_res <- clustify( input = pbmc_matrix, # matrix of normalized single-cell RNA-seq counts metadata = pbmc_meta, # meta.data table containing cluster assignments for each cell ref_mat = comb_ref, # reference matrix containing bulk RNA-seq data for each cell type query_genes = pbmc_vargenes, # list of highly varible genes identified with Seurat cluster_col = "seurat_clusters" # name of column in meta.data containing cell clusters ) # Create tSNE and label clusters with the assigned cell types from the combined reference plot_best_call( cor_mat = comb_res, # matrix of correlation coefficients from clustifyr() metadata = pbmc_meta, # meta.data table containing UMAP or tSNE data do_label = TRUE, # should the feature label be shown on each cluster? do_legend = FALSE, # should the legend be shown? cluster_col = "seurat_clusters" )
Visualization of other attributes shared in the metadata between ref and query by plot_cols
, such as nGene, nUMI, mt_percentage, as another way of identity confirmation after clustify
. Certain cell types have distinct patterns, more genes detected, for example.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.