knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This tutorial provides an example analysis for modelling gene detection pattern as outlined in R.Li et al, 2018. The goal of this tutorial is to provide an overview of the cell type classification and visualization tasks by learning a low dimensional embedding through a class of gene detection models: that is BFA and Binary PCA.
The following workflow summarizes a typical dimensionality reduction procedure performed by BFA or Binary PCA.
Let's start with the installation
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("scBFA")
next we can load dependent packages
library(zinbwave) library(SingleCellExperiment) library(ggplot2) library(scBFA)
The example dataset is generated from our scRNA-seq pre-DC/cDC dataset sourced from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89232 After performing all quality control procedure of genes and cells (as outlined in the paper), we then select 500 most variable genes for illustration purpose. The example dataset consists of 950 cells and 500 genes
# raw counts matrix with rows are genes and columns are cells data("exprdata") # a vector specify the ground truth of cell types provided by conquer database data("celltype")
The design of BFA and Binary PCA allows three kinds of input object(scData): \begin{itemize} \item 1. A raw count matrix in which rows are genes and columns are cells. \item 2. A SingleCellExperiment class which at least contains the raw count matrix \item 3. A Seurat class which at least contains the raw count matrix \end{itemize} For illustration, here we construct a singleCellExperiment class to be the input of BFA and Binary PCA.
sce <- SingleCellExperiment(assay = list(counts = exprdata))
Let $N$ stands for number of cells, $G$ stands for the number of genes, and $K$ stands for the number of latent dimensions.
A bfa model object computes the following parameters after fitting the gene detection matrix.
We choose 3 as number of latent dimensions and project the gene detection matrix on the embedding space.
bfa_model = scBFA(scData = sce, numFactors = 2)
We then visualize the low dimensional embedding of BFA in tSNE space. Points are colored by their corresponding cell types.
set.seed(5) df = as.data.frame(bfa_model$ZZ) df$celltype = celltype p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype)) p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8) colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2", "slateblue2","maroon","#f7df27","palevioletred1","olivedrab3", "#377EB8","#5043c1","blue","aquamarine2","chartreuse4", "burlywood2","indianred1","mediumorchid1") p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2") p1 <- p1 + scale_color_manual(values = colorvalue) p1 <- p1 + theme(panel.background = element_blank(), legend.position = "right", axis.text=element_blank(), axis.line.x = element_line(color="black"), axis.line.y = element_line(color="black"), plot.title = element_blank() ) p1
bpca = BinaryPCA(scData = sce)
We then visualize the low dimensional embedding of Binary PCA in tSNE space. Points are colored by their corresponding cell types.
df = as.data.frame(bpca$x[,c(1:2)]) colnames(df) = c("V1","V2") df$celltype = celltype p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype)) p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8) colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2", "slateblue2","maroon","#f7df27","palevioletred1","olivedrab3", "#377EB8","#5043c1","blue","aquamarine2","chartreuse4", "burlywood2","indianred1","mediumorchid1") p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2") p1 <- p1 + scale_color_manual(values = colorvalue) p1 <- p1 + theme(panel.background = element_blank(), legend.position = "right", axis.text=element_blank(), axis.line.x = element_line(color="black"), axis.line.y = element_line(color="black"), plot.title = element_blank() ) p1
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.