In quon-titative-biology/scBFA: A dimensionality reduction tool using gene detection pattern to mitigate noisy expression profile of scRNA-seq

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

This tutorial provides an example analysis for modelling gene detection pattern as outlined in R.Li et al, 2018. The goal of this tutorial is to provide an overview of the cell type classification and visualization tasks by learning a low dimensional embedding through a class of gene detection models: that is BFA and Binary PCA.

Summary of workflow

The following workflow summarizes a typical dimensionality reduction procedure performed by BFA or Binary PCA.

Data processing
Dimensionality reduction
Visualization

Installation

Let's start with the installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scBFA")

next we can load dependent packages

library(zinbwave)
library(SingleCellExperiment)
library(ggplot2)
library(scBFA)

Information of example dataset

The example dataset is generated from our scRNA-seq pre-DC/cDC dataset sourced from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89232 After performing all quality control procedure of genes and cells (as outlined in the paper), we then select 500 most variable genes for illustration purpose. The example dataset consists of 950 cells and 500 genes

# raw counts matrix with rows are genes and columns are cells
data("exprdata")

# a vector specify the ground truth of cell types provided by conquer database
data("celltype")

Working with SingleCellExperiment class

The design of BFA and Binary PCA allows three kinds of input object(scData): \begin{itemize} \item 1. A raw count matrix in which rows are genes and columns are cells. \item 2. A SingleCellExperiment class which at least contains the raw count matrix \item 3. A Seurat class which at least contains the raw count matrix \end{itemize} For illustration, here we construct a singleCellExperiment class to be the input of BFA and Binary PCA.

sce <- SingleCellExperiment(assay = list(counts = exprdata))

Gene Detection Model analysis

Binary Factor Analysis

Let $N$ stands for number of cells, $G$ stands for the number of genes, and $K$ stands for the number of latent dimensions.

A bfa model object computes the following parameters after fitting the gene detection matrix.

$Z$ is $N$ by $K$ embedding matrix and is named as "ZZ" in the model object
$A$ is $G$ by $K$ loading matrix and is named as "AA" in the model object
$\beta$ if there is cell-level covariates (e.g batch effect), $\beta$ is corresponding coefficient matrix and is named as "beta" in the model object
$\gamma$ if there is gene-level covariates (e.g QC measures), $\gamma$ is corresponding coefficient matrix and is named as "gamma" in the model object

We choose 3 as number of latent dimensions and project the gene detection matrix on the embedding space.

bfa_model = scBFA(scData = sce, numFactors = 2)

We then visualize the low dimensional embedding of BFA in tSNE space. Points are colored by their corresponding cell types.

set.seed(5)
df = as.data.frame(bfa_model$ZZ)
df$celltype = celltype

p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype))
p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8) 
colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2",
                "slateblue2","maroon","#f7df27","palevioletred1","olivedrab3",
                "#377EB8","#5043c1","blue","aquamarine2","chartreuse4",
                "burlywood2","indianred1","mediumorchid1")
p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2") 
p1 <- p1 + scale_color_manual(values = colorvalue)
p1 <- p1 + theme(panel.background = element_blank(),
                  legend.position = "right",
                  axis.text=element_blank(),
                  axis.line.x = element_line(color="black"),
                  axis.line.y = element_line(color="black"),
                  plot.title = element_blank()
                   )
p1

Binary PCA

bpca = BinaryPCA(scData = sce)

We then visualize the low dimensional embedding of Binary PCA in tSNE space. Points are colored by their corresponding cell types.

df = as.data.frame(bpca$x[,c(1:2)])
colnames(df) = c("V1","V2")
df$celltype = celltype

p1 <- ggplot(df,aes(x = V1,y = V2,colour = celltype))
p1 <- p1 + geom_jitter(size=2.5,alpha = 0.8) 
colorvalue <- c("#43d5f9","#24b71f","#E41A1C", "#ffc935","#3d014c","#39ddb2",
                "slateblue2","maroon","#f7df27","palevioletred1","olivedrab3",
                "#377EB8","#5043c1","blue","aquamarine2","chartreuse4",
                "burlywood2","indianred1","mediumorchid1")
p1 <- p1 + xlab("tsne axis 1") + ylab("tsne axis 2") 
p1 <- p1 + scale_color_manual(values = colorvalue)
p1 <- p1 + theme(panel.background = element_blank(),
                legend.position = "right",
                axis.text=element_blank(),
                axis.line.x = element_line(color="black"),
                axis.line.y = element_line(color="black"),
                plot.title = element_blank()
                )
p1