knitr::opts_chunk$set( collapse = TRUE, warnings = FALSE, messages = FALSE, comment = "#>" )
cellassign
assigns cells measured using single cell RNA sequencing
to known cell types based on marker gene information. Unlike other
methods for assigning cell types from single cell RNA-seq data,
cellassign
does not require labeled single cell or purified bulk
expression data -- cellassign
only needs to know whether or not
each given gene is a marker of each cell type:
knitr::include_graphics("cellassign_overview.png")
Inference is performed using Tensorflow. For more details please see the manuscript.
cellassign
depends on tensorflow
, which can be installed as follows:
install.packages("tensorflow") library(tensorflow) install_tensorflow(extra_packages = "tensorflow-probability")
Please ensure this installs version 2 of tensorflow. You can check this by calling
tensorflow::tf_config()
You can confirm that the installation succeeded by running:
sess = tf$Session() hello <- tf$constant('Hello, TensorFlow!') sess$run(hello)
Note that the tf
object is created automatically when the tensorflow
library is loaded to provide access to the Tensorflow interface.
For more details see the Rstudio page on tensorflow installation.
cellassign
can then be installed through Bioconductor via
BiocManager::install('cellassign')
or the development version through github using the devtools
package :
devtools::install_github("Irrationone/cellassign")
We begin by illustrating basic usage of cellassign
on some
example data bundled with the package. First, load the relevant libraries:
library(SingleCellExperiment) library(cellassign)
We use an example SingleCellExperiment
consisting of 200 genes
and 500 cells:
data(example_sce) print(example_sce)
The true cell types are annotated for convenience in the Group
slot of the SingleCellExperiment
:
print(head(example_sce$Group))
Also provided is an example gene-by-cell-type binary matrix, whose entries are 1 if a gene is a marker for a given cell type and 0 otherwise:
data(example_marker_mat) print(example_marker_mat)
We further require size factors for each cell. These are stored
in sizeFactors(example_sce)
- for your data we recommend computing
them using the computeSumFactors
function from the scran
package. Note: it is highly recommended to compute size factors using the full set of genes, before subsetting to markers for input to cellassign.
s <- sizeFactors(example_sce)
We then call cellassign
using the cellassign()
function, passing
in the above information. It is critical that gene expression data containing only marker genes is used as input to cellassign. We do this here by subsetting the input SingleCellExperiment
using the row names (gene names) of the marker matrix. This also ensures that the order of the genes in the gene expression data matches the order of the genes in the marker matrix.
fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),], marker_gene_info = example_marker_mat, s = s, learning_rate = 1e-2, shrinkage = TRUE, verbose = FALSE)
This returns a cellassign
object:
print(fit)
We can access the maximum likelihood estimates (MLE) of cell type using the celltypes
function:
print(head(celltypes(fit)))
By default, this assigns a cell to a type of the probability of assignment is greater than 0.95, and "unassigned" otherwise. This can be changed with the assign_prob
parameter.
It is also possible to get all MLE parameters using mleparams
:
print(str(mleparams(fit)))
We can also visualize the probabilities of assignment using the cellprobs
function that returns a probability matrix for each cell and cell type:
pheatmap::pheatmap(cellprobs(fit))
Finally, since this is simulated data we can check the concordance with the true group values:
print(table(example_sce$Group, celltypes(fit)))
A set of example markers are included with the cellassign
package
for common cell types in the human tumour microenvironment. Users
should be aware that
cellassign
workflow is typically iterative, including
ensuring all markers are expressed in your expression data, and
removing cell types from the input marker matrix that do not appear
to be presentThe marker genes are available for the following cell types:
These can be accessed by calling
data(example_TME_markers)
Note that this is a list of two marker lists:
names(example_TME_markers)
Where symbol
contains gene symbols:
lapply(head(example_TME_markers$symbol, n = 4), head, n = 4)
and ensembl
contains the equivalent ensembl gene ids:
lapply(head(example_TME_markers$ensembl, n = 4), head, n = 4)
To use these with cellassign
we can turn them into the binary
marker by cell type matrix:
marker_mat <- marker_list_to_mat(example_TME_markers$ensembl) marker_mat[1:3, 1:3]
Important: the single cell experiment or input gene expression
matrix should be subset accordingly to match the rows of the marker
input matrix, e.g. if sce
is a SingleCellExperiment
with ensembl
IDs as rownames then call
sce_marker <- sce[intersect(rownames(marker_mat), rownames(sce)),]
Note that the rows in the single cell experiment or gene expression matrix should be ordered identically to those in the marker input matrix.
You can the proceed using cellassign
as before.
cellassign()
callThere are several options to a call to cellassign
that can alter
the results:
min_delta
: the minimum log-fold change in expression above which aX
: a covariate matrix, see section belowshrinkage
: whether to impose a hierarchical prior on the values of delta
(cell type specific increase in expression of marker genes)Here we demonstrate a method of constructing the binary marker gene matrix that encodes our a priori knowledge of cell types.
For two types of cells (Group1
and Group2
) we know a priori several good
marker genes, e.g.:
| Cell type | Genes | | --------- | ----- | | Group1 | Gene186, Gene269, Gene526, Gene536, Gene994 | | Group2 | Gene205, Gene575, Gene754, Gene773, Gene949 |
To use this in cellassign
, we must turn this into a named list, where
the names are the cell types and the entries are marker genes
(not necessarily mutually exclusive) for each cell type:
marker_gene_list <- list( Group1 = c("Gene186", "Gene269", "Gene526", "Gene536", "Gene994"), Group2 = c("Gene205", "Gene575", "Gene754", "Gene773", "Gene949") ) print(str(marker_gene_list))
We can then directly provide this to cellassign
or turn it into a binary
marker gene matrix first using the marker_list_to_mat
function:
print(marker_list_to_mat(marker_gene_list))
This has automatically included an other
group for cells that do not fall
into either type - this can be excluded by setting include_other = FALSE
.
Covariates corresponding to batch, sample, or patient-specific effects can
be included in the cellassign
model. For example, if we have two covariates
x1
and x2
:
N <- ncol(example_sce) x1 <- rnorm(N) x2 <- rnorm(N)
We can construct a design matrix using the model.matrix
function in R:
X <- model.matrix(~ 0 + x1 + x2)
Note we explicitly set no intercept by passing in 0
in the beginning.
We can then perform an equivalent cell assignment passing this in also:
fit <- cellassign(exprs_obj = example_sce, marker_gene_info = example_marker_mat, X = X, s = s, learning_rate = 1e-2, shrinkage = TRUE, verbose = FALSE)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.