knitr::opts_chunk$set(echo = TRUE)
Single-cell gene expression profiling methods are well suited for determining transcriptional
heterogeneity across a cell population. In many cases, this heterogeneity may reflect dynamic changes
in the transcriptional landscape due to biological processes such as differentiation or tumorigenesis. The
overall trajectory of this change may be unified or branched and is partially obscured by the noise of the
transcriptional programs within each cell. To address this computational challenge, a number of
packages have been created to calculate pseudotemporal ordering of cells to reflect cell-state
hierarchies from gene expression data. Each of these packages have unique inputs, outputs, and
visualization schemes, which makes comparisons across these methods time-consuming. Here we
present Cell Tree Generator for Gene Expression Matrices (r Rpackage("ctgGEM")
), a package to streamline the
building of cell-state hierarchies from single-cell gene expression data across multiple existing tools for
improved comparability and reproducibility. ctgGEM provides the user with simplified way to build trees
with these packages using one function call with a single dataset and the desired visualization name as a
parameter. Results are also stored in the SIF file format for use in downstream analysis workflows or
input into Cytoscape. The packages currently supported by ctgGEM are:
r Biocpkg("destiny")
r Biocpkg("monocle")
r Biocpkg("sincell")
r Biocpkg("TSCAN")
r Rpackage("ctgGEM")
r Rpackage("ctgGEM")
requires R version 4.0 or higher (available
here) and the most recent version of Bioconductor.
For more information on using Bioconductor, please see their website at https://
bioconductor.org. The following code will install Bioconductor,
r Rpackage("ctgGEM")
and the following CRAN packages and their
dependencies: r CRANpkg(c("ggm", "ggplot2", "igraph", "irlba", "maptpx", "VGAM"))
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("ctgGEM")
After installing, attach the r Rpackage("ctgGEM")
package with the following
command:
library(ctgGEM)
ctgGEMset
objectThe r Rpackage("ctgGEM")
workflow is based around a single data class,
ctgGEMset
, that extends the r Biocpkg("Biobase")
SummarizedExperiment
class, which provides a common interface familiar to those who have analyzed
microarray experiments with Bioconductor. The ctgGEMset
class requires the
following three inputs:
exprsData
, a numeric matrix of expression values, where rows are genes,
and columns are cells/samples
phenoData
, a data frame, where rows are cells/samples, and columns are
cell/sample attributes (cell type, culture condition, day captured, etc.)
featureData
, a data frame, where rows are features (e.g. genes), and
columns are gene attributes (gene identifiers, gc content, etc.)
The expression value matrix must have the same number of columns as the
phenoData has rows, and it must have the same number of rows as the featureData
data frame has rows. Row names of the phenoData object must match the column
names of the expression matrix. Row names of the featureData object must
matchrow names of the expression matrix. Details for loading the data for and
constructing an example ctgGEMset
object suitable for this vignette can be
found in the following section.
A ctgGEMset
object that will support all of the tree types in
r Rpackage("ctgGEM")
requires a gene expression matrix, which for the monocle
and TSCAN
tree types must contain strictly non-negative values.
For this vignette, we will construct a toy object, using
the data provided in the r Biocpkg("HSMMSingleCell")
package. For more
information on this data set, please see the documentation available at
https://doi.org/doi:10.18129/B9.bioc.HSMMSingleCell
# load HSMMSingleCell package library(HSMMSingleCell) # load the data data(HSMM_expr_matrix) data(HSMM_sample_sheet) data(HSMM_gene_annotation)
ctgGEMset
objectUsing the data loaded in the previous step, we can construct the ctgGEMset
object for this vignette as follows:
toyGEMset <- ctgGEMset(exprsData = HSMM_expr_matrix, phenoData = HSMM_sample_sheet, featureData = HSMM_gene_annotation)
Using the "monocle" method requires a pair of parameters prepared with a column
name of gene identifiers in featureData()
that corresponds to the gene short
names as the first item to be set, followed by the data type. The data type
must be one of "UMI", "TC", "FPKM", "TPM", "LTFPKM", or "LTTPM", where "UMI" is
UMI counts, "TC" is transcript counts, "FPKM" is FPKM, and "TPM" is TPM.
Monocle works best with untransformed data, but if you want to use
log-transformed FPKM or TPM, use data type "LTFPKM" or "LTTPM", respectively.
Here we will use the gene_short_name column and the data type "FPKM".
monocleInfo(toyGEMset, "gene_id") <- "gene_short_name" monocleInfo(toyGEMset, "ex_type") <- "FPKM"
The "monocle" method can be used in either semi-supervised or unsupervised mode. To run in semi-supervised mode, the names of two known classifying marker genes from the column specified as "gene_id" need to be set. If these two marker genes are not set, unsupervised mode will run. The two genes we will use for this demonstration are MYF5, a known marker for myoblasts, and ANPEP, a known marker for fibroblasts.
# Set two marker genes to use semi-supervised mode # Alternatively, omit the next two lines to run in unsupervised mode monocleInfo(toyGEMset, "cell_id_1") <- "MYF5" # marks myoblasts monocleInfo(toyGEMset, "cell_id_2") <- "ANPEP" # marks fibroblasts
In addition to its primary plot, the "TSCAN" method can also generate a single gene vs. pseudotime plot. To generate this plot, we need to supply the rowname of a single gene row in exprs(), and store it in TSCANinfo. Here we will use the "ENSG00000000003.10" gene.
TSCANinfo(toyGEMset) <- "ENSG00000000003.10"
The "sincell" method can be used with a variety of parameters to control the
distance method used, which type (if any) dimensionality reduction to be used,
and which clustering method to use. If no options are specified, PCA will be
applied, with KNN clustering. Additional details concerning these parameters
can be found in the sincell package documentation. To use a distance method
with no dimensionality reduction, set the "method" parameter using the
sincellInfo()
function to one of the following: "euclidean", "L1" (Manhattan
distance), "cosine", "pearson", "spearman", or "MI" (Mutual Information).
sincellInfo(toyGEMset, "method") <- "pearson"
To use dimensionality reduction with the "sincell" method, set the "method" parameter to "PCA" for Principal Component Analysis, "ICA" for Independent Component Analysis, "tSNE" for t-Distributed Stochastic Neighbor Embedding, "classical-MDS" for classical Multidimensional Scaling, or "nonmetric-MDS" for non-metric Multidimensional Scaling. If using "classical-MDS" or "nonmetric- MDS", we can also select the distance method to use with the "MDS.distance" parameter. The options for "MDS.distance" are the same as those listed above for no dimensionality reduction, with the exception the that the "cosine" method cannot be selected for "MDS.distance".
sincellInfo(toyGEMset, "method") <- "classical-MDS" sincellInfo(toyGEMset, "MDS.distance") <- "spearman"
The final optional parameter for the "sincell" method is "clust.method".
Acceptable values are "max-distance", "percent", "knn", "k-medoids", "ward.D",
"ward.D2", "single", "complete", "average", "mcquitty", "median", or
"centroid". In this example, we will set it to "k-medoids".
sincellInfo(toyGEMset, "clust.method") <- "k-medoids"
Unlike the other tree types, the "destiny" method does not have the option to
provide further information. It should be noted, however, that the r
Biocpkg("destiny")
documentation recommends cleaned, referably normalized
data, and suggests that single-cell RNA-seq count data should be transformed
using a variance-stabilizing transformation (e.g. log or rlog).
r Rpackage("ctgGEM")
To use r Rpackage("ctgGEM")
, call the generate_tree
function with the
desired tree method and ctgGEMset
object. The user can specify the desired output
directory using the 'outputDir' parameter, or default to the temporary directory
returned by tempdir
.
To use our example ctgGEMset
and the "destiny" method, we would type the
following:
toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "destiny")
This stores the final trees in the originalTrees list within the ctgGEMset
object and if necessary creates a new folder, called "CTG-Output" that contains
a folder called "SIFs" containing the .SIF text file for the final tree, and
contains a folder called "Plots" containing .png images of the following plots:
plotOriginalTree(toyGEMset, "destinyDM") plotOriginalTree(toyGEMset, "destinyDPT")
To use our example ctgGEMset
and the "monocle" method, we would type the
following:
toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "monocle")
This stores the final trees in the originalTrees list within the ctgGEMset
object, a simplified r CRANpkg("igraph")
version of the tree in the
treeList within the ctgGEMset
object, and if necessary creates a new
folder, called "CTG-Output", that contains a folder called "SIFs" containing
the .SIF text file for the final tree, and a folder called "Plots" containing a
.png image of the following plot:
plotOriginalTree(toyGEMset, "monocle")
To use our example ctgGEMset
and the "sincell" method, we would type the
following:
toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "sincell")
This stores the final trees in the originalTrees list within the ctgGEMset
object, a simplified r CRANpkg("igraph")
version of the tree in the
treeList within the ctgGEMset
object, and if necessary creates a new
folder, called "CTG-Output", that contains a folder called "Plots" containing
.png images of the following plots:
plotOriginalTree(toyGEMset, "sincellIMC") plotOriginalTree(toyGEMset, "sincellMST") plotOriginalTree(toyGEMset, "sincellSST")
To use our example ctgGEMset
and the "TSCAN"" method, we would type the
following:
toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "TSCAN")
This stores the final trees in the originalTrees list within the ctgGEMset
object, a simplified r CRANpkg("igraph")
version of the tree in the
treeList within the ctgGEMset
object, and if necessary creates a new
folder, called "CTG-Output", that contains a folder called "SIFs" containing
the .SIF text file for the final tree, and a folder called "Plots" containing
.png images of the following plots:
plotOriginalTree(toyGEMset, "TSCANclustering") plotOriginalTree(toyGEMset, "TSCANsingleGene")
If at some point we wish to view the plot of a tree generated after it's been
created, but don't want to have to regenerate it and all its files, r
Rpackage("ctgGEM")
has a function named plotOriginalTree()
, that will
reproduce a plot stored in a ctgGEMset object. To use this function, we must
know the name of the tree we wish to plot. We can view the names of the trees
in our toyGEMset object using the names()
function.
names(originalTrees(toyGEMset))
Once we have the names, we can choose a tree to plot. Let's plot the "destinyDM" tree again.
plotOriginalTree(toyGEMset, "destinyDM")
Using this function eliminates the need to regenerate the tree to view a plot that was already created, thereby saving time for trees that require extensive computations to generate.
To store your analysis session result for later use, you can use the .Rda format.
save(toyGEMset, file = "toyGEMset.Rda")
sessionInfo()
Philipp Angerer et al. (2015): destiny: diffusion maps for large-scale single-cell data in R. Helmholtz-Zentrum München. URL: http://bioinformatics.oxfordjournals.org/content/32/8/1241
destiny package URL: https://bioconductor.org/packages/release/bioc/html/destiny.html
Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS and Rinn JL (2014). “The dynamics and regulators of cell fate decisions are revealed by pseudo-temporal ordering of single cells.” Nature Biotechnology.
monocle package URL: https://bioconductor.org/packages/release/bioc/html/monocle.html
Juliá M, Telenti A, Rausell A (2014): Sincell: R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data. bioRxiv preprint
sincell package URL: https://bioconductor.org/packages/release/bioc/html/sincell.html
Zhicheng Ji and Hongkai Ji (2015). TSCAN: TSCAN: Tools for Single-Cell ANalysis. R package version 1.12.0.
TSCAN package URL: https://bioconductor.org/packages/release/bioc/html/TSCAN.html
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.