Using ctgGEM
In ctgGEM: Generating Tree Hierarchy Visualizations from Gene Expression Data

knitr::opts_chunk$set(echo = TRUE)

Introduction

Single-cell gene expression profiling methods are well suited for determining transcriptional heterogeneity across a cell population. In many cases, this heterogeneity may reflect dynamic changes in the transcriptional landscape due to biological processes such as differentiation or tumorigenesis. The overall trajectory of this change may be unified or branched and is partially obscured by the noise of the transcriptional programs within each cell. To address this computational challenge, a number of packages have been created to calculate pseudotemporal ordering of cells to reflect cell-state hierarchies from gene expression data. Each of these packages have unique inputs, outputs, and visualization schemes, which makes comparisons across these methods time-consuming. Here we present Cell Tree Generator for Gene Expression Matrices (r Rpackage("ctgGEM")), a package to streamline the building of cell-state hierarchies from single-cell gene expression data across multiple existing tools for improved comparability and reproducibility. ctgGEM provides the user with simplified way to build trees with these packages using one function call with a single dataset and the desired visualization name as a parameter. Results are also stored in the SIF file format for use in downstream analysis workflows or input into Cytoscape. The packages currently supported by ctgGEM are:

r Biocpkg("destiny")
r Biocpkg("monocle")
r Biocpkg("sincell")
r Biocpkg("TSCAN")

Installing `r Rpackage("ctgGEM")`

r Rpackage("ctgGEM") requires R version 4.0 or higher (available here) and the most recent version of Bioconductor. For more information on using Bioconductor, please see their website at https:// bioconductor.org. The following code will install Bioconductor, r Rpackage("ctgGEM") and the following CRAN packages and their dependencies: r CRANpkg(c("ggm", "ggplot2", "igraph", "irlba", "maptpx", "VGAM"))

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ctgGEM")

After installing, attach the r Rpackage("ctgGEM") package with the following command:

library(ctgGEM)

Preparing a `ctgGEMset` object

The r Rpackage("ctgGEM") workflow is based around a single data class, ctgGEMset, that extends the r Biocpkg("Biobase") SummarizedExperiment class, which provides a common interface familiar to those who have analyzed microarray experiments with Bioconductor. The ctgGEMset class requires the following three inputs:

exprsData, a numeric matrix of expression values, where rows are genes, and columns are cells/samples
phenoData, a data frame, where rows are cells/samples, and columns are cell/sample attributes (cell type, culture condition, day captured, etc.)
featureData, a data frame, where rows are features (e.g. genes), and columns are gene attributes (gene identifiers, gc content, etc.)

The expression value matrix must have the same number of columns as the phenoData has rows, and it must have the same number of rows as the featureData data frame has rows. Row names of the phenoData object must match the column names of the expression matrix. Row names of the featureData object must matchrow names of the expression matrix. Details for loading the data for and constructing an example ctgGEMset object suitable for this vignette can be found in the following section.

Loading the Data

A ctgGEMset object that will support all of the tree types in r Rpackage("ctgGEM") requires a gene expression matrix, which for the monocle and TSCAN tree types must contain strictly non-negative values.
For this vignette, we will construct a toy object, using the data provided in the r Biocpkg("HSMMSingleCell") package. For more information on this data set, please see the documentation available at https://doi.org/doi:10.18129/B9.bioc.HSMMSingleCell

# load HSMMSingleCell package 
library(HSMMSingleCell)
# load the data 
data(HSMM_expr_matrix)
data(HSMM_sample_sheet)
data(HSMM_gene_annotation)

Constructing a `ctgGEMset` object

Using the data loaded in the previous step, we can construct the ctgGEMset object for this vignette as follows:

toyGEMset <- ctgGEMset(exprsData = HSMM_expr_matrix,
                            phenoData = HSMM_sample_sheet,
                            featureData = HSMM_gene_annotation)

Additional Info for the "monocle" method

Using the "monocle" method requires a pair of parameters prepared with a column name of gene identifiers in featureData() that corresponds to the gene short names as the first item to be set, followed by the data type. The data type must be one of "UMI", "TC", "FPKM", "TPM", "LTFPKM", or "LTTPM", where "UMI" is UMI counts, "TC" is transcript counts, "FPKM" is FPKM, and "TPM" is TPM.
Monocle works best with untransformed data, but if you want to use log-transformed FPKM or TPM, use data type "LTFPKM" or "LTTPM", respectively.
Here we will use the gene_short_name column and the data type "FPKM".

monocleInfo(toyGEMset, "gene_id") <- "gene_short_name"
monocleInfo(toyGEMset, "ex_type") <- "FPKM"

The "monocle" method can be used in either semi-supervised or unsupervised mode. To run in semi-supervised mode, the names of two known classifying marker genes from the column specified as "gene_id" need to be set. If these two marker genes are not set, unsupervised mode will run. The two genes we will use for this demonstration are MYF5, a known marker for myoblasts, and ANPEP, a known marker for fibroblasts.

# Set two marker genes to use semi-supervised mode
# Alternatively, omit the next two lines to run in unsupervised mode
monocleInfo(toyGEMset, "cell_id_1") <- "MYF5" # marks myoblasts
monocleInfo(toyGEMset, "cell_id_2") <- "ANPEP" # marks fibroblasts

Additional Info for the "TSCAN" method

In addition to its primary plot, the "TSCAN" method can also generate a single gene vs. pseudotime plot. To generate this plot, we need to supply the rowname of a single gene row in exprs(), and store it in TSCANinfo. Here we will use the "ENSG00000000003.10" gene.

TSCANinfo(toyGEMset) <- "ENSG00000000003.10"

Additional Info for the "sincell" method

The "sincell" method can be used with a variety of parameters to control the distance method used, which type (if any) dimensionality reduction to be used, and which clustering method to use. If no options are specified, PCA will be applied, with KNN clustering. Additional details concerning these parameters can be found in the sincell package documentation. To use a distance method with no dimensionality reduction, set the "method" parameter using the sincellInfo() function to one of the following: "euclidean", "L1" (Manhattan distance), "cosine", "pearson", "spearman", or "MI" (Mutual Information).

sincellInfo(toyGEMset, "method") <- "pearson"

To use dimensionality reduction with the "sincell" method, set the "method" parameter to "PCA" for Principal Component Analysis, "ICA" for Independent Component Analysis, "tSNE" for t-Distributed Stochastic Neighbor Embedding, "classical-MDS" for classical Multidimensional Scaling, or "nonmetric-MDS" for non-metric Multidimensional Scaling. If using "classical-MDS" or "nonmetric- MDS", we can also select the distance method to use with the "MDS.distance" parameter. The options for "MDS.distance" are the same as those listed above for no dimensionality reduction, with the exception the that the "cosine" method cannot be selected for "MDS.distance".

sincellInfo(toyGEMset, "method") <- "classical-MDS"
sincellInfo(toyGEMset, "MDS.distance") <- "spearman"

The final optional parameter for the "sincell" method is "clust.method".
Acceptable values are "max-distance", "percent", "knn", "k-medoids", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", or "centroid". In this example, we will set it to "k-medoids".

sincellInfo(toyGEMset, "clust.method") <- "k-medoids"

Additional Info for the "destiny" method

Unlike the other tree types, the "destiny" method does not have the option to provide further information. It should be noted, however, that the r Biocpkg("destiny") documentation recommends cleaned, referably normalized data, and suggests that single-cell RNA-seq count data should be transformed using a variance-stabilizing transformation (e.g. log or rlog).

Using `r Rpackage("ctgGEM")`

To use r Rpackage("ctgGEM"), call the generate_tree function with the desired tree method and ctgGEMset object. The user can specify the desired output directory using the 'outputDir' parameter, or default to the temporary directory returned by tempdir.

Using the "destiny" method

To use our example ctgGEMset and the "destiny" method, we would type the following:

toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "destiny")

This stores the final trees in the originalTrees list within the ctgGEMset object and if necessary creates a new folder, called "CTG-Output" that contains a folder called "SIFs" containing the .SIF text file for the final tree, and contains a folder called "Plots" containing .png images of the following plots:

plotOriginalTree(toyGEMset, "destinyDM")
plotOriginalTree(toyGEMset, "destinyDPT")

Using the "monocle" method

To use our example ctgGEMset and the "monocle" method, we would type the following:

toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "monocle")

This stores the final trees in the originalTrees list within the ctgGEMset object, a simplified r CRANpkg("igraph") version of the tree in the treeList within the ctgGEMset object, and if necessary creates a new folder, called "CTG-Output", that contains a folder called "SIFs" containing the .SIF text file for the final tree, and a folder called "Plots" containing a .png image of the following plot:

plotOriginalTree(toyGEMset, "monocle")

Using the "sincell" method

To use our example ctgGEMset and the "sincell" method, we would type the following:

toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "sincell")

This stores the final trees in the originalTrees list within the ctgGEMset object, a simplified r CRANpkg("igraph") version of the tree in the treeList within the ctgGEMset object, and if necessary creates a new folder, called "CTG-Output", that contains a folder called "Plots" containing .png images of the following plots:

plotOriginalTree(toyGEMset, "sincellIMC")
plotOriginalTree(toyGEMset, "sincellMST")
plotOriginalTree(toyGEMset, "sincellSST")

Using the "TSCAN" method

To use our example ctgGEMset and the "TSCAN"" method, we would type the following:

toyGEMset <- generate_tree(dataSet = toyGEMset, treeType = "TSCAN")

plotOriginalTree(toyGEMset, "TSCANclustering")
plotOriginalTree(toyGEMset, "TSCANsingleGene")

Replotting Trees

If at some point we wish to view the plot of a tree generated after it's been created, but don't want to have to regenerate it and all its files, r Rpackage("ctgGEM") has a function named plotOriginalTree(), that will reproduce a plot stored in a ctgGEMset object. To use this function, we must know the name of the tree we wish to plot. We can view the names of the trees in our toyGEMset object using the names() function.

names(originalTrees(toyGEMset))

Once we have the names, we can choose a tree to plot. Let's plot the "destinyDM" tree again.

plotOriginalTree(toyGEMset, "destinyDM")

Using this function eliminates the need to regenerate the tree to view a plot that was already created, thereby saving time for trees that require extensive computations to generate.

To store your analysis session result for later use, you can use the .Rda format.

save(toyGEMset, file = "toyGEMset.Rda")

Session Information

sessionInfo()

References

destiny

Philipp Angerer et al. (2015): destiny: diffusion maps for large-scale single-cell data in R. Helmholtz-Zentrum München. URL: http://bioinformatics.oxfordjournals.org/content/32/8/1241

destiny package URL: https://bioconductor.org/packages/release/bioc/html/destiny.html

monocle

Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS and Rinn JL (2014). “The dynamics and regulators of cell fate decisions are revealed by pseudo-temporal ordering of single cells.” Nature Biotechnology.

monocle package URL: https://bioconductor.org/packages/release/bioc/html/monocle.html

sincell

Juliá M, Telenti A, Rausell A (2014): Sincell: R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data. bioRxiv preprint

sincell package URL: https://bioconductor.org/packages/release/bioc/html/sincell.html

TSCAN

Zhicheng Ji and Hongkai Ji (2015). TSCAN: TSCAN: Tools for Single-Cell ANalysis. R package version 1.12.0.

TSCAN package URL: https://bioconductor.org/packages/release/bioc/html/TSCAN.html

Any scripts or data that you put into this service are public.

ctgGEM documentation built on Nov. 8, 2020, 5:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ctgGEM
Generating Tree Hierarchy Visualizations from Gene Expression Data

Using ctgGEM
In ctgGEM: Generating Tree Hierarchy Visualizations from Gene Expression Data

Introduction

Installing `r Rpackage("ctgGEM")`

Preparing a `ctgGEMset` object

Loading the Data

Constructing a `ctgGEMset` object

Additional Info for the "monocle" method

Additional Info for the "TSCAN" method

Additional Info for the "sincell" method

Additional Info for the "destiny" method

Using `r Rpackage("ctgGEM")`

Using the "destiny" method

Using the "monocle" method

Using the "sincell" method

Using the "TSCAN" method

Replotting Trees

Session Information

References

destiny

monocle

sincell

TSCAN

Try the ctgGEM package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ctgGEM Generating Tree Hierarchy Visualizations from Gene Expression Data

Using ctgGEM In ctgGEM: Generating Tree Hierarchy Visualizations from Gene Expression Data

Introduction

Installing r Rpackage("ctgGEM")

Preparing a ctgGEMset object

Loading the Data

Constructing a ctgGEMset object

Additional Info for the "monocle" method

Additional Info for the "TSCAN" method

Additional Info for the "sincell" method

Additional Info for the "destiny" method

Using r Rpackage("ctgGEM")

Using the "destiny" method

Using the "monocle" method

Using the "sincell" method

Using the "TSCAN" method

Replotting Trees

Session Information

References

destiny

monocle

sincell

TSCAN

Try the ctgGEM package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ctgGEM
Generating Tree Hierarchy Visualizations from Gene Expression Data

Using ctgGEM
In ctgGEM: Generating Tree Hierarchy Visualizations from Gene Expression Data

Installing `r Rpackage("ctgGEM")`

Preparing a `ctgGEMset` object

Constructing a `ctgGEMset` object

Using `r Rpackage("ctgGEM")`