knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The cerebroApp
package has two main purposes: (1) Give access to the Cerebro user interface, and (2) provide a set of functions to pre-process and export scRNA-seq data for visualization in Cerebro.
The Cerebro user interface was built using the Shiny framework and designed to provide numerous perspectives on a given data set that should facilitate data interpretation.
This vignette will give you an example of how scRNA-seq data can be processed using the Seurat toolkit and then exported for visualization in Cerebro.
Ket features that can be accessed through Cerebro:
Before starting the workflow, we need to install cerebroApp
, as well as the Seurat
, monocle
and SingleR
packages, which are not installed as dependencies of cerebroApp because they are only necessary if you want/need to pre-process your scRNA-seq data.
If you just want to launch the Cerebro user interface, e.g. because you already have the pre-processed data, you don't need those packages.
if ( 'BiocManager' %in% installed.packages() == FALSE ) install.packages('BiocManager') library(BiocManager) if ( 'cerebroApp' %in% installed.packages() == FALSE ) install('romanhaa/cerebroApp') if ( 'Seurat' %in% installed.packages() == FALSE ) install('Seurat') if ( 'SingleR' %in% installed.packages() == FALSE ) install('SingleR') if ( 'monocle' %in% installed.packages() == FALSE ) install('monocle')
Apart from the packages we just installed, we will also load the dplyr package and set a seed to make our analysis reproducible.
library(dplyr) library(Seurat) library(SingleR) library(monocle) library(cerebroApp) options(width = 100) set.seed(1234567)
In this example workflow, we will load a small transcript count table from the Seurat package containing 80 cells and 230 genes. If you want to try the workflow with a real data set, you can find some transcript count matrices on the Cerebro GitHub repo.
pbmc <- read.table( file = system.file('extdata', 'pbmc_raw.txt', package = 'Seurat'), as.is = TRUE )
Now, we create a Seurat
object and filter out cells with less than 50
transcripts or fewer than 10
expressed genes.
Those cut-offs are only reasonable for this example data set and will likely need to be adjusted in a real data set.
Then, we follow the standard Seurat workflow, including...
seurat <- CreateSeuratObject( project = 'pbmc', counts = pbmc, min.cells = 5 ) seurat <- subset(seurat, subset = nCount_RNA >= 50 & nFeature_RNA >= 10) seurat <- NormalizeData(seurat, verbose = FALSE) seurat <- FindVariableFeatures(seurat, verbose = FALSE) seurat <- ScaleData(seurat, vars.to.regress = 'nCount_RNA', verbose = FALSE) seurat <- RunPCA(seurat, features = seurat@assays$RNA@var.features, verbose = FALSE)
Then, we identify clusters with a relatively low resolution, again, because of the small size of this data set.
seurat <- FindNeighbors(seurat) seurat <- FindClusters(seurat, resolution = 0.5) seurat@meta.data$RNA_snn_res.0.5 <- NULL
We could also perform cell cycle analysis using the CellCycleScoring()
built into Seurat.
However, too few of the cell cycle genes are present in this example data set so we will skip it here.
The code below should work with a real data set.
seurat <- CellCycleScoring( seurat, g2m.features = cc.genes$g2m.genes, s.features = cc.genes$s.genes ) seurat@misc$gene_lists$G2M_phase_genes <- cc.genes$g2m.genes seurat@misc$gene_lists$S_phase_genes <- cc.genes$s.genes
Next, we use the UMAP technique to generate two- and three-dimensional projections.
seurat <- RunUMAP( seurat, reduction.name = 'UMAP', reduction.key = 'UMAP_', dims = 1:30, n.components = 2, seed.use = 100, verbose = FALSE ) seurat <- RunUMAP( seurat, reduction.name = 'UMAP_3D', reduction.key = 'UMAP3D_', dims = 1:30, n.components = 3, seed.use = 100, verbose = FALSE )
This example data set consists of a single sample so we just add that name to the meta data.
Moreover, in order to be able to understand how we did the analysis later, we add some meta data to the misc
slot of our Seurat
object, e.g. an experiment name and the species.
The info we store there will later be collected for visualization in Cerebro.
seurat@meta.data$sample <- factor('pbmc', levels = 'pbmc') seurat@misc$experiment <- list( experiment_name = 'pbmc', organism = 'hg', date_of_analysis = Sys.Date() ) seurat@misc$parameters <- list( gene_nomenclature = 'gene_name', discard_genes_expressed_in_fewer_cells_than = 10, keep_mitochondrial_genes = TRUE, variables_to_regress_out = 'nUMI', number_PCs = 30, cluster_resolution = 0.5 ) seurat@misc$parameters$filtering <- list( UMI_min = 50, UMI_max = Inf, genes_min = 10, genes_max = Inf ) seurat@misc$technical_info$cerebroApp_version <- utils::packageVersion('cerebroApp') seurat@misc$technical_info$Seurat <- utils::packageVersion('Seurat') seurat@misc$technical_info <- list( 'R' = capture.output(devtools::session_info()) )
Using the SingleR
package, we can get a suggestion of cell type for each cell.
Apart from the suggested cell type, we also extract the score that might help us to understand how confident the assignment was.
singler_ref <- BlueprintEncodeData() singler_results_blueprintencode_main <- SingleR( test = GetAssayData(seurat, assay = 'RNA', slot = 'data'), ref = singler_ref, labels = singler_ref@colData@listData$label.main ) seurat@meta.data$cell_type_singler_blueprintencode_main <- singler_results_blueprintencode_main@listData$labels singler_scores <- singler_results_blueprintencode_main@listData$scores %>% as_tibble() %>% dplyr::mutate(assigned_score = NA) for ( i in seq_len(nrow(singler_scores)) ) { singler_scores$assigned_score[i] <- singler_scores[[singler_results_blueprintencode_main@listData$labels[i]]][i] } seurat@meta.data$cell_type_singler_blueprintencode_main_score <- singler_scores$assigned_score
Here, we calculate phylogenetic trees that represent the similarity between different subgroups, in this case clusters and cell types.
To do so, we need to set the identity in the Seurat object to the current grouping of interest, calculate the tree, and then copy the result to the misc
slot in the Seurat object so it can be exported later.
Idents(seurat) <- "seurat_clusters" seurat <- BuildClusterTree( seurat, dims = 1:30, reorder = FALSE, reorder.numeric = FALSE ) seurat@misc$trees$seurat_clusters <- seurat@tools$BuildClusterTree
Idents(seurat) <- "cell_type_singler_blueprintencode_main" seurat <- BuildClusterTree( seurat, dims = 1:30, reorder = FALSE, reorder.numeric = FALSE ) seurat@misc$trees$cell_type_singler_blueprintencode_main <- seurat@tools$BuildClusterTree
Now we use a set of cerebroApp function to get more information about the data set and the contained clusters and cell types.
First, for every cell we calculate the percentage of mitochondrial and ribosomal transcripts of all transcripts using the addPercentMtRibo()
function.
seurat <- addPercentMtRibo( seurat, organism = 'hg', gene_nomenclature = 'name' )
Another interesting perspective is to look at the 100 most expressed genes across all cells and for every cluster and cell type. This can help in understanding the transcriptional activity of a given cell population.
seurat <- getMostExpressedGenes( seurat, assay = 'RNA', groups = c('sample','seurat_clusters','cell_type_singler_blueprintencode_main') )
Marker genes are genes which are particularly strongly or weakly expressed in a given cell population, e.g. a cluster.
This information can help to distinguish the role of different cell groups in the data set.
We can identify marker genes using the getMarkerGenes()
function which internally uses the Seurat::FindAllMarkers()
function but also stores the results in the Seurat object under the misc
slot.
seurat <- getMarkerGenes( seurat, assay = 'RNA', organism = 'hg', groups = c('seurat_clusters','cell_type_singler_blueprintencode_main'), name = 'cerebro_seurat', only_pos = TRUE )
Using the previously identified marker genes, we can perform a pathway enrichmeny analysis using the getEnrichedPathways()
function.
The function will internally use the Enrichr API and store the results in the Seurat object under the misc
slot.
All we have to do is provide the name that we specified in getMarkerGenes()
.
seurat <- getEnrichedPathways( seurat, marker_genes_input = 'cerebro_seurat', adj_p_cutoff = 0.01, max_terms = 100 )
On top of querying the databases of the Enrichr service, one can also perform gene set enrichment analysis on gene sets provided as a .gmt
file, e.g. from the MSigDB.
The performGeneSetEnrichmentAnalysis()
function uses the GSVA method in combination with additional statistics as published by Diaz-Mejia et. al..
example_gene_set <- system.file("extdata/example_gene_set.gmt", package = "cerebroApp") seurat <- performGeneSetEnrichmentAnalysis( seurat, assay = 'RNA', GMT_file = example_gene_set, groups = c('seurat_clusters','cell_type_singler_blueprintencode_main') )
Then, we perform trajectory analysis with Monocle v2 using the previously identified highly variable genes.
We extract the trajectory from the generated Monocle object with the extractMonocleTrajectory()
function of cerebroApp and attach it to our Seurat object.
monocle <- newCellDataSet( seurat@assays$RNA@counts, phenoData = new('AnnotatedDataFrame', data = seurat@meta.data), featureData = new('AnnotatedDataFrame', data = data.frame( gene_short_name = rownames(seurat@assays$RNA@counts), row.names = rownames(seurat@assays$RNA@counts)) ) ) monocle <- estimateSizeFactors(monocle) monocle <- setOrderingFilter(monocle, seurat@assays$RNA@var.features) monocle <- reduceDimension(monocle, max_components = 2, method = 'DDRTree') monocle <- orderCells(monocle) seurat <- extractMonocleTrajectory(monocle, seurat, 'highly_variable_genes')
Finally, we use the exportFromSeurat()
function of cerebroApp to export our Seurat object to a .crb
file which can be loaded into Cerebro.
exportFromSeurat( seurat, assay = 'RNA', slot = 'data', file = paste0('cerebro_pbmc_seurat_', Sys.Date(), '.crb'), experiment_name = 'pbmc', organism = 'hg', groups = c('sample','seurat_clusters','cell_type_singler_blueprintencode_main'), nUMI = 'nCount_RNA', nGene = 'nFeature_RNA', add_all_meta_data = TRUE, verbose = FALSE )
The Cerebro use interface can be launched using the launchCerebro()
function.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.