knitr::opts_chunk$set(warning = FALSE)
singleCellTK integrates functions from the Seurat [@Butler2018][@Stuart2019][@Satija2017][@Hao2021] package into an easy-to-use, streamlined workflow that is available through both the shiny user interface and the R console. The shiny application contains a separate tab that lets users run the steps of the workflow sequentially and visualize the results through interactive plots from within the application. On the R console, the toolkit offers wrapper functions that use the SingleCellExperiment [@Amezquita2020] object as both input and output. All results computed by the wrapper functions are stored within this object for further manipulation.
For detailed instructions, select 'Interactive Analysis' to use the workflow in the shiny application or 'Console Analysis' to use these methods on the R console from the tabs below:
## Workflow Guide

````{=html}
<div class="tab">
  <button class="tablinks" onclick="openTab(event, 'interactive')" id="ia-button">Interactive Analysis</button>
  <button class="tablinks" onclick="openTab(event, 'console')" id="console-button">Console Analysis</button>
</div>

<div id="interactive" class="tabcontent">
````
In this tutorial, we illustrate all the steps of the curated workflow and focus on the options available to customize each step according to user requirements. The tutorial uses a real-world scRNA-seq dataset named PBMC3K, which consists of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) collected from a healthy donor. This dataset is available from 10X Genomics and can be found on the 10X website. To initiate the `Seurat` workflow, click on 'Curated Workflows' in the top menu and select `Seurat`:
NOTE: Before heading into the next steps, we assume that users have already loaded SCTK, imported and QC'ed the PBMC3K data, following the Import and QC Tutorial.
**1. Normalize Data**

Assuming that the data has been uploaded via the Upload tab of the toolkit, the first step of the analysis is normalization. Any assay available in the uploaded data can be normalized with one of the three methods available through `Seurat`: `LogNormalize`, `CLR` (Centered Log Ratio) or `RC` (Relative Counts).

- Select the `assay` to normalize from the dropdown menu.
- Select the normalization method: `LogNormalize`, `CLR` or `RC`.
- Set the scale factor; the default is `10000`.
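
To make the normalization options more concrete, here is a minimal base R sketch of what `LogNormalize` and `RC` compute on a toy count matrix. It assumes the usual per-cell definitions: counts are divided by the cell total and multiplied by the scale factor, and `LogNormalize` additionally applies `log1p`. The matrix `counts` and the helpers `relativeCounts()` and `logNormalize()` are hypothetical names used only for illustration; they are not part of SCTK or Seurat, and `CLR` is not covered here.

```r
# Toy count matrix: 3 genes (rows) x 2 cells (columns). Hypothetical data.
counts <- matrix(c(1, 0, 3,
                   2, 2, 4),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Relative counts: each cell's counts divided by that cell's total,
# then multiplied by the scale factor.
relativeCounts <- function(x, scaleFactor = 10000) {
  sweep(x, 2, colSums(x), "/") * scaleFactor
}

# Log-normalization: relative counts followed by log(1 + x).
logNormalize <- function(x, scaleFactor = 10000) {
  log1p(relativeCounts(x, scaleFactor))
}

rc <- relativeCounts(counts)
ln <- logNormalize(counts)

# After relative-count scaling, every cell sums to the scale factor.
print(colSums(rc))
print(round(ln, 3))
```

Because both methods rescale each cell to the same total, cells with different sequencing depth become comparable; the log transform then compresses the range of highly expressed genes.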
**2. Highly Variable Genes**
Identification of highly variable genes is core to the `Seurat` workflow, and these genes are used throughout the remaining steps. `Seurat` provides three methods for identifying variable genes: `vst` (uses local polynomial regression to fit a relationship between the log of variance and the log of mean), `mean.var.plot` (uses mean and dispersion to divide features into bins) and `dispersion` (uses the highest dispersion values only).

- Select the method for variable gene identification: `vst`, `mean.var.plot` or `dispersion`.
- Set the number of variable features to identify; the default is `2000`.
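
As a rough illustration of the idea behind the `dispersion` option, the base R sketch below ranks genes by their variance-to-mean ratio and keeps the top `n`. This is a simplified stand-in under stated assumptions and is not Seurat's implementation; the matrix `expr` and the helper `topDispersionGenes()` are hypothetical.

```r
set.seed(1)

# Hypothetical expression matrix: 6 genes (rows) x 20 cells (columns).
# Four genes are roughly constant, two change strongly between cells.
expr <- rbind(
  stable1   = rpois(20, lambda = 5),
  stable2   = rpois(20, lambda = 5),
  stable3   = rpois(20, lambda = 5),
  variableA = c(rep(0, 10), rep(30, 10)),
  variableB = c(rep(0, 15), rep(40, 5)),
  stable4   = rpois(20, lambda = 5)
)

# Dispersion as the variance-to-mean ratio of each gene.
geneMean <- rowMeans(expr)
geneVar <- apply(expr, 1, var)
dispersion <- geneVar / geneMean

# Keep the n genes with the highest dispersion.
topDispersionGenes <- function(dispersion, n = 2) {
  names(sort(dispersion, decreasing = TRUE))[seq_len(n)]
}

hvg <- topDispersionGenes(dispersion, n = 2)
print(hvg)
```

Genes whose expression differs between subsets of cells get a high ratio, which is why they are useful for separating cell populations in the later steps.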
**3. Dimensionality Reduction**
The `Seurat` workflow offers `PCA` or `ICA` for dimensionality reduction, and the components from these methods can be used in the downstream analysis. Several plots are available to inspect the output of the dimensionality reduction, such as the standard 'PCA Plot', 'Elbow Plot', 'Jackstraw Plot' and 'Heatmap Plot'.

- Set the number of components to compute; the default is `50`.
- … `TRUE`.
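
The following base R sketch shows, on simulated data, the quantities that a PCA plot and an elbow plot summarize. It uses `prcomp()` from base R rather than the SCTK wrappers, and the matrix `expr` is hypothetical, so treat it as an illustration of the concept and not as the workflow's implementation.

```r
set.seed(42)

# Hypothetical normalized matrix: 50 cells (rows) x 8 genes (columns).
# Two groups of 25 cells differ in the first four genes.
group <- rep(c(0, 3), each = 25)
expr <- matrix(rnorm(50 * 8), nrow = 50)
expr[, 1:4] <- expr[, 1:4] + group

# Center and scale each gene, then compute principal components.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Standard deviation per component: the quantity an elbow plot displays.
pcSd <- pca$sdev
print(round(pcSd, 2))

# Proportion of variance explained by each component.
varExplained <- pcSd^2 / sum(pcSd^2)
print(round(varExplained, 2))

# Cell embeddings on the first two components (what a PCA plot shows).
print(head(pca$x[, 1:2]))
```

In this toy case the first component captures the group difference, so its standard deviation is clearly larger than the rest; on real data the "elbow" where the values level off is a common heuristic for how many components to carry forward.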
**4. tSNE/UMAP**

'tSNE' and 'UMAP' can be computed and plotted once components are available from the 'Dimensionality Reduction' tab.
**5. Clustering**

Cluster labels can be generated for all cells/samples using one of the computed reduction methods. Plots are automatically re-computed with cluster labels. The clustering algorithms provided by `Seurat` include the `original Louvain algorithm`, the `Louvain algorithm with multilevel refinement` and the `SLM algorithm`.

- Select the clustering algorithm: `original Louvain algorithm`, `Louvain algorithm with multilevel refinement` or `SLM algorithm`.
- Set the resolution; the default is `0.8`.
- … `TRUE`.
**6. Find Markers**

The 'Find Markers' tab can be used to identify marker genes and visualize them using one of the provided visualization methods. Markers can be identified either between two selected phenotype groups or between all groups; the choice is made at the time of computation. Markers that are conserved between two phenotype groups can also be identified. Ridge Plot, Violin Plot, Feature Plot and Heatmap Plot can be used to visualize the individual marker genes.
1. Select if you want to identify marker genes against all groups in a biological variable or between two pre-defined groups. Additionally, users can select the last option to identify the marker genes that are conserved between two groups.
2. Select phenotype variable that contains the grouping information.
3. Select test used for marker genes identification.
4. Select if only positive markers should be returned.
5. Press "Find Markers" button to run marker identification.
6. Identified marker genes are populated in the table.
7. Filters can be applied on the table.
8. Filters allow different comparisons based on the type of the column of the table.
9. Table re-populated after applying filters.
10. Heatmap plot can be visualized for all genes populated in the table (9) against all biological groups in the selected phenotype variable.
11. To visualize each individual marker gene through gene plots, they can be selected by clicking on the relevant rows of the table.
12. Selected marker genes from the table are plotted with gene plots.
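
As a conceptual illustration of a two-group marker comparison, the base R sketch below computes, for each gene, the difference in mean expression between two groups and a Wilcoxon rank-sum p-value, optionally keeps only positive markers, and orders the result. The data and the helper `findMarkersSketch()` are hypothetical and simulated; this is not the test implementation used by Seurat or SCTK, only a small example of the kind of table such a step produces.

```r
set.seed(7)

# Hypothetical normalized expression: 4 genes (rows) x 40 cells (columns),
# with two groups of 20 cells each.
group <- rep(c("g1", "g2"), each = 20)
expr <- rbind(
  markerUp   = c(rnorm(20, mean = 5), rnorm(20, mean = 1)),
  markerDown = c(rnorm(20, mean = 1), rnorm(20, mean = 5)),
  flat1      = rnorm(40, mean = 3),
  flat2      = rnorm(40, mean = 3)
)

# For each gene: difference in mean expression (group1 minus group2)
# and a Wilcoxon rank-sum p-value comparing the two groups.
findMarkersSketch <- function(expr, group, group1, group2, onlyPos = FALSE) {
  in1 <- group == group1
  in2 <- group == group2
  res <- data.frame(
    gene = rownames(expr),
    avgDiff = rowMeans(expr[, in1, drop = FALSE]) -
      rowMeans(expr[, in2, drop = FALSE]),
    pVal = apply(expr, 1, function(x) wilcox.test(x[in1], x[in2])$p.value),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  # Optionally keep only genes that are higher in group1.
  if (onlyPos) {
    res <- res[res$avgDiff > 0, ]
  }
  # Largest positive difference first, ties broken by p-value.
  res[order(-res$avgDiff, res$pVal), ]
}

markers <- findMarkersSketch(expr, group, "g1", "g2", onlyPos = TRUE)
print(markers)
```

Restricting to positive markers drops genes that are lower in the first group, which mirrors the "only positive markers" option described in step 4 above.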
**7. Downstream Analysis**
Once all steps of the Seurat workflow are completed, users can further analyze the data by directly going to the various downstream analysis options (Differential Expression, Marker Selection & Pathway Analysis) from within the Seurat workflow.
````{=html}
</div>
<div id="console" class="tabcontent">
````
All methods provided by SCTK for the Seurat workflow use a `SingleCellExperiment` object as both input and output.

Using a sample dataset:

```r
library(singleCellTK)

# Load filtered pbmc3k data
sce <- readRDS("tutorial_pbmc3k_qc.rds")
print(sce)
```

**1. Normalize Data** <br>

Once raw data is uploaded and stored in a `SingleCellExperiment` object, the `runSeuratNormalizeData()` function can be used to normalize the data. The method returns a `SingleCellExperiment` object with the normalized data stored as a new assay in the input object.

Parameters to this function include `useAssay` (specify the assay that should be normalized), `normAssayName` (specify the new name of the normalized assay, defaults to `"seuratNormData"`), `normalizationMethod` (specify the normalization method to use, defaults to `"LogNormalize"`) and `scaleFactor` (defaults to `10000`).

```r
sce <- runSeuratNormalizeData(inSCE = sce,
                              useAssay = "decontXcounts",
                              normAssayName = "seuratNormData",
                              normalizationMethod = "LogNormalize",
                              scaleFactor = 10000)
```

**2. Highly Variable Genes** <br>

Highly variable genes can be identified by first running the `runSeuratFindHVG()` function, which computes the statistics for the selected HVG method and stores them in the `rowData` of the input object. The top genes can then be retrieved with the `getTopHVG()` function and visualized using the `plotSeuratHVG()` method. <br>

Parameters for `runSeuratFindHVG()` include `useAssay` (specify the name of the assay to use), `method` (specify the method to use for variable genes computation, defaults to `"vst"`), `hvgNumber` (number of variable features to identify) and `createFeatureSubset` (define a name for the subset of selected top variable features).
```r
sce <- runSeuratFindHVG(inSCE = sce,
                        useAssay = "decontXcounts",
                        method = "vst",
                        hvgNumber = 2000,
                        createFeatureSubset = "hvf")

# Print names of top 10 variable features
print(getTopHVG(inSCE = sce, method = "vst", hvgNumber = 10))

# Plot variable features with top 10 labeled
plotSeuratHVG(sce, labelPoints = 10)
```

**3. Dimensionality Reduction** <br>

PCA or ICA can be computed using the `runSeuratPCA()` and `runSeuratICA()` functions respectively. Plots can be visualized using `plotSeuratReduction()`, `plotSeuratElbow()`, `plotSeuratJackStraw()` (JackStraw must first be computed with `runSeuratJackStraw()`) and `runSeuratHeatmap()`.

```r
sce <- runSeuratPCA(inSCE = sce,
                    useAssay = "seuratNormData",
                    reducedDimName = "pca",
                    nPCs = 50,
                    seed = 42,
                    scale = TRUE,
                    useFeatureSubset = "hvf")

# Plot PC1 vs PC2 plot
plotSeuratReduction(inSCE = sce, useReduction = "pca")

# Plot Elbowplot
plotSeuratElbow(inSCE = sce)

# Compute JackStraw
sce <- runSeuratJackStraw(inSCE = sce, useAssay = "seuratNormData", dims = 50)

# Plot JackStraw
plotSeuratJackStraw(inSCE = sce, dims = 50)

# Compute and plot first 4 dimensions and 30 top features in a heatmap
runSeuratHeatmap(inSCE = sce,
                 useAssay = "seuratNormData",
                 useReduction = "pca",
                 nfeatures = 30,
                 dims = 4)
```

**4. tSNE/UMAP** <br>

`runSeuratTSNE()` and `runSeuratUMAP()` can be used to compute the tSNE/UMAP embeddings and store them in the input object. Parameters to both functions include `inSCE` (input SCE object), `useReduction` (specify the reduction to use, i.e. `"pca"` or `"ica"`), `reducedDimName` (name of the new reduction) and `dims` (number of dims to use). Additionally, method-specific parameters can be used to fine-tune the algorithm. `plotSeuratReduction()` can be used to visualize the results.
```r
sce <- runSeuratTSNE(inSCE = sce,
                     useReduction = "pca",
                     reducedDimName = "seuratTSNE",
                     dims = 10,
                     perplexity = 30,
                     seed = 1)

# Plot TSNE
plotSeuratReduction(sce, "tsne")
```

```r
sce <- runSeuratUMAP(inSCE = sce,
                     useReduction = "pca",
                     reducedDimName = "seuratUMAP",
                     dims = 10,
                     minDist = 0.3,
                     nNeighbors = 30,
                     spread = 1,
                     seed = 42)
```

```r
# Plot UMAP
plotSeuratReduction(sce, "umap")
```

**5. Clustering** <br>

The `runSeuratFindClusters()` function can be used to compute the clusters, which can later be plotted with cluster labels through the `plotSeuratReduction()` method. The parameters to the function include `inSCE` (input SCE object), `useAssay` (name of the assay, if no reduction is to be used), `useReduction` (specify which reduction to use, i.e. `"pca"` or `"ica"`), `dims` (number of dims to use), `algorithm` (either `"louvain"`, `"multilevel"` or `"SLM"`) and `resolution` (defaults to 0.8).

```r
sce <- runSeuratFindClusters(inSCE = sce,
                             useReduction = "pca",
                             resolution = 0.8,
                             algorithm = "louvain",
                             dims = 10)
```

`plotSeuratReduction()` can then be used to plot all previously computed reductions with cluster labels:

```r
plotSeuratReduction(sce, "pca", showLegend = TRUE)
plotSeuratReduction(sce, "tsne", showLegend = TRUE)
plotSeuratReduction(sce, "umap", showLegend = TRUE)
```

**6. Find Markers** <br>

Marker genes can be identified using the `runSeuratFindMarkers()` function. This function can either use one column from the `colData` of the input object as the grouping variable, if all groups from that variable are to be compared (`allGroup` parameter), or users can manually specify the cells in the first group and the cells in the second group (`cells1` and `cells2` parameters).
```r
sce <- runSeuratFindMarkers(inSCE = sce, allGroup = "Seurat_louvain_Resolution0.8")

# Fetch marker genes table
markerGenes <- metadata(sce)[["seuratMarkers"]]

# Order by log fold change and p value
markerGenes <- markerGenes[order(-markerGenes$avg_log2FC, markerGenes$p_val), ]

head(markerGenes)
```

The identified marker genes can be visualized with one of the available plots: `ridge plot`, `violin plot`, `feature plot`, `dot plot` and `heatmap plot`.

All marker gene visualizations can be produced through the wrapper function `plotSeuratGenes()`, which must be supplied with the SCE object (with markers previously computed), the name of the scaled assay, the type of plot (available options are `"ridge"`, `"feature"`, `"violin"`, `"dot"` and `"heatmap"`), the features to plot (a `character` vector) and the grouping variable available in the `colData` slot of the input object. An additional parameter, `ncol`, sets the number of columns in which the plots are arranged.

```r
plotSeuratGenes(inSCE = sce,
                useAssay = "seuratNormData",
                plotType = "ridge",
                features = metadata(sce)[["seuratMarkers"]]$gene.id[1:4],
                groupVariable = "Seurat_louvain_Resolution0.8",
                ncol = 2,
                combine = TRUE)
```

````{=html}
</div>
<script>
document.getElementById("ia-button").click();
</script>
</body>
````