knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # dev.args = list(png = list(type = "cairo")),
  dpi = 900,
  out.width = "100%",
  message = FALSE,
  warning = FALSE
)
library(xfun)

# Render a data frame as a scrollable table
format_table <- function(mydf) {
  mydf %>%
    kableExtra::kbl() %>%
    kableExtra::kable_paper() %>%
    kableExtra::scroll_box(width = "800px", height = "200px")
}
To analyze two or more single-cell sequencing datasets jointly, the individual datasets must first be integrated. In seuratTools, a single function, integration_workflow(), integrates multiple Seurat objects provided as a list.
The first step is to load the required packages:
library(seuratTools)
library(Seurat)
library(tidyverse)
library(ggraph)
library(patchwork)
Here, we use the human_gene_transcript_seu dataset, which was produced in two batches using different preparation methods. The SplitObject() function splits the Seurat object into a list with one element per batch. This list of Seurat objects can then be integrated using the integration_workflow() function, which identifies cell states shared across the individual single-cell datasets.
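# Drop cells without a preparation method, then split into one object per batch
batches <- human_gene_transcript_seu[, !is.na(human_gene_transcript_seu$Prep.Method)] %>%
  Seurat::SplitObject(split.by = "Prep.Method") %>%
  identity()

# Integrate the list of per-batch Seurat objects
integrated_seu <- integration_workflow(batches)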
When analyzing data across multiple batches, batch effects can compromise both integration and interpretation of the data. For instance, the dataset human_gene_transcript_seu contains data obtained with two different preparation methods. When these data are analyzed without integration, cluster 2 is barely represented in the Zhong method. The plot of marker features shows that cluster 2 has high expression of features associated with cones. The near absence of cluster 2 in the Zhong method can therefore become problematic during downstream analysis.
# Marker features per cluster
markerp <- plot_markers(human_gene_transcript_seu, metavar = "gene_snn_res.0.2", num_markers = 10)

# Clusters shown separately for each preparation method
dp <- DimPlot(human_gene_transcript_seu, group.by = "gene_snn_res.0.2", split.by = "Prep.Method")

library(patchwork)
wrap_plots(markerp, dp) +
  plot_layout(widths = c(1, 6), ncol = 1)
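The imbalance described above can also be checked numerically rather than by eye. The following is a minimal base-R sketch: cluster_proportions() is a hypothetical helper (not part of seuratTools) that tabulates, for each batch, the fraction of cells assigned to each cluster. The toy metadata frame and its batch labels are invented for illustration and only mimic the column names used in this vignette.

```r
# Hypothetical helper, not part of seuratTools: fraction of each batch's
# cells assigned to each cluster.
cluster_proportions <- function(meta, cluster_col, batch_col) {
  counts <- table(cluster = meta[[cluster_col]], batch = meta[[batch_col]])
  # margin = 2 normalises within each batch, so every column sums to 1
  prop.table(counts, margin = 2)
}

# Toy metadata (invented values) with the column names used in this vignette
toy_meta <- data.frame(
  gene_snn_res.0.2 = c("0", "0", "1", "2", "0", "1", "1", "1"),
  Prep.Method      = c("batch_A", "batch_A", "batch_A", "batch_A",
                       "batch_B", "batch_B", "batch_B", "batch_B")
)

props <- cluster_proportions(toy_meta, "gene_snn_res.0.2", "Prep.Method")
print(round(props, 2))
```

With a real object, the cell-level metadata could be passed in the same way, for example as human_gene_transcript_seu@meta.data. A cluster whose proportion is close to zero in one batch but not in the other is a candidate batch effect.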
The integration_workflow() function can integrate datasets from mouse or human sources. When the datasets come from the same species, the data are processed according to the standard integration workflow.
integration_workflow() can also integrate datasets collected from different species, i.e., human and mouse. In this case, the sub-function cross_species_integrate() first converts the mouse Seurat objects to human by transforming features in the mouse dataset into their human-specific equivalents, and then integrates the objects using the standard integration workflow.
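To make the feature conversion step concrete, here is a minimal base-R sketch of renaming mouse gene symbols to human ones using an ortholog lookup table. It is not the cross_species_integrate() implementation: mouse_to_human_features() is a hypothetical helper, the lookup table is a three-gene toy example, and the counts are invented. The sketch assumes that only one-to-one ortholog pairs are kept and that unmapped mouse genes are dropped.

```r
# Hypothetical helper, not the seuratTools implementation: rename mouse gene
# symbols in an expression matrix to their human orthologs.
mouse_to_human_features <- function(expr, orthologs) {
  # Keep only one-to-one ortholog pairs so the renamed features stay unique
  dup_mouse <- orthologs$mouse[duplicated(orthologs$mouse)]
  dup_human <- orthologs$human[duplicated(orthologs$human)]
  one_to_one <- orthologs[
    !orthologs$mouse %in% dup_mouse & !orthologs$human %in% dup_human,
  ]
  # Drop mouse genes that have no usable human counterpart
  keep <- intersect(rownames(expr), one_to_one$mouse)
  converted <- expr[keep, , drop = FALSE]
  rownames(converted) <- one_to_one$human[match(keep, one_to_one$mouse)]
  converted
}

# Toy lookup table and counts (invented values)
orthologs <- data.frame(
  mouse = c("Rho", "Nrl", "Crx"),
  human = c("RHO", "NRL", "CRX")
)
mouse_counts <- matrix(
  1:8,
  nrow = 4,
  dimnames = list(
    c("Rho", "Nrl", "Crx", "mouse_only_gene"),  # last row is a placeholder
    c("cell1", "cell2")
  )
)

human_like <- mouse_to_human_features(mouse_counts, orthologs)
print(rownames(human_like))
```

After a conversion of this kind, the mouse and human objects share a feature space, which is what allows the standard integration workflow to be applied to them together.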
The standard integration procedure in integration_workflow() comprises several sub-functions, each responsible for a different step of integration.
Prior to integration, seurat_preprocess performs standard preprocessing (log-normalization) and identifies variable features individually for each element in the list. Anchors between the individual datasets are then identified using FindIntegrationAnchors. IntegrateData uses these anchors to return a Seurat object containing a new Assay, which holds an integrated expression matrix for all cells.
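The steps above correspond closely to the anchor-based integration functions exported by Seurat. The sketch below shows one way they could be chained; it is not the seuratTools source code. integrate_batches() is a hypothetical wrapper, the dims and nfeatures values are illustrative defaults, and the code assumes the Seurat v3/v4-style API in which FindIntegrationAnchors() and IntegrateData() operate on a list of Seurat objects.

```r
library(Seurat)

# Hypothetical wrapper, not the seuratTools implementation.
# `object_list` is assumed to be a named list of Seurat objects,
# e.g. the output of SplitObject().
integrate_batches <- function(object_list, dims = 1:30) {
  # Log-normalise and select variable features separately for each batch
  object_list <- lapply(object_list, function(seu) {
    seu <- NormalizeData(seu, verbose = FALSE)
    FindVariableFeatures(seu, selection.method = "vst", nfeatures = 2000,
                         verbose = FALSE)
  })
  # Identify anchors between the batches
  anchors <- FindIntegrationAnchors(object.list = object_list, dims = dims)
  # Return a Seurat object with a new "integrated" assay
  IntegrateData(anchorset = anchors, dims = dims)
}

# Usage, assuming `batches` was created as earlier in this vignette:
# integrated <- integrate_batches(batches)
```

Keeping normalization and feature selection per batch, as here, means that no batch dominates the choice of variable features before anchors are computed.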
Downstream analysis is performed on this integrated matrix by the seurat_integration_pipeline sub-function, which scales the integrated data, reduces the dimensionality of the Seurat object, performs clustering at several resolutions, and identifies marker genes.
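As a rough illustration of these downstream steps, the sketch below chains the corresponding Seurat functions on an integrated object. It is an assumption about how such a pipeline might look, not the seurat_integration_pipeline source: cluster_integrated() is a hypothetical name, and the dims and resolutions values are arbitrary examples.

```r
library(Seurat)

# Hypothetical wrapper, not the seuratTools implementation.
# `seu` is assumed to contain an "integrated" assay created by IntegrateData().
cluster_integrated <- function(seu, dims = 1:30,
                               resolutions = c(0.2, 0.6, 1)) {
  # Work on the integrated expression matrix
  DefaultAssay(seu) <- "integrated"
  seu <- ScaleData(seu, verbose = FALSE)
  # Dimensionality reduction: PCA followed by UMAP on the leading components
  seu <- RunPCA(seu, npcs = max(dims), verbose = FALSE)
  seu <- RunUMAP(seu, reduction = "pca", dims = dims, verbose = FALSE)
  # Build the neighbour graph, then cluster once per resolution
  seu <- FindNeighbors(seu, reduction = "pca", dims = dims, verbose = FALSE)
  seu <- FindClusters(seu, resolution = resolutions, verbose = FALSE)
  seu
}

# Usage, assuming `integrated` holds an integrated Seurat object:
# integrated <- cluster_integrated(integrated)
```

Marker genes for a chosen clustering could then be obtained with Seurat's FindAllMarkers(), after setting the active identities to the clustering of interest.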
Finally, the resulting integrated dataset can be visualized by plotting the output:
DimPlot(integrated_seu, reduction = "umap", group.by = "Prep.Method")
Prior to integration, the Seurat object must be pre-processed using the object_preprocess
function. This function handles normalization and scaling required for downstream data analysis.