knitr::opts_chunk$set( warning = FALSE, message = FALSE, tidy = TRUE, cache=TRUE ) options(timeout = max(1000, getOption("timeout")))
In this article, we will show how to run IDclust with your own custom functions. The two 'hackable' part of IDclust are the processing and the differential functions. There are a few default functions present in the package, however based on your preferences you are able to customize both step quite easily.
As an example, we will analyse a scRNA dataset with 'monocle3' instead of the default Seurat for preprocessing as well as differential analysis.
library(IDclust) library(Seurat) library(monocle3)
Take example scRNA data from "Joint profiling of histone modifications and transcriptome in single cells from mouse brain,Chenxu Zhu, Yanxiao Zhang, Yang Eric Li, Jacinta Lucero, . Margarita Behrens,
Bing Ren, Nature Methods, 2021 Paired-Tag".
We load the data and transform it into a cellDataset to analyze it
with 'monocle3'.
set.seed(47) data(Seu, package = "IDclust")
At each step of the 'iterative_differential_clustering' function, a cluster is re-processed. This means that the counts are re-normalized and that a new dimensionality reduction is calculated. Today we want to replace the default function by the processing used in 'monocle3' package.
We start by studying the 'preprocess_Seurat' function. It takes in
entry 3 arguments:
'Seurat' - A Seurat object
'n_dims' - An integer specifying the number of first dimensions to keep in the dimensionality reduction step.
* 'dim_red' - The name of the slot to save the dimensionality reduction at each step in the reducedDimNames(scExp).
It returns a Seurat object with the cell dimensionality reduction (embedding) stored in the 'dim_red' slot of the reducedDimNames of the object.
We therefore define a function that process the data using 'monocle3' package and matches the criteria describe above. We will have to convert the object to a 'CellDataset' object and back to a Seurat object.
# From monocle3 tutorial - https://cole-trapnell-lab.github.io/monocle3/docs/clustering/ processing_monocle3 <- function(Seu, n_dims = 100, dim_red = "PCA"){ # Transform the Seurat object into CDS gene_metadata = Seu@assays$RNA@meta.features gene_metadata$gene = rownames(gene_metadata) gene_metadata$gene_short_name = rownames(gene_metadata) cds = monocle3::new_cell_data_set( expression_data = Seu@assays$RNA@counts, cell_metadata = Seu@meta.data, gene_metadata = gene_metadata ) # Process the cds cds <- monocle3::preprocess_cds(cds, num_dim = n_dims) cds <- monocle3::reduce_dimension(cds, preprocess_method = dim_red, reduction_method = dim_red) # Transform back into a Seurat object and keep the metadata Seu = Seurat::CreateSeuratObject( counts(cds), meta.data = Seu@meta.data, assay = "RNA" ) Seurat::Idents(Seu) = Seu$seurat_clusters # Add the reduced dimension Seu@reductions[[dim_red]] = Seurat::CreateDimReducObject( embeddings = reducedDim(cds, dim_red), key = paste0(dim_red, "_"), assay = "RNA" ) # Return the Seurat object return(Seu) }
We can now test the function, make sure that the object returned is a Seurat object and that the embedding is present in the object.
Seu_monocle = processing_monocle3(Seu, n_dims = 100, dim_red = "PCA") class(Seu_monocle) Seurat::DimPlot(Seu_monocle) Seurat::DimPlot(Seu, reduction = "pca")
We now define the differential function that will use 'top_markers' function from 'monocle3' package.
We start by studying the 'differential_Seurat' function. It takes in entry 3 mandatory arguments:
The function returns a data.frame containing the marker genes for each cluster. It contains a 'cluster' column, a 'gene' column.
differential_monocle3 <- function(Seu, by = "IDcluster", logFC.th = log2(1.5), qval.th = 0.01){ # Transform the Seurat object into CDS gene_metadata = Seu@assays$RNA@meta.features gene_metadata$gene = rownames(gene_metadata) gene_metadata$gene_short_name = rownames(gene_metadata) cds = monocle3::new_cell_data_set( expression_data = Seu@assays$RNA@counts, cell_metadata = Seu@meta.data, gene_metadata = gene_metadata ) # Differential analysis the cds res <- monocle3::top_markers( cds, group_cells_by = by, genes_to_test_per_group = 200, verbose = FALSE ) # Rename the column concerning gene and cluster colnames(res)[1] = "gene" colnames(res)[3] = "cluster" # We just use the adjusted p.value threshold res = res[which( res$marker_test_q_value < qval.th ), ] # Return the Seurat object return(res) }
We can now test the function, make sure that the object returned is a data.frame containing the marker genes of each cluster as well as the cluster column.
res = differential_monocle3(Seu, by = "seurat_clusters", qval.th = 0.01)
head(res)
We can finally use our two function to run the 'iterative_differential_clustering' function. To do so, we pass our monocle3 customized functions to preprocessing_function and differential_function, and we make sure the dim_red and n_dims arguments are correctly set.
We first pre-process our object with our custom function :
data("Seu") Seu_monocle = processing_monocle3(Seu, n_dims = 100, dim_red = "PCA")
set.seed(47) output_dir = "~/Tests/IDC_monocle/" if(!dir.exists(output_dir)) dir.create(output_dir) Seu_monocle = iterative_differential_clustering( Seu_monocle, output_dir = output_dir, plotting = FALSE, saving = TRUE, n_dims = 50, dim_red = "PCA", processing_function = processing_monocle3, differential_function = differential_monocle3, logFC.th = log2(1.5), qval.th = 0.01 )
A 'IDcluster' column was added to the object, which we can retrieve and add to the original object. We can now plot the clusters found this way on the UMAP:
Seu$IDcluster = Seu_monocle$IDcluster Seurat::DimPlot(Seu, reduction = "umap", group.by = "IDcluster")
We can also plot the cluster network using the summary and tghe
IDC_summary = qs::qread(file.path(output_dir, "IDC_summary.qs")) plot_cluster_network(Seu, IDC_summary = IDC_summary, color_by = "IDcluster", node_size_factor = 4, legend = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.