This vignette provides a quick guide to using the Wind package to compute the weighted normalized mutual information (wNMI) and the weighted Rand index (wRI). Both metrics evaluate a clustering result by comparing it with a reference whose labels have a hierarchical structure.
The motivating example is from single-cell RNA sequencing (scRNA-seq), but the metrics apply to any situation in which the true class labels in the reference have a hierarchical structure. For example, the subjects being clustered could be animals, plants, or movies, with reference labels given by breed, species or cultivar, and genre, respectively.
Cell clustering is one of the most common steps in scRNA-seq analysis and is performed routinely. A number of clustering methods are tailored specifically to scRNA-seq data. These methods usually partition the cells into several groups, with each group representing a cell type or subtype. To evaluate the performance of a clustering method, the common practice is to compare the clustering result with reference labels obtained from another source with high confidence. The most widely used measures of agreement between the clustering and the reference labels are the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI). These metrics assume that the groups are completely exchangeable, and so they overlook an important characteristic of single-cell data: the true cluster structure of a cell population is often hierarchical. An evaluation that ignores this hierarchy does not accurately reflect a method's ability to group cells.
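For readers who want to see what the unweighted baselines measure, here is a minimal base R sketch of the plain Rand index and of NMI. It is an illustration only, not the Wind implementation: the function names rand_index and nmi, the toy labels, and the choice to normalize by the mean of the two entropies (one of several normalizations in use) are all assumptions made for this example.

```r
# Unweighted Rand index: fraction of item pairs on which two labelings agree
# (both place the pair together, or both place it apart).
rand_index <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  same_truth <- outer(truth, truth, "==")
  same_pred  <- outer(pred, pred, "==")
  pairs <- upper.tri(same_truth)            # count each unordered pair once
  mean(same_truth[pairs] == same_pred[pairs])
}

# Normalized mutual information, normalized here by the mean of the two
# entropies (an assumption; other normalizations exist).
nmi <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  joint <- table(truth, pred) / length(truth)   # joint label distribution
  p_t <- rowSums(joint)                         # marginal of the reference
  p_p <- colSums(joint)                         # marginal of the clustering
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  nz <- joint > 0                               # skip empty cells (0 * log 0 = 0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(p_t, p_p)[nz]))
  mi / ((entropy(p_t) + entropy(p_p)) / 2)
}

# Toy labels (hypothetical): six items, three reference classes.
truth <- c("B", "B", "T4", "T4", "T8", "T8")
pred  <- c(1, 1, 2, 2, 2, 3)

rand_index(truth, pred)   # 0.8: 12 of the 15 pairs agree
nmi(truth, pred)
```

Both functions treat every misgrouping the same way: merging "T4" with "T8" costs exactly as much as merging "T4" with "B" would, which is the exchangeability assumption discussed above.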
This package provides functions to compute two new metrics for evaluating scRNA-seq clustering results: the weighted Rand index (wRI) and the weighted normalized mutual information (wNMI). The general idea is to derive weights from the cell type hierarchy and to use them in the RI and MI calculations, rewarding correct classifications and penalizing incorrect ones.
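The following base R sketch illustrates how a class hierarchy can be turned into pairwise weights. It is not the algorithm used by Wind, whose weight construction is not described in this vignette. The class mean profiles, the use of average-linkage hclust with cophenetic distances, and the pair_credit scoring rule are all hypothetical choices made for illustration.

```r
# Hypothetical mean expression profiles (rows = classes, columns = genes).
# "T4" and "T8" are deliberately similar; "B" is far from both.
class_means <- rbind(
  B  = c(5, 0, 0, 1),
  T4 = c(0, 5, 1, 0),
  T8 = c(0, 4, 2, 0)
)

# Build a class hierarchy from the profiles.
hc <- hclust(dist(class_means), method = "average")

# Cophenetic distance: the tree height at which two classes first merge.
coph <- as.matrix(cophenetic(hc))

# Rescale to a similarity in [0, 1]: 1 on the diagonal,
# 0 for the most distant pair of classes.
similarity <- 1 - coph / max(coph)
round(similarity, 2)

# Toy partial-credit rule (hypothetical): a pair of cells placed in the same
# cluster earns the similarity of their true classes; a pair placed in
# different clusters earns one minus that similarity.
pair_credit <- function(class_a, class_b, together) {
  s <- similarity[class_a, class_b]
  if (together) s else 1 - s
}

pair_credit("T4", "T8", together = TRUE)   # close subtypes merged: mild penalty
pair_credit("B",  "T4", together = TRUE)   # distant types merged: no credit
```

Under this toy rule, confusing two closely related subtypes loses only a little credit, while confusing two distant cell types loses all of it. That is the kind of behavior hierarchy-derived weights are meant to produce.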
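Computation of wRI and wNMI requires the following inputs: an expression matrix, the ground-truth (reference) class labels, and the clustering result to be evaluated.
The example below uses Y for the expression matrix, trueclass for the ground truth, and clusterRes for the clustering result.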
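To make the expected shape of these inputs concrete, the sketch below builds toy versions of them in base R. The data are simulated and the cell type names are made up. The layout (genes in rows, cells in columns, one label per cell) is an assumption that should be checked against the package documentation.

```r
set.seed(42)
n_genes <- 20
n_cells <- 12

# Assumed layout: genes in rows, cells in columns (simulated counts).
Y <- matrix(rpois(n_genes * n_cells, lambda = 5),
            nrow = n_genes, ncol = n_cells,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0("cell", seq_len(n_cells))))

# One reference label per cell (hypothetical cell type names).
trueclass <- rep(c("B", "CD4_T", "CD8_T"), each = 4)

# One cluster assignment per cell, e.g. the output of any clustering method.
clusterRes <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2)

# Basic consistency checks before any evaluation.
stopifnot(ncol(Y) == length(trueclass),
          length(clusterRes) == length(trueclass))

# Cross-tabulate reference labels against cluster assignments.
table(trueclass, clusterRes)
```

The cross-tabulation is a quick way to spot which reference classes a clustering has split or merged before computing any summary metric.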
Computation of wNMI is done in two steps: first construct the cell type hierarchy, then compute wNMI.
ctStruct = createRef(Y, trueclass)
this_wNMI = wNMI(ctStruct, trueclass, clusterRes)
Computation of wRI is also done in two steps: first compute the weights, then pass them to wRI.
weights = createWeights(Y, trueclass)
this_wRI = wRI(trueclass, clusterRes, weights$W0, weights$W1)
We first load an example dataset distributed with the package. The data were generated with the 10x Genomics GemCode protocol to profile the transcriptome of eight pre-sorted cell types (B-cells, naive cytotoxic T-cells, CD14 monocytes, regulatory T-cells, CD56 NK cells, memory T-cells, CD4 T-helper cells and naive T-cells) in peripheral blood mononuclear cells (PBMC). The original data contain more than 3000 cells. We randomly sampled 500 cells from the original data and use them for demonstration.
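The dataset contains the expression matrix Y, the reference cell type labels trueclass, and clusterRes, a list holding the clustering result of each method being compared.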
In this example, we want to evaluate the clustering results for five methods, and compare the evaluations from the weighted and traditional unweighted NMI and RI.
library(Wind)
data(Zhengmix8eq)
ctStruct = createRef(Y, trueclass)
plot(ctStruct$hc, xlab="", axes=FALSE, ylab="", ann=FALSE)
methods = names(clusterRes)
allNMI = matrix(0, nrow=length(methods), ncol=2)
rownames(allNMI) = methods
colnames(allNMI) = c("NMI", "wNMI")
for(i in seq_along(clusterRes)) {
  allNMI[i,1] = wNMI(ctStruct, trueclass, clusterRes[[i]], FALSE)
  allNMI[i,2] = wNMI(ctStruct, trueclass, clusterRes[[i]])
}
barplot(t(allNMI), beside=TRUE, ylim=c(0.4,1.05), legend.text=TRUE, xpd=FALSE)
weights = createWeights(Y, trueclass)
allRI = matrix(0, nrow=length(methods), ncol=6)
rownames(allRI) = methods
colnames(allRI) = c("RI", "NI1", "NI2", "wRI", "wNI1", "wNI2")
for(i in seq_along(clusterRes)) {
  allRI[i,1:3] = wRI(trueclass, clusterRes[[i]])[1:3]
  allRI[i,4:6] = wRI(trueclass, clusterRes[[i]], weights$W0, weights$W1)[1:3]
}
barplot(t(allRI[,c(1,4)]), beside=TRUE, ylim=c(0.7,1.05), legend.text=TRUE, xpd=FALSE)
sessionInfo()