In order to evaluate the result of clustering, we implemented a tool for analyzing the correlation of clusters obtained from two cluster systems. By default, the consensus molecular subtypes (CMSs) of Colorectal cancer (CRC) samples (Guinney, J., Dienstmann, R., 2015) as well as CrossICC-clustered subtypes with same data were used for comparison.
The input file should be csv format (comma-separated), with three columns:
| ID | Which cluster? (by method 1) | Which cluster? (by method 2) | |--------------------------|------------------------------|------------------------------| | XXXXXXX | C1 | K5 | | XXXXXXX | C7 | K8 | | XXXXXXX | C4 | K2 | | ... | ... | ... |
The comparison were evaluated by the following statistics:
a measure of the similarity between two data clusterings. A true positive (TP) decision assigns two similar documents to the same cluster, a true negative (TN) decision assigns two dissimilar documents to different clusters. There are two types of errors we can commit. A (FP) decision assigns two dissimilar documents to the same cluster. A (FN) decision assigns two similar documents to different clusters. The Rand index measures the percentage of decisions that are correct.
$$RI=((TP+TN))/((TP+FP+FN+TN))$$
The adjusted Rand index is the corrected-for-chance version of the Rand index
$$ARI=((RI-Expetced RI))/((maxa(RI)-Expected RI))$$
a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
$$J(A,B)=(|A∩B|)/|A∪B| =(|A∩B|)/(|A|+|B|-|A∩B|)$$
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.