View source: R/ClusteringMethod.R
This function is based on the R package "ConsensusClusterPlus".
We wrote a wrapper around it to unify the input and output formats.
This helps standardize the workflow of cancer subtype analysis and validation.
The parameters are compatible with the function "ConsensusClusterPlus()" in the original R package "ConsensusClusterPlus".
Please note: we add a new parameter "clusterNum", which specifies the number of cancer subtype groups in the returned result.
ExecuteCC(clusterNum, d, maxK = 10, clusterAlg = "hc",
  distance = "pearson", title = "ConsensusClusterResult", reps = 500,
  pItem = 0.8, pFeature = 1, plot = "png", innerLinkage = "average",
  finalLinkage = "average", writeTable = FALSE, weightsItem = NULL,
  weightsFeature = NULL, verbose = FALSE, corUse = "everything")
clusterNum |
An integer giving the number of clusters to return; this value should be less than the maximum cluster number (maxK). This is the only additional parameter in our function compared to the original R package "ConsensusClusterPlus". All the other parameters are compatible with the function "ConsensusClusterPlus()". |
d |
data to be clustered: either a data matrix where columns are items/samples and rows are features (for example, a gene expression matrix with genes in rows and microarrays in columns), an ExpressionSet object, or a distance object (only for cases of no feature resampling). Please note: we add a new data type (list) for this parameter. Please see Details and Examples. |
maxK |
integer value. maximum cluster number for Consensus Clustering Algorithm to evaluate. |
clusterAlg |
character value. cluster algorithm. 'hc' for hierarchical (hclust), 'pam' for partitioning around medoids, 'km' for k-means upon data matrix, 'kmdist' for k-means upon distance matrices (former km option), or a function that returns a clustering. |
distance |
character value. 'pearson' (1 - Pearson correlation), 'spearman' (1 - Spearman correlation), 'euclidean', 'binary', 'maximum', 'canberra', 'minkowski', or a custom distance function. |
title |
character value for the output directory. The title can be an absolute or relative path. |
reps |
integer value. number of subsamples (in other words, the number of iterations for each cluster number). |
pItem |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
pFeature |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
plot |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
innerLinkage |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
finalLinkage |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
writeTable |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
weightsItem |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
weightsFeature |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
verbose |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
corUse |
Please refer to the "ConsensusClusterPlus" package for detailed information. |
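As a rough illustration of the 'pearson' option of the distance argument, the sketch below computes 1 - Pearson correlation between samples (columns) in base R. The toy matrix `expr` is made up for illustration, and the code only mirrors the definition given in the argument description; it is not taken from the package internals.

```r
# Toy expression matrix (hypothetical data): 4 features in rows, 3 samples in columns.
expr <- matrix(c(1, 2, 3, 4,
                 2, 4, 6, 8,
                 4, 3, 2, 1),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# cor() correlates columns, so the result is a sample-by-sample matrix.
# Subtracting from 1 turns the correlation into a dissimilarity.
pearson_dist <- as.dist(1 - cor(expr, method = "pearson", use = "everything"))
print(pearson_dist)
```

Perfectly correlated samples end up at distance 0 and perfectly anti-correlated samples at distance 2.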
If the data is a list containing matched multi-genomics data matrices, like the input of "ExecuteiCluster()" and "ExecuteSNF()", we first use z-scores to normalize the features of each data matrix. Then all the normalized data matrices from the data list are concatenated by sample. The concatenated data matrix therefore has the same samples and one long feature set (all features in the data list). Our purpose is to make it convenient to compare different methods on the same dataset format. See Examples.
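The list handling described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the toy matrices, the helper name `zscore_rows`, and the variable names are all hypothetical.

```r
set.seed(1)

# Two toy data matrices (hypothetical) sharing the same samples in columns.
samples <- paste0("s", 1:5)
gene <- matrix(rnorm(20, mean = 8), nrow = 4,
               dimnames = list(paste0("gene", 1:4), samples))
mirna <- matrix(rnorm(15, mean = 2), nrow = 3,
                dimnames = list(paste0("mir", 1:3), samples))
data_list <- list(GeneExp = gene, miRNAExp = mirna)

# z-score each feature (row) across samples.
# scale() standardizes columns, so transpose before and after.
zscore_rows <- function(m) t(scale(t(m)))

# Normalize every matrix, then stack the features for the shared samples.
normalized <- lapply(data_list, zscore_rows)
combined <- do.call(rbind, normalized)

dim(combined)  # features from all matrices in rows, samples in columns
```

The combined matrix has the same layout as a single-matrix input (features in rows, samples in columns), which is why it can be clustered in the same way.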
A list with the following elements.
group : A vector representing the cancer subtype group of each sample. The order corresponds to the samples in the data matrix.
This is the most important result for all clustering methods, so we place it as the first component. The format of group is consistent across different algorithms and therefore makes it convenient for downstream analyses. Moreover, the format of group is also compatible with the K-means result and the hclust (after using the cutree() function).
distanceMatrix : A sample similarity matrix. The larger the value between two samples in the matrix, the more similar the samples are.
We extracted this matrix from the algorithmic procedure because it is useful for similarity analysis among the samples based on the clustering results.
originalResult : The clustering result of the original function "ConsensusClusterPlus()".
Different clustering algorithms have different output formats. Although we have the group component which has consistent format for all of the algorithms (making it easy for downstream analyses), we still keep the output from the original algorithms.
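To illustrate how the group and distanceMatrix components might be used downstream, here is a small base R sketch. The `result` list below is a hand-built, hypothetical stand-in with the same component names, not real output of "ExecuteCC()", and the similarity values are invented.

```r
# Hypothetical stand-in for the returned list: 4 samples in 2 subtypes.
sim <- matrix(c(1.0, 0.9, 0.1, 0.2,
                0.9, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.8,
                0.2, 0.1, 0.8, 1.0),
              nrow = 4,
              dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))
result <- list(group = c(1L, 1L, 2L, 2L), distanceMatrix = sim)

# Number of samples in each subtype.
table(result$group)

# distanceMatrix holds similarities (larger means more similar),
# so convert it to a dissimilarity before hierarchical clustering.
tree <- hclust(as.dist(1 - result$distanceMatrix), method = "average")
hc_group <- cutree(tree, k = 2)

# group has the same integer-label format as cutree() output,
# so the two assignments can be cross-tabulated directly.
table(result$group, hc_group)
```

Because group is a plain vector of integer labels, it can be compared with cutree() or kmeans()$cluster output without any conversion.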
Monti, S., Tamayo, P., Mesirov, J., Golub, T. (2003) Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52, 91-118.
### The input dataset is a single gene expression matrix.
data(GeneExp)
data(miRNAExp)
result1=ExecuteCC(clusterNum=3,d=GeneExp,maxK=10,clusterAlg="hc",distance="pearson",title="GBM")
result1$group
### The input dataset is multi-genomics data as a list
GBM=list(GeneExp=GeneExp,miRNAExp=miRNAExp)
result2=ExecuteCC(clusterNum=3,d=GBM,maxK=5,clusterAlg="hc",distance="pearson",title="GBM")
result2$group