ExecuteCC: Execute Consensus Clustering


View source: R/ClusteringMethod.R

Description

This function wraps the R package "ConsensusClusterPlus" to unify the input and output format, which helps standardize the workflow for cancer subtype analysis and validation. The parameters are compatible with those of the original "ConsensusClusterPlus()" function in that package.
Please note: we add a new parameter, "clusterNum", which specifies the number of cancer subtype groups in the returned result.

Usage

ExecuteCC(clusterNum, d, maxK = 10, clusterAlg = "hc",
  distance = "pearson", title = "ConsensusClusterResult", reps = 500,
  pItem = 0.8, pFeature = 1, plot = "png", innerLinkage = "average",
  finalLinkage = "average", writeTable = FALSE, weightsItem = NULL,
  weightsFeature = NULL, verbose = FALSE, corUse = "everything")

Arguments

clusterNum

An integer giving the number of clusters to return; this value should be less than the maximum cluster number (maxK). This is the only additional parameter in our function compared to the original R package "ConsensusClusterPlus". All the other parameters are compatible with the function "ConsensusClusterPlus()".

d

Data to be clustered: either a data matrix where columns are items/samples and rows are features (for example, a gene expression matrix with genes in rows and microarrays in columns), an ExpressionSet object, or a distance object (only for cases of no feature resampling).

Please note: we add a new data type (list) for this parameter. Please see Details and Examples.

maxK

integer value. Maximum cluster number for the Consensus Clustering algorithm to evaluate.

clusterAlg

character value. Cluster algorithm: 'hc' for hierarchical (hclust), 'pam' for partitioning around medoids, 'km' for k-means upon data matrix, 'kmdist' for k-means upon distance matrices (former km option), or a function that returns a clustering.
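As a rough illustration of the "function that returns a clustering" option, the sketch below defines a hypothetical hook named complete_hc_hook. The (this_dist, k) signature, taking a distance object and a cluster count and returning one integer label per item, is an assumption based on how the ConsensusClusterPlus vignette describes custom hooks; it should be checked against that package's documentation before use. The hook is called here directly on toy data rather than through ExecuteCC().

```r
# Hypothetical custom clustering hook. The (this_dist, k) signature is an
# assumption and is not confirmed by this page.
complete_hc_hook <- function(this_dist, k) {
  tree <- hclust(this_dist, method = "complete")  # complete-linkage tree
  cutree(tree, k = k)                             # one integer label per item
}

# Toy data: 5 features (rows) x 8 samples (columns); values are arbitrary.
set.seed(3)
toy <- matrix(rnorm(40), nrow = 5, dimnames = list(NULL, paste0("s", 1:8)))

# dist() compares rows, so transpose to get sample-to-sample distances.
toy_dist <- dist(t(toy))
assignment <- complete_hc_hook(toy_dist, k = 3)
print(assignment)
```

Under that assumption, such a function would be supplied as the clusterAlg argument in place of a character value.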

distance

character value. 'pearson' (1 - Pearson correlation), 'spearman' (1 - Spearman correlation), 'euclidean', 'binary', 'maximum', 'canberra', 'minkowski', or a custom distance function.
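To make the correlation-based options concrete, the following base R sketch computes "1 - correlation" distances between samples on a toy matrix. It only illustrates the definitions given above; the treatment of the other distance names via stats::dist() is an assumption, and the package's internal code may differ.

```r
# Toy matrix: 5 features (rows) x 4 samples (columns); values are arbitrary.
set.seed(1)
d <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("f", 1:5), paste0("s", 1:4)))

# cor() correlates columns, so this gives sample-to-sample distances.
pearson_dist  <- as.dist(1 - cor(d, method = "pearson"))
spearman_dist <- as.dist(1 - cor(d, method = "spearman"))

# Assumption: names such as 'euclidean' map onto stats::dist(), which
# compares rows, hence the transpose.
euclidean_dist <- dist(t(d), method = "euclidean")

print(round(pearson_dist, 3))
```

Because a correlation lies in [-1, 1], these distances lie in [0, 2]: 0 for perfectly correlated samples and 2 for perfectly anti-correlated ones.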

title

character value for the output directory. This title can be an absolute or relative path.

reps

integer value. Number of subsamples (in other words, the number of resampling iterations for each cluster number).
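The role of reps can be sketched in base R as follows. This is a simplified illustration of resampling-based consensus clustering for a single cluster number k, not the ConsensusClusterPlus implementation. It assumes that pItem (0.8 in Usage) is the fraction of samples drawn in each subsample, uses a small reps value to keep the run short, and works on synthetic data.

```r
set.seed(42)

# Synthetic data: 20 features x 12 samples. Samples 1-6 are high in features
# 1-10, samples 7-12 are high in features 11-20, plus Gaussian noise.
pattern_a <- c(rep(3, 10), rep(0, 10))
pattern_b <- c(rep(0, 10), rep(3, 10))
d <- cbind(matrix(rep(pattern_a, 6), nrow = 20),
           matrix(rep(pattern_b, 6), nrow = 20)) +
  matrix(rnorm(240), nrow = 20)
colnames(d) <- paste0("s", 1:12)

reps  <- 50    # number of subsamples (kept small for the sketch)
pItem <- 0.8   # assumed: fraction of samples drawn per subsample
k     <- 2     # cluster number being evaluated
n     <- ncol(d)

together <- matrix(0, n, n)  # times a pair fell in the same cluster
sampled  <- matrix(0, n, n)  # times a pair was drawn together

for (i in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(pItem * n)))
  dist_i <- as.dist(1 - cor(d[, idx], method = "pearson"))
  cl <- cutree(hclust(dist_i, method = "average"), k = k)
  sampled[idx, idx]  <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus index: fraction of co-sampled runs in which a pair co-clustered.
consensus <- together / pmax(sampled, 1)

# Final grouping: hierarchical clustering on 1 - consensus.
group <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = k)
print(group)
```

A larger reps gives each pair of samples more co-sampled runs, so the consensus values become more stable at the cost of run time.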

pItem

Please refer to the "ConsensusClusterPlus" package for detailed information.

pFeature

Please refer to the "ConsensusClusterPlus" package for detailed information.

plot

Please refer to the "ConsensusClusterPlus" package for detailed information.

innerLinkage

Please refer to the "ConsensusClusterPlus" package for detailed information.

finalLinkage

Please refer to the "ConsensusClusterPlus" package for detailed information.

writeTable

Please refer to the "ConsensusClusterPlus" package for detailed information.

weightsItem

Please refer to the "ConsensusClusterPlus" package for detailed information.

weightsFeature

Please refer to the "ConsensusClusterPlus" package for detailed information.

verbose

Please refer to the "ConsensusClusterPlus" package for detailed information.

corUse

Please refer to the "ConsensusClusterPlus" package for detailed information.

Details

If the data is a list containing matched multi-genomics data matrices, like the input to "ExecuteiCluster()" and "ExecuteSNF()", we first use z-scores to normalize the features of each data matrix. All the normalized data matrices in the list are then concatenated by sample, so the concatenated matrix contains the same samples with one long feature set (all features in the data list). The purpose is to make it convenient to compare different methods on the same dataset format. See Examples.
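The list handling described above can be approximated in base R as shown below. This is a minimal sketch under stated assumptions: the helper name zscore_rows is hypothetical, the two matrices are synthetic stand-ins with features in rows and matched samples in columns, and the package's internal normalization code may differ.

```r
set.seed(7)
samples <- paste0("s", 1:6)

# Synthetic stand-ins for two matched data types (same samples, same order).
GeneExp  <- matrix(rnorm(30, mean = 8, sd = 2), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), samples))
miRNAExp <- matrix(rnorm(18, mean = 2, sd = 0.5), nrow = 3,
                   dimnames = list(paste0("mir", 1:3), samples))
data_list <- list(GeneExp = GeneExp, miRNAExp = miRNAExp)

# Hypothetical helper: z-score each feature (row). scale() standardizes
# columns, so the matrix is transposed before and after.
zscore_rows <- function(m) t(scale(t(m)))

normalized <- lapply(data_list, zscore_rows)

# Stack all features; the columns (samples) stay aligned across data types.
combined <- do.call(rbind, normalized)
print(dim(combined))
```

After this step every feature has mean 0 and standard deviation 1, so data types measured on different scales contribute comparably to the sample-to-sample distance.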

Value

A list with the following elements.

References

Monti, S., Tamayo, P., Mesirov, J., Golub, T. (2003) Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52, 91-118.

See Also

ConsensusClusterPlus

Examples

library(CancerSubtypes)

### The input dataset is a single gene expression matrix.
data(GeneExp)
data(miRNAExp)
result1 <- ExecuteCC(clusterNum = 3, d = GeneExp, maxK = 10, clusterAlg = "hc",
                     distance = "pearson", title = "GBM")
result1$group

### The input dataset is multi-genomics data as a list.
GBM <- list(GeneExp = GeneExp, miRNAExp = miRNAExp)
result2 <- ExecuteCC(clusterNum = 3, d = GBM, maxK = 5, clusterAlg = "hc",
                     distance = "pearson", title = "GBM")
result2$group

CancerSubtypes documentation built on Nov. 8, 2020, 8:24 p.m.