makeConsensus: Find sets of samples that stay together across clusterings

makeConsensus    R Documentation

Description

Find sets of samples that stay together across clusterings in order to define a new clustering vector.

Usage

## S4 method for signature 'matrix'
makeConsensus(
  x,
  proportion,
  clusterFunction = "hierarchical01",
  minSize = 5,
  propUnassigned = 0.5,
  whenUnassign = c("before", "after"),
  clusterArgs = NULL
)

## S4 method for signature 'ClusterExperiment'
makeConsensus(
  x,
  whichClusters,
  eraseOld = FALSE,
  clusterLabel = "makeConsensus",
  ...
)

Arguments

x

a matrix with samples in the rows and different clusterings in the columns, or a ClusterExperiment object.

proportion

The proportion of times that two sets of samples should be together in order to be grouped into a cluster (if <1, passed to mainClustering via alpha = 1 - proportion)

clusterFunction

the clustering function to use (passed to mainClustering); currently must be of type '01' and accept as input matrices of type "cat" (see details of ?ClusterFunction).

minSize

minimum size required for a set of samples to be considered in a cluster because of shared clustering, passed to mainClustering

propUnassigned

samples with greater than this proportion of assignments equal to '-1' are assigned a '-1' cluster value as a last step (only if proportion < 1)

whenUnassign

(provided for backward compatibility with previous versions). Must be one of "before" or "after", indicating the point at which samples whose proportion of '-1' assignments is greater than propUnassigned are forced to have a '-1' value. If "before", these samples are removed and not used for clustering. If "after", these samples are included in the clustering step, but the cluster values they receive are then replaced with '-1'. These choices may result in different clusterings, because if these samples are included in the clustering (i.e. whenUnassign="after"), they may affect the cluster assignments of other samples. The default is currently "before", but prior to version 2.5.4 there was no such option and the code internally behaved as "after", so users may need to set whenUnassign="after" to reproduce older results.

clusterArgs

list of arguments to be passed to the call to mainClustering that is used to cluster the proportion overlap between samples.

whichClusters

argument that can be either numeric or character vector indicating the clusterings to be used. See details of getClusterIndex.

eraseOld

logical. Only relevant if input x is of class ClusterExperiment. If TRUE, will erase existing workflow results (clusterMany as well as mergeClusters and makeConsensus). If FALSE, existing workflow results will have "_i" added to the clusterTypes value, where i is one more than the largest such existing workflow clusterTypes.

clusterLabel

a string used to describe the type of clustering. By default it is equal to "makeConsensus", to indicate that this clustering is the result of a call to makeConsensus. However, a more informative label can be set (see vignette).

...

arguments to be passed on to the method for signature matrix.

Details

This function was previously called combineMany (versions <= 2.0.0). combineMany is still available, but is considered defunct and users should update their code accordingly.

The function tries to find a consensus clustering across many different clusterings of the same samples. It does so by creating an nSamples x nSamples matrix of the percentage of co-occurrence of each pair of samples and then calling mainClustering to cluster the co-occurrence matrix. The function assumes that '-1' labels indicate samples that are not assigned to a cluster. Co-occurrence with the unassigned cluster is treated differently than co-occurrence with other clusters: the percent co-occurrence is taken only with respect to those clusterings where both samples were assigned. Finally, samples for which the proportion of '-1' values across all of the clusterings is greater than propUnassigned are assigned a '-1', regardless of their cluster assignment.
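The co-occurrence calculation described above can be illustrated with a small base R sketch. This is not the package's implementation: coOccurrence is a hypothetical helper written for illustration, the toy matrix is made up, and returning NA for a pair of samples that are never assigned in the same clustering is an assumption of this sketch.

```r
# Toy input: 6 samples (rows) x 3 clusterings (columns); -1 means unassigned.
clMat <- cbind(
  c1 = c(1, 1, 2, 2, -1,  3),
  c2 = c(1, 1, 1, 2, -1,  2),
  c3 = c(2, 2, 3, 3,  1, -1)
)
rownames(clMat) <- paste0("s", 1:6)

# Hypothetical helper (not part of clusterExperiment): for each pair of
# samples, the proportion of clusterings in which they share a label,
# counting only the clusterings where both samples are assigned.
coOccurrence <- function(clMat) {
  n <- nrow(clMat)
  assigned <- clMat != -1
  shared <- matrix(NA_real_, n, n,
                   dimnames = list(rownames(clMat), rownames(clMat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both <- assigned[i, ] & assigned[j, ]
      # Assumption of this sketch: pairs with no jointly assigned
      # clustering are left as NA.
      if (any(both)) {
        shared[i, j] <- mean(clMat[i, both] == clMat[j, both])
      }
    }
  }
  shared
}

percShared <- coOccurrence(clMat)
round(percShared, 2)
```

In this toy example, s1 and s2 agree in every clustering, so their entry is 1, while s4 and s6 are compared only on c1 and c2 because s6 is unassigned in c3.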

The method calls mainClustering on the proportion matrix with clusterFunction as the 01 clustering algorithm, alpha=1-proportion, minSize=minSize, and evalClusterMethod=c("average"). See help of mainClustering for more details.
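The unassigned-sample rule controlled by propUnassigned and whenUnassign can be sketched in base R as follows. This is a minimal illustration under stated assumptions rather than the package's code: the toy matrix is made up, and the consensus vector is a set of invented labels standing in for the output of the clustering step.

```r
# Toy input: 5 samples (rows) x 3 clusterings (columns); -1 means unassigned.
clMat <- cbind(
  c1 = c(1, 1, 2, -1, -1),
  c2 = c(1, 2, 2, -1,  3),
  c3 = c(1, 1, 2,  2, -1)
)
propUnassigned <- 0.5

# Proportion of clusterings in which each sample is unassigned.
propMinus1 <- rowMeans(clMat == -1)

# Samples exceeding the threshold are forced to '-1'.
forceUnassigned <- propMinus1 > propUnassigned

# whenUnassign = "before": flagged samples are dropped ahead of clustering,
# so they cannot influence the assignments of the remaining samples.
clMatBefore <- clMat[!forceUnassigned, , drop = FALSE]

# whenUnassign = "after": all samples are clustered, then the flagged ones
# are overwritten. The labels below are made up for illustration only.
consensus <- c(1, 1, 2, 2, 2)
consensus[forceUnassigned] <- -1
consensus
```

Here the last two samples are unassigned in two of three clusterings (a proportion of about 0.67), so both are flagged under the 0.5 threshold.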

Value

If x is a matrix, a list with values

  • clustering: a vector of cluster assignments, with "-1" implying unassigned.

  • percentageShared: an nSamples x nSamples matrix of the percent co-occurrence across clusterings used to find the final clusters. The percentage is taken out of those clusterings where the assignment is not '-1'.

  • noUnassignedCorrection: a vector of cluster assignments before samples were converted to '-1' because more than propUnassigned of their values were '-1' (i.e. the direct output of mainClustering).

If x is a ClusterExperiment, a ClusterExperiment object, with an added clustering of clusterTypes equal to makeConsensus and the percentageShared matrix stored in the coClustering slot.

Examples

## Not run: 
library(clusterExperiment)
data(simData)

cl <- clusterMany(simData, nReducedDims=c(5,10,50), reduceMethod="PCA",
    clusterFunction="pam", ks=2:4, findBestK=c(FALSE), removeSil=TRUE,
    makeMissingDiss=TRUE, subsample=FALSE)

#make names shorter for plotting
clMat <- clusterMatrix(cl)
colnames(clMat) <- gsub("TRUE", "T", colnames(clMat))
colnames(clMat) <- gsub("FALSE", "F", colnames(clMat))
colnames(clMat) <- gsub("k=NA,", "", colnames(clMat))

#require 100% agreement -- very strict
clCommon100 <- makeConsensus(clMat, proportion=1, minSize=10)

#require 70% agreement based on clustering of overlap
clCommon70 <- makeConsensus(clMat, proportion=0.7, minSize=10)

oldpar <- par(no.readonly = TRUE)
par(mar=c(1.1, 12.1, 1.1, 1.1))
#the matrix method returns a list; use its clustering element for plotting
plotClusters(cbind("70%Similarity"=clCommon70$clustering, clMat,
    "100%Similarity"=clCommon100$clustering), axisLine=-2)

#method for ClusterExperiment object
clCommon <- makeConsensus(cl, whichClusters="workflow", proportion=0.7,
    minSize=10)
plotClusters(clCommon)
par(oldpar)

## End(Not run)

epurdom/clusterExperiment documentation built on April 28, 2024, 8:17 p.m.