subsampleClustering | R Documentation |
Given input data, this function will subsample the samples, cluster the
subsamples, and return an n x n matrix with the probability of
co-occurrence.
## S4 method for signature 'character'
subsampleClustering(clusterFunction, ...)
## S4 method for signature 'ClusterFunction'
subsampleClustering(
  clusterFunction,
  inputMatrix,
  inputType,
  clusterArgs = NULL,
  classifyMethod = c("All", "InSample", "OutOfSample"),
  resamp.num = 100,
  samp.p = 0.7,
  ncores = 1,
  warnings = TRUE,
  ...
)
clusterFunction |
a |
... |
arguments passed to mclapply (if ncores>1). |
inputMatrix |
numerical matrix on which to run the clustering or a
|
inputType |
a character vector defining what type of input is given in
the |
clusterArgs |
a list of parameter arguments to be passed to the function
defined in the |
classifyMethod |
method for determining which samples should be used in calculating the co-occurrence matrix. "All" = all samples, "OutOfSample" = those not subsampled, and "InSample" = those in the subsample. See details for explanation. |
resamp.num |
the number of subsamples to draw. |
samp.p |
the proportion of samples to sample for each subsample. |
ncores |
integer giving the number of cores. If ncores>1, mclapply will be called. |
warnings |
logical indicating whether to give a warning if arguments are given that don't match the clustering choices given. Otherwise, inapplicable arguments will be ignored without warning. |
subsampleClustering is not usually called directly by the user. It is
exported only so that its arguments can be clearly documented; these
arguments can be passed via the argument subsampleArgs in functions like
clusterSingle and clusterMany.
classifyMethod:
The choice of "All" or "OutOfSample" for classifyMethod requires
classifying arbitrary samples that were not originally in the clustering
into clusters; this is done via the classifyFUN provided in the
ClusterFunction object. If the ClusterFunction object does not have such a
function to define how to classify samples that were not in the subsample
that created the clustering, then classifyMethod must be "InSample". Note
that if "All" is chosen, all samples will be classified into clusters via
the classifyFUN, not just those that are out-of-sample; depending on the
classification function, this could give the in-sample samples different
cluster assignments than the clustering originally gave them. If you do not
choose "All", it is possible to get NAs in the resulting S matrix
(particularly if not enough subsamples are taken), which can cause errors if
you then pass the resulting D=1-S matrix to mainClustering. For this reason
the default is "All".
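The three classifyMethod choices can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy data, the single subsample, and the helper classify_nearest (a hypothetical nearest-centroid stand-in for a classifyFUN) are all assumptions made for illustration, with samples stored in columns.

```r
# Toy data: 4 features x 20 samples, two well-separated groups of 10.
set.seed(2)
n <- 20
x <- cbind(matrix(rnorm(4 * 10, mean = 0), nrow = 4),
           matrix(rnorm(4 * 10, mean = 8), nrow = 4))

# Hypothetical classifier (stand-in for a classifyFUN): assign each sample
# (column of newdata) to the cluster with the nearest centroid.
classify_nearest <- function(newdata, centers) {
  # centers has one row per cluster and one column per feature
  d <- sapply(seq_len(nrow(centers)),
              function(k) colSums((newdata - centers[k, ])^2))
  d <- matrix(d, ncol = nrow(centers))  # keep matrix shape for one sample
  max.col(-d, ties.method = "first")    # index of the smallest distance
}

# One subsample of 70% of the samples, clustered with k-means.
idx <- sample(n, size = 14)
fit <- kmeans(t(x[, idx]), centers = 2, nstart = 5)
out <- setdiff(seq_len(n), idx)

# "All": every sample is (re)classified, including the in-sample ones.
labels_all <- classify_nearest(x, fit$centers)

# "InSample": only subsampled samples get a label; the rest are NA.
labels_in <- rep(NA_integer_, n)
labels_in[idx] <- fit$cluster

# "OutOfSample": only the samples left out of the subsample get a label.
labels_out <- rep(NA_integer_, n)
labels_out[out] <- classify_nearest(x[, out, drop = FALSE], fit$centers)
```

With a nearest-centroid rule and k-means, the "All" labels for in-sample samples will usually agree with the original clustering; other classification functions need not behave this way. The NA entries under "InSample" and "OutOfSample" are what can leave some pairs without any usable subsample.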
An n x n matrix of co-occurrences, i.e. a symmetric matrix with [i,j]
entries equal to the percentage of subsamples where the ith and jth samples
were clustered into the same cluster. The percentage is only out of those
subsamples where the ith and jth samples were both assigned to a cluster.
If classifyMethod=="All", this is all subsamples for all i,j pairs. But if
classifyMethod=="InSample" or classifyMethod=="OutOfSample", then the
percentage is only taken over those subsamples where the ith and jth samples
were both in or both out of sample, respectively, relative to the subsample.
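As a rough illustration of how such a matrix arises in the "InSample" case, the following base-R sketch counts, for every pair of samples, how often both were subsampled and how often they then shared a cluster. It is a simplified sketch under assumed toy data (samples in columns) using k-means from the stats package, not the code the package runs; the names together, both_in, and S are chosen here for illustration.

```r
# Toy data: 5 features x 30 samples, two groups of 15.
set.seed(1)
n <- 30
x <- cbind(matrix(rnorm(5 * 15, mean = 0), nrow = 5),
           matrix(rnorm(5 * 15, mean = 6), nrow = 5))

resamp_num <- 50  # number of subsamples to draw
samp_p <- 0.7     # proportion of samples in each subsample

together <- matrix(0, n, n)  # times i and j landed in the same cluster
both_in <- matrix(0, n, n)   # times i and j were both in the subsample

for (b in seq_len(resamp_num)) {
  idx <- sample(n, size = round(samp_p * n))
  cl <- kmeans(t(x[, idx]), centers = 2, nstart = 5)$cluster
  same <- outer(cl, cl, "==")  # TRUE where two subsampled samples agree
  together[idx, idx] <- together[idx, idx] + same
  both_in[idx, idx] <- both_in[idx, idx] + 1
}

# Proportion of co-clustering among subsamples containing both samples.
S <- together / both_in
S[both_in == 0] <- NA  # pair never subsampled together: undefined entry
```

Pairs that never appear together in a subsample have a zero denominator, which is the source of NA entries when too few subsamples are drawn.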
## Not run:
#takes a bit of time, not run on checks:
library(clusterExperiment)
data(simData)
coOccur <- subsampleClustering(inputMatrix=simData, inputType="X",
  clusterFunction="kmeans",
  clusterArgs=list(k=3,nstart=10), resamp.num=100, samp.p=0.7)
#visualize the resulting co-occurrence matrix
plotHeatmap(coOccur)
## End(Not run)