ClusterFunction is a class for holding functions that can be used for clustering in the clustering algorithms in this package. The constructor ClusterFunction creates an object of the class ClusterFunction.
internalFunctionCheck(clusterFUN, inputType, algorithmType, outputType)

ClusterFunction(clusterFUN, ...)

## S4 method for signature 'function'
ClusterFunction(
  clusterFUN,
  inputType,
  outputType,
  algorithmType,
  inputClassifyType = NA_character_,
  requiredArgs = NA_character_,
  classifyFUN = NULL,
  checkFunctions = TRUE
)
clusterFUN | function passed to the slot of the same name
inputType | character for the slot of the same name
algorithmType | character for the slot of the same name
outputType | character for the slot of the same name
... | arguments passed to different methods of ClusterFunction
inputClassifyType | character for the slot of the same name
requiredArgs | character for the slot of the same name
classifyFUN | function for the slot of the same name
checkFunctions | logical for whether to check the input functions with internalFunctionCheck
internalFunctionCheck is the function that is called by the validity check of the ClusterFunction constructor (if checkFunctions=TRUE). It is available as an S3 function so that users can test and debug their functions, which is difficult to do with an S4 validity function.
clusterFUN: The following arguments must be accepted by clusterFUN. Higher-level code may pass these arguments, but the function can ignore them or simply let them be absorbed by ... .
"inputMatrix": the matrix of data.
"inputType": one of "X", "diss", or "cat". If "X", then inputMatrix is assumed to be nfeatures x nsamples (like assay(CEObj) would give). If "cat", then inputMatrix is also nfeatures x nsamples, but all entries should be categorical levels, encoded by positive integers, with -1/-2 types of NA (like a clusterMatrix slot, but with dimensions switched). If "diss", then inputMatrix should be an n x n dissimilarity matrix.
"checkArgs": logical argument. If checkArgs=TRUE, clusterFUN should check whether the arguments passed in ... are valid and return an error if not; otherwise, no error will be given, but the check should still be done and only the valid arguments in ... passed along. This is necessary for the function to work with clusterMany, which passes all arguments to all functions without checking.
"cluster.only": logical argument. If cluster.only=TRUE, then clusterFUN should return only the vector of cluster assignments (or a list if outputType="list"). If cluster.only=FALSE, then clusterFUN should return a named list in which the element named clustering contains the vector described above (no list allowed!); anything else needed by classifyFUN to classify new data should be contained in the output list as well. cluster.only is set internally depending on whether classifyFUN will later be used by subsampling or the function is only used for clustering the final product.
"...": any additional arguments specific to the algorithm used by clusterFUN should be passed via ... and NOT passed via arguments to clusterFUN.
"Other required arguments": clusterFUN must also accept the arguments required for its algorithmType (see algorithmType below).
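The argument contract above can be sketched with a small function built on stats::kmeans. This is only an illustration under stated assumptions, not a function shipped with the package: the name kmeansSketchFUN and the extra list element centers are hypothetical, and the way invalid arguments are detected (by comparing against the formals of kmeans) is one possible choice.

```r
# Hypothetical clusterFUN sketch (algorithmType "K", inputType "X").
# Only the argument names come from the documentation; the rest is illustrative.
kmeansSketchFUN <- function(inputMatrix, inputType = "X", k,
                            checkArgs = TRUE, cluster.only = TRUE, ...) {
  passed <- list(...)
  # Arguments that stats::kmeans understands, besides the data and k.
  allowed <- setdiff(names(formals(stats::kmeans)), c("x", "centers"))
  invalid <- setdiff(names(passed), allowed)
  if (checkArgs && length(invalid) > 0) {
    stop("arguments not valid for kmeans: ", paste(invalid, collapse = ", "))
  }
  # Whether or not an error was requested, only valid arguments are passed on.
  passed <- passed[names(passed) %in% allowed]
  # inputMatrix is nfeatures x nsamples, so transpose to cluster the samples.
  fit <- do.call(stats::kmeans,
                 c(list(x = t(inputMatrix), centers = k), passed))
  if (cluster.only) {
    return(unname(fit$cluster))
  }
  # Keep the centroids so that a classifyFUN could use them later.
  list(clustering = unname(fit$cluster), centers = fit$centers)
}

set.seed(1)
toy <- matrix(rnorm(5 * 20), nrow = 5)  # 5 features x 20 samples
kmeansSketchFUN(toy, k = 3, nstart = 2)
str(kmeansSketchFUN(toy, k = 3, cluster.only = FALSE))
```

With checkArgs=FALSE an unknown argument is silently dropped rather than raising an error, which is the behavior a caller such as clusterMany relies on.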
classifyFUN: The following arguments must be accepted by classifyFUN (if not NULL).
inputMatrix: the new data that will be classified into the clusters.
inputType: the inputType of the new data (see above).
clusterResult: the result of running clusterFUN on the training data, when cluster.only=FALSE. Whatever is returned by clusterFUN is assumed to be sufficient for this function to classify new objects (e.g. it could return the centroids of the clustering, if clustering based on nearest centroid).
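The nearest-centroid case mentioned above might look like the following sketch. It assumes, purely for illustration, that clusterResult is a list with an element named centers holding a k x nfeatures matrix of centroids; neither that element name nor the function name nearestCentroidFUN comes from the package.

```r
# Hypothetical classifyFUN sketch: assign each new sample to its nearest centroid.
# Assumes clusterResult$centers is a k x nfeatures matrix (an assumption of
# this sketch, not a requirement stated by the package).
nearestCentroidFUN <- function(inputMatrix, inputType = "X", clusterResult) {
  centers <- clusterResult$centers
  newSamples <- t(inputMatrix)  # nsamples x nfeatures
  # Squared Euclidean distance from every new sample to every centroid.
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(newSamples, 2, centers[j, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newSamples))  # keep matrix shape for one sample
  # Index of the closest centroid for each sample.
  max.col(-d2, ties.method = "first")
}

centers <- rbind(c(0, 0), c(10, 10))
newData <- cbind(c(1, -1), c(9, 11), c(0.5, 0.2))  # 2 features x 3 samples
nearestCentroidFUN(newData, clusterResult = list(centers = centers))
```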
algorithmType: Type "01" is for clustering functions that expect as input a dissimilarity matrix that takes on 0-1 values (e.g. from subclustering), with 1 indicating more dissimilarity between samples. "01" algorithm types must also have inputType equal to "diss". It is also generally expected that "01" algorithms use the 0-1 nature of the input to set criteria as to where to find clusters. "01" functions must take as an argument alpha between 0 and 1 to determine the clusters, where larger values of alpha require less similarity between samples in the same cluster. "K" is for clustering functions that require an argument k (the number of clusters) but accept arbitrary inputType. "K" algorithms are assumed to need a predetermined k and are also assumed to assign every sample to a cluster. If they do not, the post-processing steps in mainClustering such as findBestK and removeSil may not operate correctly, since they rely on silhouette distances.
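To make the role of alpha concrete, here is a minimal sketch of a "01"-style function. It is not one of the package's built-in algorithms: it simply runs complete-linkage hierarchical clustering on the 0-1 dissimilarity matrix and cuts the tree at height alpha, so that larger alpha tolerates more dissimilarity within a cluster. The name hier01SketchFUN is hypothetical, and unlike the convention for unassigned samples, this sketch leaves isolated samples in singleton clusters rather than marking them -1.

```r
# Hypothetical "01"-type sketch: cut a complete-linkage tree at height alpha.
hier01SketchFUN <- function(inputMatrix, inputType = "diss", alpha,
                            checkArgs = TRUE, cluster.only = TRUE, ...) {
  stopifnot(alpha >= 0, alpha <= 1)
  tree <- stats::hclust(stats::as.dist(inputMatrix), method = "complete")
  # With complete linkage, every pair inside a cluster has dissimilarity <= alpha.
  clustering <- unname(stats::cutree(tree, h = alpha))
  if (cluster.only) clustering else list(clustering = clustering)
}

# Toy 0-1 dissimilarity matrix for 4 samples (symmetric, zero diagonal).
D <- matrix(c(0,   0.1, 0.9, 0.8,
              0.1, 0,   0.7, 0.9,
              0.9, 0.7, 0,   0.2,
              0.8, 0.9, 0.2, 0), nrow = 4)
hier01SketchFUN(D, alpha = 0.3)   # stricter than alpha = 1, looser than 0.15
hier01SketchFUN(D, alpha = 0.15)
```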
Returns a logical value of TRUE if there are no problems. If there is a problem, returns a character string describing the problem encountered.
A ClusterFunction object.
clusterFUN
a function defining the clustering function. See Details for required arguments.
inputType
a character vector defining what type(s) of input clusterFUN takes. Must consist of values "diss", "X", or "cat", indicating the set of input values that the algorithm can handle (see Details).
algorithmType
a character defining what type of clustering algorithm clusterFUN is. Must be either "01" or "K". clusterFUN must take the corresponding required arguments for its type (see Details).
classifyFUN
a function that takes as input new data and the output of clusterFUN (where the output is from when cluster.only=FALSE) and returns cluster assignments of the new data. Used in subsampling clustering. Note that the function should assume that the data given to the inputMatrix argument are not the same samples that were input to the ClusterFunction (but may assume that they have the same number of features/columns). If slot classifyFUN is given value NULL, then the subsampling type can only be "InSample"; see subsampleClustering.
inputClassifyType
the input type for the classification function (if not NULL); like inputType, must be a vector containing "diss", "X", or "cat".
outputType
the type of output given by clusterFUN. Must be either "vector" or "list". If "vector", then the output should be a vector of length equal to the number of observations, with integer-valued elements identifying them to different clusters; the vector assignments should be in the same order as the original input of the data. Samples that are not assigned to any cluster should be given a '-1' value. If "list", then the output must be a list of length equal to the number of clusters, and the elements of the list contain the indices of the samples in that cluster. Any indices not in any of the list elements are assumed to be -1. The main advantage of "list" is that it can preserve the order of the clusters if the clusterFUN desires to do so, in which case the orderBy argument of mainClustering can preserve this ordering (default is to order by size).
requiredArgs
any additional required arguments for clusterFUN (beyond those required of all clusterFUN, described in Details). Will be used in checking that the user provided the necessary arguments.
checkFunctions
logical. If TRUE, the validity check of the ClusterFunction object will check the clusterFUN with simple toy data using the function internalFunctionCheck.
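The relationship between the two outputType formats described above can be illustrated with a small base R helper. This is a sketch for illustration only, not the conversion the package performs internally, and the name listToVector is hypothetical.

```r
# Hypothetical helper: turn "list" output (one element of sample indices per
# cluster) into "vector" output, marking unassigned samples with -1.
listToVector <- function(clusterList, nSamples) {
  out <- rep(-1L, nSamples)
  for (i in seq_along(clusterList)) {
    out[clusterList[[i]]] <- i  # cluster ids follow the order of the list
  }
  out
}

# Samples 2 and 5 form the first cluster, samples 1 and 3 the second;
# samples 4 and 6 appear in no list element and so are unassigned.
listToVector(list(c(2, 5), c(1, 3)), nSamples = 6)
```

Because cluster ids are taken from the position in the list, the ordering chosen by the clustering function survives the conversion.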
# Use internalFunctionCheck to check a candidate function
goodFUN <- function(inputMatrix, k, cluster.only, ...) {
  cluster::pam(x = t(inputMatrix), k = k, cluster.only = cluster.only)
}
# passes internal check
internalFunctionCheck(goodFUN, inputType = c("X", "diss"),
  algorithmType = "K", outputType = "vector")
myCF <- ClusterFunction(clusterFUN = goodFUN, inputType = "X",
  algorithmType = "K", outputType = "vector")
# doesn't work, because the result is not a vector when cluster.only=TRUE
badFUN <- function(inputMatrix, k, cluster.only, ...) {
  cluster::pam(x = inputMatrix, k = k)
}
internalFunctionCheck(badFUN, inputType = c("X", "diss"),
  algorithmType = "K", outputType = "vector")