ICLFunction: Model Selection Via Integrated Completed Likelihood

View source: R/mplnMCMCEMClustering.R

Description

Performs model selection using the integrated completed likelihood (ICL) criterion of Biernacki et al. (2000).
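As a rough illustration of the criterion, the sketch below computes ICL for a single fitted model in base R. It assumes one common convention on the "smaller is better" scale: BIC = -2 * logLik + nPar * log(nObs), plus a penalty of -2 times the sum of the log posterior probabilities of each observation's assigned (maximum a posteriori) cluster. The helper name iclSketch and the toy inputs are hypothetical, and the sign and scaling used inside MPLNClust may differ.

```r
# Hypothetical helper, not part of MPLNClust: ICL for one fitted model.
# z is an n x G matrix of posterior probabilities (each row sums to 1).
iclSketch <- function(logLik, nPar, nObs, z) {
  # BIC on the "smaller is better" scale (assumed convention)
  bic <- -2 * logLik + nPar * log(nObs)
  # Hard (MAP) cluster assignment for each observation
  mapLabels <- max.col(z, ties.method = "first")
  # Posterior probability of the assigned cluster for each observation
  zMap <- z[cbind(seq_len(nrow(z)), mapLabels)]
  # Entropy-style penalty: zero when every assignment is certain
  bic - 2 * sum(log(zMap))
}

# Toy inputs (made-up numbers, for illustration only)
zSharp <- matrix(c(1, 0,
                   0, 1,
                   1, 0), ncol = 2, byrow = TRUE)
zFuzzy <- matrix(c(0.6, 0.4,
                   0.5, 0.5,
                   0.7, 0.3), ncol = 2, byrow = TRUE)

iclSharp <- iclSketch(logLik = -10, nPar = 3, nObs = 3, z = zSharp)
iclFuzzy <- iclSketch(logLik = -10, nPar = 3, nObs = 3, z = zFuzzy)
```

With identical log-likelihood and parameter count, the model with uncertain assignments (zFuzzy) receives the larger, that is worse, value under this convention, which is how ICL penalizes poorly separated clusters relative to BIC.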

Usage

ICLFunction(
  logLikelihood,
  nParameters,
  nObservations,
  clusterRunOutput = NA,
  probaPost = NA,
  gmax,
  gmin,
  parallel = FALSE
)

Arguments

logLikelihood

A numeric vector of final log-likelihood values, one for each cluster size.

nParameters

A vector giving the number of parameters for each cluster size.

nObservations

A positive integer specifying the number of observations in the dataset analyzed.

clusterRunOutput

Output from MPLNClust::mplnVariational, MPLNClust::mplnMCMCParallel, or MPLNClust::mplnMCMCNonParallel functions. Either clusterRunOutput or probaPost must be provided.

probaPost

A list of length (gmax - gmin + 1) containing the posterior probabilities at each g, for g = gmin:gmax. This argument is useful if the clustering output has been generated non-serially, e.g., g = 1:5 and g = 6:10 rather than g = 1:10. Either clusterRunOutput or probaPost must be provided.

gmax

A positive integer, > gmin, specifying the maximum number of components to be considered in the clustering run.

gmin

A positive integer specifying the minimum number of components to be considered in the clustering run.

parallel

TRUE or FALSE, indicating whether MPLNClust::mplnMCMCParallel was used to generate the clustering output. Default is FALSE.
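For the non-serial case mentioned under probaPost, the list can be assembled by concatenating the per-batch results in increasing order of g. The sketch below uses mock matrices in place of real clustering output and assumes each list element is an n x g matrix of posterior probabilities; that shape, and the names runA, runB, makePosterior, and probaPostAll, are illustrative assumptions rather than something this page confirms.

```r
# Mock posterior probability matrices standing in for two separate runs
# (made-up values; real ones would come from the clustering output).
nObs <- 4
makePosterior <- function(g) matrix(1 / g, nrow = nObs, ncol = g)

runA <- lapply(1:2, makePosterior)  # batch covering g = 1:2
runB <- lapply(3:4, makePosterior)  # batch covering g = 3:4

gmin <- 1
gmax <- 4

# Concatenate in increasing order of g, so that element k
# corresponds to g = gmin + k - 1
probaPostAll <- c(runA, runB)

# Sanity checks before passing probaPostAll as the probaPost argument
stopifnot(length(probaPostAll) == gmax - gmin + 1)
stopifnot(all(vapply(probaPostAll, ncol, integer(1)) == gmin:gmax))
```

The logLikelihood and nParameters vectors would need to be concatenated in the same order so that all inputs line up by cluster size.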

Value

Returns an S3 object of class MPLN with results.

  • allICLvalues - A vector of ICL values for each cluster size.

  • ICLmodelselected - An integer specifying the model selected by ICL.

  • ICLmodelSelectedLabels - A vector of integers specifying cluster labels for the model selected. Only provided if the user supplies clusterRunOutput.

  • ICLMessage - A character vector indicating whether spurious clusters were detected; otherwise NA.

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca

References

Biernacki, C., G. Celeux, and G. Govaert (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 719-725.

Examples

trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

# Generating simulated data
sampleData <- MPLNClust::mplnDataGenerator(nObservations = 100,
                                           dimensionality = 6,
                                           mixingProportions = c(0.79, 0.21),
                                           mu = rbind(trueMu1, trueMu2),
                                           sigma = rbind(trueSigma1, trueSigma2),
                                           produceImage = "No")

# Clustering
mplnResults <- MPLNClust::mplnVariational(dataset = sampleData$dataset,
                                          membership = sampleData$trueMembership,
                                          gmin = 1,
                                          gmax = 2,
                                          initMethod = "kmeans",
                                          nInitIterations = 2,
                                          normalize = "Yes")

# Model selection
ICLmodel <- MPLNClust::ICLFunction(logLikelihood = mplnResults$logLikelihood,
                                   nParameters = mplnResults$numbParameters,
                                   nObservations = nrow(mplnResults$dataset),
                                   clusterRunOutput = mplnResults$allResults,
                                   gmin = mplnResults$gmin,
                                   gmax = mplnResults$gmax,
                                   parallel = FALSE)


anjalisilva/MPLNClust documentation built on Sept. 19, 2024, 7:34 a.m.