mplnVisualizeAlluvial: Alluvial Plot of Multiple Clustering Results

View source: R/mplnVisualize.R

mplnVisualizeAlluvial    R Documentation

Alluvial Plot of Multiple Clustering Results

Description

A function to visualize clustering results as an alluvial plot, using the alluvial::alluvial() function. The function produces an alluvial plot from multiple clustering results for the same set of observations. At least one and at most four clustering results can be visualized. A maximum of 10 colors (clusters) is supported. Colors are assigned according to the cluster membership supplied to the argument 'firstGrouping'.
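The constraints described above can be checked before plotting. The following is a minimal sketch in base R; checkGroupings is a hypothetical helper that is not part of MPLNClust, and it assumes that the 10-color limit refers to the number of distinct labels in firstGrouping and that an empty vector means "grouping not supplied".

```r
# Hypothetical helper (not part of MPLNClust): checks the constraints
# stated in the description before a plot is requested.
checkGroupings <- function(nObservations, firstGrouping, ...) {
  if (length(firstGrouping) == 0) {
    stop("firstGrouping must be provided.")
  }
  # Empty vectors are treated as groupings that were not supplied
  others <- list(...)
  others <- others[lengths(others) > 0]
  if (length(others) > 3) {
    stop("At most four groupings can be visualized.")
  }
  allGroupings <- c(list(firstGrouping), others)
  if (any(lengths(allGroupings) != nObservations)) {
    stop("Every grouping must have length nObservations.")
  }
  # Assumption: the 10-color limit applies to distinct labels in firstGrouping
  if (length(unique(firstGrouping)) > 10) {
    stop("firstGrouping may contain at most 10 distinct clusters.")
  }
  invisible(TRUE)
}

set.seed(1)
labelsA <- sample(1:3, size = 50, replace = TRUE)
labelsB <- sample(1:2, size = 50, replace = TRUE)
checkGroupings(50L, labelsA, labelsB)  # passes silently
```

A call such as checkGroupings(11L, 1:11) stops with an error, because firstGrouping would then contain 11 distinct clusters.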

Usage

mplnVisualizeAlluvial(
  nObservations = 50L,
  firstGrouping = floor(runif(50, min = 1, max = 8)),
  secondGrouping = vector(mode = "integer", length = 0),
  thirdGrouping = vector(mode = "integer", length = 0),
  fourthGrouping = vector(mode = "integer", length = 0),
  fileName = paste0("Plot_", date()),
  printPlot = TRUE,
  format = "pdf"
)

Arguments

nObservations

An integer specifying the total number of observations, N, in the dataset. Default value is 50L.

firstGrouping

An integer vector of length nObservations (N), specifying the cluster membership of the N observations. This must be provided. Colors will be assigned based on the cluster membership provided in this vector. Default value is a vector of length 50 generated by floor(runif(50, min = 1, max = 8)).

secondGrouping

An integer vector of length nObservations (N), specifying the cluster membership of the N observations. This could be obtained from another clustering run or from a different model selection criterion. Default value is an empty vector.

thirdGrouping

An integer vector of length nObservations (N), specifying the cluster membership of the N observations. This could be obtained from another clustering run or from a different model selection criterion. Default value is an empty vector.

fourthGrouping

An integer vector of length nObservations (N), specifying the cluster membership of the N observations. This could be obtained from another clustering run or from a different model selection criterion. Default value is an empty vector.

fileName

Unique character string indicating the name for the plot being generated. Default is Plot_date, where date is obtained from date().

printPlot

Logical indicating whether the plot(s) should be saved in the local directory. Default is TRUE. Options are TRUE or FALSE.

format

Character string indicating the format of the image to be produced. Default 'pdf'. Options 'pdf' or 'png'.
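The default fileName is built from date(), which returns a string such as "Thu Sep 19 07:34:12 2024". That string contains spaces and colons, and colons are not permitted in file names on Windows. When a plot is to be saved, a name built from a compact timestamp may be safer. The sketch below uses only base R; safeFileName is an illustrative variable name, not something defined by the package.

```r
# Build a timestamp that contains only digits and an underscore,
# e.g. "Plot_20240919_073412", instead of relying on date().
safeFileName <- paste0("Plot_", format(Sys.time(), "%Y%m%d_%H%M%S"))
print(safeFileName)
```

The resulting string can then be passed as the fileName argument.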

Value

An alluvial plot is returned. The x-axis values follow the order of the vectors supplied (if any) to firstGrouping, secondGrouping, thirdGrouping and fourthGrouping, respectively. Colors are assigned based on the cluster membership provided for the argument firstGrouping.
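The flows in an alluvial plot correspond to counts of observations for each combination of cluster labels. The sketch below, on made-up labels, tabulates those counts in base R and then, if the alluvial package is installed, passes them to alluvial::alluvial() with colors taken from the first grouping. It is only an illustration of the idea; how mplnVisualizeAlluvial prepares its input internally is not documented here, and the variable names are assumptions.

```r
set.seed(42)
# Made-up cluster labels for 50 observations
firstGrouping  <- sample(1:3, size = 50, replace = TRUE)
secondGrouping <- sample(1:2, size = 50, replace = TRUE)

# One row per label combination, with its number of observations
flows <- as.data.frame(table(first = firstGrouping, second = secondGrouping))
flows <- flows[flows$Freq > 0, ]
print(flows)

# Draw the flows only if the alluvial package is available;
# colors follow the first grouping, as described above.
if (requireNamespace("alluvial", quietly = TRUE)) {
  alluvial::alluvial(flows[, c("first", "second")],
                     freq = flows$Freq,
                     col = palette()[as.integer(flows$first)])
}
```

The Freq column sums to the number of observations, so each observation belongs to exactly one flow.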

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca

References

Bojanowski, M., R. Edwards (2016). alluvial: R Package for Creating Alluvial Diagrams. R package version 0.1-2.

Wickham, H., R. François, L. Henry and K. Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.7.

Examples

# Example 1
# Assign values for parameters
trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

# Generate simulated data for 500 x 6 dataset
simulatedCounts <- MPLNClust::mplnDataGenerator(nObservations = 500,
                                      dimensionality = 6,
                                      mixingProportions = c(0.79, 0.21),
                                      mu = rbind(trueMu1, trueMu2),
                                      sigma = rbind(trueSigma1, trueSigma2),
                                      produceImage = "No")

# Clustering data for G = 1:2
MPLNClustResults <- MPLNClust::mplnVariational(
  dataset = as.matrix(simulatedCounts$dataset),
  membership = "none",
  gmin = 1,
  gmax = 2,
  initMethod = "kmeans",
  nInitIterations = 1,
  normalize = "Yes")

# Visualize clustering results using alluvial plot
# Access results using models selected via model selection criteria
alluvialPlot <- MPLNClust::mplnVisualizeAlluvial(
  nObservations = nrow(simulatedCounts$dataset),
  firstGrouping = MPLNClustResults$BICresults$BICmodelSelectedLabels,
  secondGrouping = MPLNClustResults$ICLresults$ICLmodelSelectedLabels,
  thirdGrouping = MPLNClustResults$AIC3results$AIC3modelSelectedLabels,
  fourthGrouping = MPLNClustResults$AICresults$AICmodelSelectedLabels,
  fileName = paste0('Plot_', date()),
  printPlot = FALSE,
  format = 'pdf')

# Example 2
# Perform clustering via K-means with centers = 2
# Visualize clustering results using alluvial plot for
# K-means and the above MPLNClust results for BIC, ICL and AIC3.
# Note, coloring is set with respect to argument
# firstGrouping, which is assigned MPLNClust results.

set.seed(1234)
alluvialPlotMPLNClust <- MPLNClust::mplnVisualizeAlluvial(
  nObservations = nrow(simulatedCounts$dataset),
  firstGrouping = MPLNClustResults$BICresults$BICmodelSelectedLabels,
  secondGrouping = MPLNClustResults$ICLresults$ICLmodelSelectedLabels,
  thirdGrouping = MPLNClustResults$AIC3results$AIC3modelSelectedLabels,
  fourthGrouping = kmeans(simulatedCounts$dataset, 2)$cluster,
  fileName = paste0('Plot_', date()),
  printPlot = FALSE,
  format = 'pdf')

# Note, coloring is now set with respect to argument firstGrouping,
# which is assigned K-means results.
set.seed(1234)
alluvialPlotKmeans <- MPLNClust::mplnVisualizeAlluvial(
  nObservations = nrow(simulatedCounts$dataset),
  firstGrouping = kmeans(simulatedCounts$dataset, 2)$cluster,
  secondGrouping = MPLNClustResults$BICresults$BICmodelSelectedLabels,
  thirdGrouping = MPLNClustResults$ICLresults$ICLmodelSelectedLabels,
  fourthGrouping = MPLNClustResults$AIC3results$AIC3modelSelectedLabels,
  fileName = paste0('Plot_', date()),
  printPlot = FALSE,
  format = 'pdf')


anjalisilva/MPLNClust documentation built on Sept. 19, 2024, 7:34 a.m.