library(knitr) opts_chunk$set(fig.align = "center", out.width = "90%", fig.width = 6, fig.height = 5.5, dev.args=list(pointsize=10), par = TRUE, # needed for setting hook collapse = TRUE, # collapse input & ouput code in chunks warning = FALSE) knit_hooks$set(par = function(before, options, envir) { if(before && options$fig.show != "none") par(family = "sans", mar=c(4.1,4.1,1.1,1.1), mgp=c(3,1,0), tcl=-0.5) }) set.seed(1) # for exact reproducibility
mixMVPLN is an R package for model-based clustering based on mixtures of matrix variate Poisson-log normal (mixMVPLN) distributions. It is applicable for clustering of three-way count data. mixMVPLN provides functions for parameter estimation via three different frameworks: one based on Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, one based on variational Gaussian approximations (VGAs), and third based on a hybrid approach that combines VGAs with MCMC-EM. Here we will explore VGA approach.
MCMC-EM-based approach is computationally intensive; hence, we also provide a computationally efficient variational approximation framework for parameter estimation. This method is employed in the function mvplnVGAclus. Variational approximations (Wainwright et al., 2008) are approximate inference techniques in which a computationally convenient approximating density is used in place of a more complex but 'true' posterior density. The approximating density is obtained by minimizing the Kullback-Leibler (KL) divergence between the true and the approximating densities. A variational Gaussian approximations (VGAs) is used for parameter estimation, initially proposed for MPLN framework by Subedi and Browne, 2020. The VGAs approach is computationally efficient, however, it does not guarantee an exact posterior (Ghahramani and Beal, 1999).
Four model selection criteria are offered, which include the Akaike information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), a variation of the AIC used by Bozdogan (1994) called AIC3, and the integrated completed likelihood (ICL; Biernacki et al., 2000). Also included is a function for simulating data from this model.
Starting values (argument: initMethod) and the number of iterations for each chain (argument: nInitIterations) play an important role to the successful operation of this algorithm. There maybe issues with singularity, in which case altering starting values or initialization method may help.
This document was written in R Markdown, using the knitr package for production. See help(package = "mixMVPLN")
for further details and references provided by citation("mixMVPLN")
. To download mixMVPLN, use the following commands:
require("devtools") devtools::install_github("anjalisilva/mixMVPLN", build_vignettes = TRUE) library("mixMVPLN")
The function mixMVPLN::mvplnDataGenerator() permits to simulate data from a mixture of MVPLN distributions. See ?mvplnDataGenerator for more information, an example, and references. To simulate a dataset from a mixture of MVPLN with 50 units, 3 total repsonses, and 2 occasions, with two mixture components, each with a mixing proportion of 0.79 and 0.21, respectively, let us use mixMVPLN::mvplnDataGenerator(). Here clusterGeneration R package is used to generate positive definite covariance matrices for illustration purposes.
set.seed(1234) # for reproducibility, setting seed trueG <- 2 # number of total components/clusters truer <- 2 # number of total occasions truep <- 3 # number of total responses truen <- 100 # number of total units # Mu is a r x p matrix trueM1 <- matrix(rep(6, (truer * truep)), ncol = truep, nrow = truer, byrow = TRUE) trueM2 <- matrix(rep(1, (truer * truep)), ncol = truep, nrow = truer, byrow = TRUE) trueMall <- rbind(trueM1, trueM2) # Phi is a r x r matrix # Loading needed packages for generating data if (!require(clusterGeneration)) install.packages("clusterGeneration") # Covariance matrix containing variances and covariances between r occasions truePhi1 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truer, rangeVar = c(1, 1.7))$Sigma truePhi1[1, 1] <- 1 # For identifiability issues truePhi2 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truer, rangeVar = c(0.7, 0.7))$Sigma truePhi2[1, 1] <- 1 # For identifiability issues truePhiAll <- rbind(truePhi1, truePhi2) # Omega is a p x p matrix # Covariance matrix containing variances and covariances between p responses trueOmega1 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep, rangeVar = c(1, 1.7))$Sigma trueOmega2 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep, rangeVar = c(0.7, 0.7))$Sigma trueOmegaAll <- rbind(trueOmega1, trueOmega2) # Simulating data simulatedMVData <- mixMVPLN::mvplnDataGenerator(nOccasions = truer, nResponses = truep, nUnits = truen, mixingProportions = c(0.55, 0.45), matrixMean = trueMall, phi = truePhiAll, omega = trueOmegaAll)
The generated dataset can be checked:
length(simulatedMVData$dataset) # 100 units class(simulatedMVData$dataset) # list with length of 100 typeof(simulatedMVData$dataset) # list summary(simulatedMVData$dataset) # summary of data dim(simulatedMVData$dataset[[1]]) # dimension of first unit is 2 x 3 # 2 occasions and 3 responses
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.