mvplnDataGenerator: Generating Data Using Mixtures of MVPLN

mvplnDataGenerator    R Documentation

Generating Data Using Mixtures of MVPLN

Description

This function simulates data from a mixture of matrix variate Poisson-log normal (MVPLN) distributions. Each dataset contains 'n' random matrices (units), each of dimension r x p, where 'r' is the number of occasions and 'p' is the number of responses/variables.
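As a rough illustration of the model behind this function, the sketch below draws a single r x p count matrix in base R: a matrix normal draw on the log scale, followed by element-wise Poisson sampling. It is not the package's implementation. The helper name drawOneMVPLN and the parameter values are assumptions made for illustration, and any normalization or offset terms the package may apply are omitted.

```r
# Hypothetical helper (not part of mixMVPLN): one draw from a matrix variate
# Poisson-log normal distribution with mean matrix M (r x p), occasion
# covariance Phi (r x r), and response covariance Omega (p x p).
drawOneMVPLN <- function(M, Phi, Omega) {
  r <- nrow(M)
  p <- ncol(M)
  Z <- matrix(stats::rnorm(r * p), nrow = r, ncol = p)  # iid N(0, 1) entries
  # chol() returns the upper triangular factor U with t(U) %*% U = Sigma,
  # so X is matrix normal with row covariance Phi and column covariance Omega.
  X <- M + t(chol(Phi)) %*% Z %*% chol(Omega)
  # Counts are Poisson with log-scale means given by the entries of X.
  matrix(stats::rpois(r * p, lambda = exp(X)), nrow = r, ncol = p)
}

set.seed(1)
M <- matrix(2, nrow = 2, ncol = 3)              # illustrative mean matrix
Phi <- matrix(c(1, 0.3, 0.3, 1.2), nrow = 2)    # illustrative r x r covariance
Omega <- diag(3)                                # illustrative p x p covariance
Y <- drawOneMVPLN(M, Phi, Omega)
dim(Y)
```

A mixture is obtained by first drawing a component label for each unit and then drawing that unit's matrix with the parameters of the selected component.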

Usage

mvplnDataGenerator(
  nOccasions,
  nResponses,
  nUnits,
  mixingProportions,
  matrixMean,
  phi,
  omega
)

Arguments

nOccasions

A positive integer indicating the number of occasions (r). Each matrix Y_j has size r x p, and the dataset contains 'n' such matrices, indexed by j = 1,...,n. Each Y_j holds k ∈ 1,...,p responses/variables measured over i ∈ 1,...,r occasions.

nResponses

A positive integer indicating the number of responses/variables (p). Each matrix Y_j has size r x p, and the dataset contains 'n' such matrices, indexed by j = 1,...,n. Each Y_j holds k ∈ 1,...,p responses/variables measured over i ∈ 1,...,r occasions.

nUnits

A positive integer indicating the number of units (n). Each matrix Y_j has size r x p, and the dataset contains 'n' such matrices, indexed by j = 1,...,n.

mixingProportions

A numeric vector of length equal to the total number of components, giving the proportion of each component. The entries should sum to 1.
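The constraints on mixingProportions can be checked before calling the function. The sketch below also shows one way component labels could be drawn from such a vector. Whether the package draws labels in this way is an assumption, and the variable names are illustrative.

```r
mixingProportions <- c(0.79, 0.21)  # one entry per component

# Entries should be valid proportions that sum to 1 (up to rounding error).
stopifnot(is.numeric(mixingProportions),
          all(mixingProportions > 0),
          isTRUE(all.equal(sum(mixingProportions), 1)))

# Illustrative only: draw a component label for each of 100 units.
set.seed(1234)
labels <- sample(seq_along(mixingProportions), size = 100,
                 replace = TRUE, prob = mixingProportions)
table(labels)
```

Using all.equal rather than == avoids spurious failures when the proportions are the result of floating-point arithmetic.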

matrixMean

A matrix of means (M) of size r x p for each component/cluster. The matrices for all components should be stacked via rbind, giving a (G * r) x p matrix for G components. See example.
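Because the component matrices are stacked with rbind, it can help to see the resulting layout. The sketch below builds a stacked mean matrix for two components and recovers one block from it. The helper getBlock is hypothetical (not part of mixMVPLN) and assumes the blocks are stacked in component order.

```r
r <- 2  # occasions
p <- 3  # responses/variables
M1 <- matrix(6, nrow = r, ncol = p)  # mean matrix of component 1
M2 <- matrix(1, nrow = r, ncol = p)  # mean matrix of component 2
Mall <- rbind(M1, M2)                # stacked: (2 * r) rows, p columns
dim(Mall)

# Hypothetical helper: rows (g - 1) * blockRows + 1 through g * blockRows
# hold the block belonging to component g.
getBlock <- function(stacked, g, blockRows) {
  rows <- ((g - 1) * blockRows + 1):(g * blockRows)
  stacked[rows, , drop = FALSE]
}
identical(getBlock(Mall, g = 2, blockRows = r), M2)
```

The same row indexing applies to the stacked phi (blocks of r rows) and omega (blocks of p rows) arguments.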

phi

A matrix of size r x r, which is the covariance matrix containing the variances and covariances between 'r' occasions, for each component/cluster. All matrices should be combined via rbind. See example.

omega

A matrix of size p x p, which is the covariance matrix containing the variances and covariances of 'p' responses/variables, for each component/cluster. All matrices should be combined via rbind. See example.
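Each phi and omega block should be a valid covariance matrix, that is, square, symmetric, and positive definite. A minimal base R check might look like the sketch below. The helper isValidCov and its tolerance are assumptions for illustration, not part of mixMVPLN.

```r
# Hypothetical helper: TRUE if S is square, symmetric, and all of its
# eigenvalues exceed a small positive tolerance.
isValidCov <- function(S, tol = 1e-8) {
  is.matrix(S) && nrow(S) == ncol(S) &&
    isSymmetric(S) &&
    all(eigen(S, symmetric = TRUE, only.values = TRUE)$values > tol)
}

Phi1 <- matrix(c(1, -0.488301,
                 -0.488301, 1.362777), nrow = 2)
isValidCov(Phi1)   # TRUE

notPD <- matrix(c(1, 2,
                  2, 1), nrow = 2)  # eigenvalues are 3 and -1
isValidCov(notPD)  # FALSE
```

Running such a check on every block before stacking catches invalid inputs early, for example after manually editing an entry of a generated matrix.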

Value

Returns an S3 object of class mvplnDataGenerator containing the following elements.

  • dataset - Simulated dataset with 'n' matrices, each matrix with dimension r x p, where 'r' is the number of occasions and 'p' is the number of responses/variables.

  • truemembership - A numeric vector indicating the membership of each observation.

  • units - A positive integer indicating the number of units used for simulating the data.

  • occassions - A positive integer indicating the number of occasions used for simulating the data.

  • variables - A positive integer indicating the number of responses/variables used for simulating the data.

  • mixingProportions - A numeric vector indicating the mixing proportion of each component.

  • means - Matrix of means used for simulating the data.

  • phi - Covariance matrix containing the variances and covariances between 'r' occasions used for simulating the data.

  • psi - Covariance matrix containing the variances and covariances of 'p' responses/variables used for simulating the data.

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca, Sanjeena Dang, sanjeenadang@cunet.carleton.ca.

References

Silva, A. et al. (2018). Finite Mixtures of Matrix Variate Poisson-Log Normal Distributions for Three-Way Count Data. arXiv preprint arXiv:1807.08380.

Aitchison, J. and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.

Examples

# Example 1
# Generating simulated matrix variate count data
set.seed(1234)
trueG <- 2 # total number of components (G)
truer <- 2 # total number of occasions (r)
truep <- 3 # total number of responses (p)
trueN <- 100 # total number of units (n)
truePiG <- c(0.79, 0.21) # mixing proportions

# M (matrix of means) is an r x p matrix
trueM1 <- matrix(rep(6, (truer * truep)),
                 ncol = truep,
                 nrow = truer, byrow = TRUE)

trueM2 <- matrix(rep(1, (truer * truep)),
                 ncol = truep,
                 nrow = truer,
                 byrow = TRUE)

trueMall <- rbind(trueM1, trueM2)

# Phi is a r x r matrix
# Loading needed packages for generating data
# if (!require(clusterGeneration)) install.packages("clusterGeneration")
# library("clusterGeneration")

# Covariance matrix containing variances and covariances between r occasions
# truePhi1 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
#                                                   dim = truer,
#                                                   rangeVar = c(1, 1.7))$Sigma
truePhi1 <- matrix(c(1.075551, -0.488301,
                     -0.488301, 1.362777), nrow = 2)
truePhi1[1, 1] <- 1 # For identifiability issues

# truePhi2 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
#                                                   dim = truer,
#                                                   rangeVar = c(0.7, 0.7))$Sigma
truePhi2 <- matrix(c(0.7000000, 0.6585887,
                     0.6585887, 0.7000000), nrow = 2)
truePhi2[1, 1] <- 1 # For identifiability issues
truePhiall <- rbind(truePhi1, truePhi2)

# Omega is a p x p matrix
# Covariance matrix containing variances and covariances between p responses
# trueOmega1 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep,
#                                    rangeVar = c(1, 1.7))$Sigma
trueOmega1 <- matrix(c(1.0526554, 1.0841910, -0.7976842,
                       1.0841910,  1.1518811, -0.8068102,
                       -0.7976842, -0.8068102,  1.4090578),
                       nrow = 3)
# trueOmega2 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep,
#                                    rangeVar = c(0.7, 0.7))$Sigma
trueOmega2 <- matrix(c(0.7000000, 0.5513744, 0.4441598,
                       0.5513744, 0.7000000, 0.4726577,
                       0.4441598, 0.4726577, 0.7000000),
                       nrow = 3)
trueOmegaAll <- rbind(trueOmega1, trueOmega2)

# Generate simulated data
sampleData <- mixMVPLN::mvplnDataGenerator(nOccasions = truer,
                                           nResponses = truep,
                                           nUnits = trueN,
                                           mixingProportions = truePiG,
                                           matrixMean = trueMall,
                                           phi = truePhiall,
                                           omega = trueOmegaAll)

# Example 2
trueG <- 1 # total number of components (G)
truer <- 2 # total number of occasions (r)
truep <- 3 # total number of responses (p)
trueN <- 1000 # total number of units (n)
truePiG <- 1L # mixing proportion for G = 1

# M (matrix of means) is an r x p matrix
trueM1 <- matrix(c(6, 5.5, 6, 6, 5.5, 6),
                 ncol = truep,
                 nrow = truer,
                 byrow = TRUE)
trueMall <- rbind(trueM1)
# Phi is a r x r matrix
set.seed(1)
# truePhi1 <- clusterGeneration::genPositiveDefMat(
#                               "unifcorrmat",
#                                dim = truer,
#                               rangeVar = c(0.7, 1.7))$Sigma
truePhi1 <- matrix(c(1.3092747, 0.3219674,
                     0.3219674, 1.3233794), nrow = 2)
truePhi1[1, 1] <- 1 # for identifiability issues
truePhiall <- rbind(truePhi1)

# Omega is a p x p matrix
set.seed(1)
# trueOmega1 <- clusterGeneration::genPositiveDefMat(
#                    "unifcorrmat",
#                     dim = truep,
#                     rangeVar = c(1, 1.7))$Sigma
trueOmega1 <- matrix(c(1.1625581, 0.9157741, 0.8203499,
                       0.9157741, 1.2216287, 0.7108193,
                       0.8203499, 0.7108193, 1.2118854), nrow = 3)
trueOmegaAll <- rbind(trueOmega1)

# Generate simulated data
set.seed(1)
sampleData2 <- mixMVPLN::mvplnDataGenerator(
                         nOccasions = truer,
                         nResponses = truep,
                         nUnits = trueN,
                         mixingProportions = truePiG,
                         matrixMean = trueMall,
                         phi = truePhiall,
                         omega = trueOmegaAll)


anjalisilva/mixMVPLN documentation built on Sept. 24, 2024, 11:05 p.m.