mplnDataGenerator: Generating Data Using Mixtures of MPLN

View source: R/mplnMCMCEMClustering.R

mplnDataGenerator    R Documentation

Generating Data Using Mixtures of MPLN

Description

This function simulates count data from a mixture of multivariate Poisson-log normal (MPLN) distributions.
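
To make the model concrete, here is a minimal base-R sketch of drawing counts from an MPLN mixture: a component label is drawn for each observation, a latent vector is drawn from that component's multivariate normal distribution, and each count is drawn from a Poisson distribution whose mean is the exponential of the latent value. This is an illustration under stated assumptions, not the package's implementation: the helper name simulateMplnSketch and its sigmaList argument are hypothetical, and library-size normalization factors are omitted.

```r
# Hypothetical helper for illustration only; not part of MPLNClust.
# mu: matrix with one row per component; sigmaList: list of covariance matrices.
simulateMplnSketch <- function(n, mu, sigmaList, mixingProportions) {
  d <- ncol(mu)
  # 1. Draw a component label for each observation.
  z <- sample(seq_along(mixingProportions), size = n,
              replace = TRUE, prob = mixingProportions)
  counts <- matrix(0L, nrow = n, ncol = d)
  for (i in seq_len(n)) {
    g <- z[i]
    # 2. Latent vector theta ~ N(mu_g, Sigma_g), using the Cholesky factor.
    upperChol <- chol(sigmaList[[g]])
    theta <- mu[g, ] + drop(rnorm(d) %*% upperChol)
    # 3. Counts are conditionally independent Poisson with mean exp(theta).
    counts[i, ] <- rpois(d, lambda = exp(theta))
  }
  list(dataset = counts, trueMembership = z)
}

set.seed(1)
sketch <- simulateMplnSketch(
  n = 100,
  mu = rbind(c(6.5, 6, 6, 6, 6, 6), c(2, 2.5, 2, 2, 2, 2)),
  sigmaList = list(diag(6) * 2, diag(6)),
  mixingProportions = c(0.79, 0.21)
)
dim(sketch$dataset)  # 100 rows (observations), 6 columns (dimensions)
```

The sketch keeps only the three sampling steps so the structure of the model is visible; mplnDataGenerator itself also returns normalization factors and other elements listed under Value.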

Usage

mplnDataGenerator(
  nObservations,
  dimensionality,
  mixingProportions,
  mu,
  sigma,
  produceImage = "No",
  ImageName = "sampleName"
)

Arguments

nObservations

A positive integer indicating the number of observations for the dataset.

dimensionality

A positive integer indicating the dimensionality for the dataset.

mixingProportions

A numeric vector of length equal to the total number of components, giving the proportion of each component. The entries should sum to 1.

mu

A matrix of size (number of components x dimensionality), with one row per component giving that component's mean, as built with rbind() in the example. See example.

sigma

A matrix of size ((dimensionality * number of components) x dimensionality), formed by stacking the (dimensionality x dimensionality) covariance matrix of each component row-wise. See example.

produceImage

A character string indicating whether or not to produce an image. Options are "Yes" or "No". Default is "No". The image will be produced as "Pairs plot of log-transformed data.png" in the current working directory.

ImageName

A character string indicating the name for the image, used if produceImage is set to "Yes". Default is "sampleName".
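
The argument shapes can be checked before calling the function. The following base-R sketch builds mu and sigma with rbind(), as in the Examples section, and verifies the dimensions described above. The checks and the variable names (nComponents, rowsForG, sigmaG) are illustrative assumptions, not validation performed by the package.

```r
dimensionality <- 6
nComponents <- 2

# One mean vector per component, stacked row-wise.
mu <- rbind(c(6.5, 6, 6, 6, 6, 6), c(2, 2.5, 2, 2, 2, 2))
# One covariance matrix per component, stacked row-wise.
sigma <- rbind(diag(dimensionality) * 2, diag(dimensionality))
mixingProportions <- c(0.79, 0.21)

stopifnot(
  isTRUE(all.equal(sum(mixingProportions), 1)),
  length(mixingProportions) == nComponents,
  nrow(sigma) == dimensionality * nComponents,
  ncol(sigma) == dimensionality
)

# Rows of the stacked sigma that hold the covariance matrix of component g.
g <- 2
rowsForG <- ((g - 1) * dimensionality + 1):(g * dimensionality)
sigmaG <- sigma[rowsForG, ]
```

Here sigmaG recovers the second component's covariance matrix (the identity matrix in this setup), which shows how the stacked layout maps back to individual components.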

Value

Returns an S3 object of class mplnDataGenerator containing the following elements:

  • dataset - Simulated dataset.

  • trueMembership - A numeric vector indicating the membership of each observation.

  • probaPost - A matrix indicating the posterior probability that each observation belongs to each component/cluster.

  • truenormfactors - A numeric vector indicating the true normalization factors used for adjusting the library sizes.

  • observations - Number of observations in the simulated dataset.

  • dimensionality - Dimensionality of the simulated dataset.

  • mixingProportions - A numeric vector indicating the mixing proportion of each component.

  • mu - True mean used for the simulated dataset.

  • sigma - True covariances used for the simulated dataset.

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca

References

Aitchison, J. and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.

Examples

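# True mean vectors, one per component (length equals dimensionality)
trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

# True covariance matrices, one per component (dimensionality x dimensionality)
trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

# Generating simulated data; rbind() stacks the per-component means and covariances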
sampleData <- MPLNClust::mplnDataGenerator(nObservations = 100,
                                           dimensionality = 6,
                                           mixingProportions = c(0.79, 0.21),
                                           mu = rbind(trueMu1, trueMu2),
                                           sigma = rbind(trueSigma1, trueSigma2),
                                           produceImage = "No",
                                           ImageName = "TwoComponents")
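
Once the data are generated, the elements listed under Value can be inspected with the usual list accessors. The sketch below repeats the call so that it is self-contained and only runs when MPLNClust is installed. The set.seed() call assumes the function draws from R's standard random number generator, which this page does not state.

```r
if (requireNamespace("MPLNClust", quietly = TRUE)) {
  set.seed(1234)  # assumption: makes the simulation reproducible
  sampleData <- MPLNClust::mplnDataGenerator(
    nObservations = 100,
    dimensionality = 6,
    mixingProportions = c(0.79, 0.21),
    mu = rbind(c(6.5, 6, 6, 6, 6, 6), c(2, 2.5, 2, 2, 2, 2)),
    sigma = rbind(diag(6) * 2, diag(6)),
    produceImage = "No",
    ImageName = "TwoComponents"
  )

  class(sampleData)                  # class of the returned S3 object
  dim(sampleData$dataset)            # size of the simulated count matrix
  table(sampleData$trueMembership)   # number of observations per component
}
```

Comparing table(sampleData$trueMembership) against mixingProportions is a quick way to confirm that the simulated component sizes are roughly as requested.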


anjalisilva/MPLNClust documentation built on Sept. 19, 2024, 7:34 a.m.