tagm-mcmc: Localisation of proteins using the TAGM MCMC method

tagmMcmcTrainR Documentation

Localisation of proteins using the TAGM MCMC method

Description

These functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using Markov-chain Monte-Carlo (MCMC) for inference.

Usage

tagmMcmcTrain(
  object,
  fcol = "markers",
  method = "MCMC",
  numIter = 1000L,
  burnin = 100L,
  thin = 5L,
  mu0 = NULL,
  lambda0 = 0.01,
  nu0 = NULL,
  S0 = NULL,
  beta0 = NULL,
  u = 2,
  v = 10,
  numChains = 4L,
  BPPARAM = BiocParallel::bpparam()
)

tagmMcmcPredict(
  object,
  params,
  fcol = "markers",
  probJoint = FALSE,
  probOutlier = TRUE
)

tagmPredict(
  object,
  params,
  fcol = "markers",
  probJoint = FALSE,
  probOutlier = TRUE
)

tagmMcmcProcess(params)

Arguments

object

An MSnbase::MSnSet containing the spatial proteomics data to be passed to tagmMcmcTrain and tagmPredict.

fcol

The feature meta-data containing marker definitions. Default is markers.

method

A charachter() describing the inference method for the TAGM algorithm. Default is "MCMC".

numIter

The number of iterations of the MCMC algorithm. Default is 1000.

burnin

The number of samples to be discarded from the begining of the chain. Default is 100.

thin

The thinning frequency to be applied to the MCMC chain. Default is 5.

mu0

The prior mean. Default is colMeans of the expression data.

lambda0

The prior shrinkage. Default is 0.01.

nu0

The prior degreed of freedom. Default is ncol(exprs(object)) + 2

S0

The prior inverse-wishart scale matrix. Empirical prior used by default.

beta0

The prior Dirichlet distribution concentration. Default is 1 for each class.

u

The prior shape parameter for Beta(u, v). Default is 2

v

The prior shape parameter for Beta(u, v). Default is 10.

numChains

The number of parrallel chains to be run. Default it 4.

BPPARAM

Support for parallel processing using the BiocParallel infrastructure. When missing (default), the default registered BiocParallelParam parameters are used. Alternatively, one can pass a valid BiocParallelParam parameter instance: SnowParam, MulticoreParam, DoparParam, ... see the BiocParallel package for details.

params

An instance of class MCMCParams, as generated by tagmMcmcTrain().

probJoint

A logical(1) indicating whether to return the joint probability matrix, i.e. the probability for all classes as a new tagm.mcmc.joint feature variable.

probOutlier

A logical(1) indicating whether to return the probability of being an outlier as a new tagm.mcmc.outlier feature variable. A high value indicates that the protein is unlikely to belong to any annotated class (and is hence considered an outlier).

Details

The tagmMcmcTrain function generates the samples from the posterior distributions (object or class MCMCParams) based on an annotated quantitative spatial proteomics dataset (object of class MSnbase::MSnSet). Both are then passed to the tagmPredict function to predict the sub-cellular localisation of protein of unknown localisation. See the pRoloc-bayesian vignette for details and examples. In this implementation, if numerical instability is detected in the covariance matrix of the data a small multiple of the identity is added. A message is printed if this conditioning step is performed.

Value

tagmMcmcTrain returns an instance of class MCMCParams.

tagmMcmcPredict returns an instance of class MSnbase::MSnSet containing the localisation predictions as a new tagm.mcmc.allocation feature variable. The allocation probability is encoded as tagm.mcmc.probability (corresponding to the mean of the distribution probability). In additionm the upper and lower quantiles of the allocation probability distribution are available as tagm.mcmc.probability.lowerquantile and tagm.mcmc.probability.upperquantile feature variables. The Shannon entropy is available in the tagm.mcmc.mean.shannon feature variable, measuring the uncertainty in the allocations (a high value representing high uncertainty; the highest value is the natural logarithm of the number of classes).

tagmMcmcProcess returns an instance of class MCMCParams with its summary slot populated.

References

A Bayesian Mixture Modelling Approach For Spatial Proteomics Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto bioRxiv 282269; doi: https://doi.org/10.1101/282269

See Also

The plotEllipse() function can be used to visualise TAGM models on PCA plots with ellipses.


lgatto/pRoloc documentation built on Oct. 23, 2024, 12:51 a.m.