assign.wrapper: ASSIGN All-in-one function


assign.wrapper    R Documentation

ASSIGN All-in-one function

Description

The assign.wrapper function integrates the assign.preprocess, assign.mcmc, assign.summary, assign.output, and assign.cv.output functions into one wrapper function.

Usage

assign.wrapper(
  trainingData = NULL,
  testData,
  trainingLabel,
  testLabel = NULL,
  geneList = NULL,
  anchorGenes = NULL,
  excludeGenes = NULL,
  n_sigGene = NA,
  adaptive_B = TRUE,
  adaptive_S = FALSE,
  mixture_beta = TRUE,
  outputDir,
  p_beta = 0.01,
  theta0 = 0.05,
  theta1 = 0.9,
  iter = 2000,
  burn_in = 1000,
  sigma_sZero = 0.01,
  sigma_sNonZero = 1,
  S_zeroPrior = FALSE,
  pctUp = 0.5,
  geneselect_iter = 500,
  geneselect_burn_in = 100,
  outputSignature_convergence = FALSE,
  ECM = FALSE,
  progress_bar = TRUE,
  override_S_matrix = NULL
)

Arguments

trainingData

The genomic measure matrix of training samples (e.g., a gene expression matrix). The dimension of this matrix is probe number x sample number. The default is NULL.

testData

The genomic measure matrix of test samples (e.g., a gene expression matrix). The dimension of this matrix is probe number x sample number.

trainingLabel

The list linking the index of each training sample to a specific group it belongs to. See examples for more information.
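As a minimal sketch of this structure, the list below mirrors the one used in the Examples section: a control element that maps each pathway to a set of sample indices, followed by one element per pathway. Reading these indices as column positions in trainingData is an assumption based on that example. The consistency checks are illustrative additions and are not part of ASSIGN.

```r
# Training labels in the shape used by the Examples section
trainingLabel1 <- list(
  control = list(bcat = 1:10, e2f3 = 1:10, myc = 1:10, ras = 1:10, src = 1:10),
  bcat = 11:19, e2f3 = 20:28, myc = 29:38, ras = 39:48, src = 49:55
)

# Pathway names are every top-level element except "control"
pathways <- setdiff(names(trainingLabel1), "control")

# Illustrative check: every pathway has a matching control entry
stopifnot(setequal(pathways, names(trainingLabel1$control)))

# Illustrative check: no index is listed both as control and as pathway sample
overlaps <- vapply(
  pathways,
  function(p) length(intersect(trainingLabel1$control[[p]], trainingLabel1[[p]])),
  integer(1)
)
stopifnot(all(overlaps == 0))
```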

testLabel

The vector of the phenotypes/labels of the test samples. The default is NULL.

geneList

The list that collects the signature genes of one/multiple pathways. Every component of this list contains the signature genes associated with one pathway. The default is NULL.

anchorGenes

A list of genes that will be included in the signature even if they are not chosen during gene selection.

excludeGenes

A list of genes that will be excluded from the signature even if they are chosen during gene selection.

n_sigGene

A vector giving the number of signature genes to be identified for each pathway. n_sigGene needs to be specified when geneList is set to NULL. The default is NA. See examples for more information.
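The following sketch shows one way to build n_sigGene when no gene list is supplied. The pathway names come from the Examples section; the signature size of 200 is an arbitrary illustrative value, and the assumption that the vector has one entry per pathway should be checked against the package behavior.

```r
# Pathway names taken from the Examples section
pathways <- c("bcat", "e2f3", "myc", "ras", "src")

# One entry per pathway; 200 is an arbitrary illustrative signature size
n_sigGene <- rep(200, length(pathways))

# These two arguments would take the place of geneList = geneList1
# in a call to assign.wrapper
gene_selection_args <- list(geneList = NULL, n_sigGene = n_sigGene)
```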

adaptive_B

Logical. If TRUE, the model adapts the baseline/background (B) of genomic measures for the test samples. The default is TRUE.

adaptive_S

Logical. If TRUE, the model adapts the signatures (S) of genomic measures for the test samples. The default is FALSE.

mixture_beta

Logical. If TRUE, elements of the pathway activation matrix are modeled by a spike-and-slab mixture distribution. The default is TRUE.

outputDir

The path to the directory to save the output files. The path needs to be quoted in double quotation marks.

p_beta

p_beta is the prior probability of a pathway being activated in individual test samples. The default is 0.01.

theta0

The prior probability for a gene to be significant, given that the gene is NOT defined as "significant" in the signature gene lists provided by the user. The default is 0.05.

theta1

The prior probability for a gene to be significant, given that the gene is defined as "significant" in the signature gene lists provided by the user. The default is 0.9.

iter

The number of iterations in the MCMC. The default is 2000.

burn_in

The number of burn-in iterations. These iterations are discarded when computing the posterior means of the model parameters. The default is 1000.
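To illustrate how iter and burn_in interact, the sketch below simulates a toy chain (not ASSIGN output) that starts far from its stationary mean and then averages only the iterations kept after burn-in. The autoregressive process and its parameters are assumptions chosen purely for illustration.

```r
set.seed(1)
iter <- 2000
burn_in <- 1000

# Toy chain: an AR(1) process with stationary mean 2, started at 10
chain <- numeric(iter)
chain[1] <- 10
for (t in 2:iter) {
  chain[t] <- 2 + 0.9 * (chain[t - 1] - 2) + rnorm(1, sd = 0.1)
}

# Discard the first burn_in iterations, then average the rest
kept <- chain[(burn_in + 1):iter]
posterior_mean <- mean(kept)
```

With the defaults shown in Usage, 1000 of the 2000 iterations contribute to the posterior means.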

sigma_sZero

Each element of the signature matrix (S) is modeled by a spike-and-slab mixture distribution. sigma_sZero is the variance of the spike normal distribution. The default is 0.01.

sigma_sNonZero

Each element of the signature matrix (S) is modeled by a spike-and-slab mixture distribution. sigma_sNonZero is the variance of the slab normal distribution. The default is 1.
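The sketch below draws from a two-component normal mixture to show what the two variances control: a narrow spike concentrated near zero and a wide slab. The mixing weight p_slab and the zero means are assumptions made for illustration and are not taken from the ASSIGN model definition.

```r
set.seed(42)
sigma_sZero <- 0.01    # variance of the spike component
sigma_sNonZero <- 1    # variance of the slab component
p_slab <- 0.9          # hypothetical mixing weight, for illustration only
n <- 10000

# Pick a component for each draw, then sample from the matching normal.
# rnorm takes a standard deviation, so the variances are square-rooted.
in_slab <- rbinom(n, 1, p_slab) == 1
draws <- rnorm(
  n,
  mean = 0,
  sd = ifelse(in_slab, sqrt(sigma_sNonZero), sqrt(sigma_sZero))
)

spike_var <- var(draws[!in_slab])
slab_var <- var(draws[in_slab])
```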

S_zeroPrior

Logical. If TRUE, the prior distribution of the signature follows a normal distribution with mean zero. The default is FALSE.

pctUp

By default, ASSIGN Bayesian gene selection chooses the signature genes with an equal fraction of genes that increase with pathway activity and genes that decrease with pathway activity. Use the pctUp parameter to modify this fraction. Set pctUp to NULL to select the most significant genes, regardless of direction. The default is 0.5.
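The effect of pctUp can be sketched with a toy selection rule. The helper select_signature and the scores below are hypothetical; this is not ASSIGN's Bayesian gene selection, only an illustration of splitting a signature between up-regulated and down-regulated genes versus ranking by significance alone.

```r
# Hypothetical helper: pick n_genes from signed scores, where the sign gives
# the direction of change and the magnitude stands in for significance
select_signature <- function(scores, n_genes, pctUp = 0.5) {
  ranked <- names(sort(abs(scores), decreasing = TRUE))
  if (is.null(pctUp)) {
    # No direction constraint: take the top genes by magnitude
    return(ranked[seq_len(n_genes)])
  }
  n_up <- round(n_genes * pctUp)
  up <- ranked[scores[ranked] > 0]
  down <- ranked[scores[ranked] < 0]
  c(head(up, n_up), head(down, n_genes - n_up))
}

# Made-up scores for six genes
scores <- c(g1 = 3.0, g2 = 2.5, g3 = 2.0, g4 = -0.5, g5 = -0.2, g6 = 0.1)

balanced <- select_signature(scores, 4)           # g1 g2 g4 g5
unconstrained <- select_signature(scores, 4, NULL) # g1 g2 g3 g4
```

In this toy case the balanced selection admits two weakly scored down-regulated genes, while the unconstrained selection keeps the four largest magnitudes.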

geneselect_iter

The number of iterations for Bayesian gene selection. The default is 500.

geneselect_burn_in

The number of burn-in iterations for Bayesian gene selection. The default is 100.

outputSignature_convergence

Logical. If TRUE, a PDF of the MCMC chain is created. The default is FALSE.

ECM

Logical. If TRUE, the ECM algorithm, rather than Gibbs sampling, is applied to approximate the model parameters. The default is FALSE.

progress_bar

Logical. If TRUE, a progress bar is displayed for MCMC and gene selection. The default is TRUE.

override_S_matrix

Replace the S_matrix created by assign.preprocess with the matrix provided in override_S_matrix. This can be used to indicate the expected directions of genes in a signature if training data are not provided. The default is NULL.

Details

The assign.wrapper function is an all-in-one function which outputs the necessary results for basic users. For users who need more intermediate results for model diagnosis, it is better to run the assign.preprocess, assign.mcmc, assign.convergence, assign.summary functions separately and extract the output values from the returned list objects of those functions.
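A hedged sketch of the step-by-step route is given below. The helper run_assign_steps is hypothetical, and the argument names and the elements read from the preprocessing result (testData_sub, B_vector, S_matrix, Pi_matrix) are recalled from the ASSIGN vignette rather than stated on this page, so they should be checked against ?assign.preprocess, ?assign.mcmc, and ?assign.summary before use.

```r
# Hypothetical helper that chains the individual ASSIGN steps.
# Argument names are assumptions recalled from the package vignette.
run_assign_steps <- function(trainingData, testData, trainingLabel, geneList,
                             iter = 2000, burn_in = 1000) {
  # Step 1: build the baseline, signature, and prior matrices
  processed <- ASSIGN::assign.preprocess(
    trainingData = trainingData, testData = testData,
    trainingLabel = trainingLabel, geneList = geneList
  )
  # Step 2: run the MCMC on the preprocessed test data
  chain <- ASSIGN::assign.mcmc(
    Y = processed$testData_sub, Bg = processed$B_vector,
    X = processed$S_matrix, Delta_prior_p = processed$Pi_matrix,
    iter = iter, adaptive_B = TRUE, adaptive_S = FALSE, mixture_beta = TRUE
  )
  # Step 3: summarize the chain after discarding burn-in iterations
  posterior <- ASSIGN::assign.summary(
    test = chain, burn_in = burn_in, iter = iter,
    adaptive_B = TRUE, adaptive_S = FALSE, mixture_beta = TRUE
  )
  # Return every intermediate object for model diagnosis
  list(processed = processed, chain = chain, posterior = posterior)
}
```

Keeping the intermediate objects in one list makes it possible to pass the chain to assign.convergence for trace plots before trusting the posterior summaries.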

Value

assign.wrapper returns the activity of one or multiple pathways for each individual training sample and test sample; scatter plots of pathway activity for each individual pathway in the training and test data; heatmap plots of the gene expression signatures for each individual pathway; and heatmap plots of the gene expression of the prior and posterior signatures (if adaptive_S equals TRUE) of each individual pathway in the test data.

Author(s)

Ying Shen and W. Evan Johnson

Examples



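library(ASSIGN)

# Load the example datasets shipped with the package
data(trainingData1)
data(testData1)
data(geneList1)

# Sample indices for the control group and for each of the five pathways
trainingLabel1 <- list(control = list(bcat=1:10, e2f3=1:10, myc=1:10,
                                      ras=1:10, src=1:10),
                       bcat = 11:19, e2f3 = 20:28, myc= 29:38, ras = 39:48,
                       src = 49:55)
# Phenotype labels for the test samples: 53 of subtypeA, then 58 of subtypeB
testLabel1 <- rep(c("subtypeA","subtypeB"), c(53,58))

# iter and burn_in are kept small so the example runs quickly;
# tempdir() returns the path of the session's temporary directory
assign.wrapper(trainingData=trainingData1, testData=testData1,
               trainingLabel=trainingLabel1, testLabel=testLabel1,
               geneList=geneList1, adaptive_B=TRUE, adaptive_S=FALSE,
               mixture_beta=TRUE, outputDir=tempdir(), p_beta=0.01,
               theta0=0.05, theta1=0.9, iter=20, burn_in=10)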


compbiomed/ASSIGN documentation built on June 28, 2023, 4 a.m.