Trains several models with EM, each starting from different initial values, to reduce the risk of getting stuck in a poor local optimum. The model with the best likelihood can later be selected with the choose.best() function.
Dt
A matrix which contains the counts of the alternative allele, where rows correspond to the genomic loci and columns correspond to the samples.
Dc
A matrix which contains the counts of the total number of mapped reads, where rows correspond to the genomic loci and columns correspond to the samples.
DcDtFile
A file from which the data can optionally be loaded. It should contain the matrices Dc and Dt.
C
The assumed number of clones.
doParal
Boolean. If TRUE, on Linux, models with different initializations are trained in parallel on a cluster using qsub.
outPrefix
A prefix for the path to save the results.
binomTryNum
The number of models trained using different initializations.
maxIt
The maximum number of EM iterations.
llCutoff
EM iterations stop if the relative improvement in the log-likelihood is not more than this threshold.
jobNamePrefix
If run in parallel, this prefix will be used to name the jobs on the cluster.
qstatWait
The waiting time between qstat commands to assess the number of running and waiting jobs.
fitBinomJobFile
If run in parallel, this is the script which loads data, trains a model using a random initialization, and saves the results.
jobShare
If run in parallel, the job_share option of qsub determines the priority of jobs over other submitted jobs.
ignoredSample
A vector of indices of samples which will be ignored in training. Used by experts only to measure the stability of the results.
fliProb
A "flipping probability" used for noise injection which can be disabled when
conservative
Boolean. If TRUE, noise is injected only if the likelihood improves after an EM iteration; otherwise the original Mu matrix is used for the next iteration. For expert use only.
doTalk
If TRUE, information on the EM optimization iterations is reported.
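The stopping rule controlled by llCutoff can be sketched as follows. This is a minimal illustration and not the package's own code: the helper has_converged is hypothetical, the default value shown is for illustration only, and the definition of relative improvement used here (the gain divided by the absolute value of the previous log-likelihood) is an assumption.

```r
# Hypothetical helper, not part of the Clomial package.
# Assumption: "relative improvement" = (new - old) / |old|.
has_converged <- function(ll_old, ll_new, llCutoff = 1e-3) {
  rel_improvement <- (ll_new - ll_old) / abs(ll_old)
  rel_improvement <= llCutoff
}

# A gain of 0.05 on a log-likelihood of -100 is a relative improvement of
# 0.0005, which is below the cutoff, so iterations would stop.
small_gain <- has_converged(-100, -99.95)
# A gain of 10 is a relative improvement of 0.1, so iterations would continue.
large_gain <- has_converged(-100, -90)
```

Under this reading, a smaller llCutoff makes EM run longer before stopping, up to the limit set by maxIt.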
The likelihood of the model, given the hidden variables and the parameters, can be computed from a combination of binomial distributions. Each EM iteration increases the likelihood. However, because of local optima, several models should be trained from different random initializations. For a higher number of assumed clones, C, the parameter binomTryNum should be increased because the dimension of the search space grows linearly with C.
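A binomial log-likelihood of this kind could be sketched as below. This is only an illustration under stated assumptions, not the package's implementation: the helper binom_loglik is hypothetical, the toy matrices are made up, and the success probability at locus i in sample j is assumed to be half of the genotype-weighted clone frequency, (Mu %*% P) / 2, which this page does not confirm.

```r
# Hypothetical helper, not part of the Clomial package.
# Dt, Dc: loci x samples count matrices; Mu: loci x clones; P: clones x samples.
binom_loglik <- function(Dt, Dc, Mu, P) {
  # Assumed probability of observing the alternative allele.
  prob <- (Mu %*% P) / 2
  # Keep probabilities strictly inside (0, 1) so the log-densities stay finite.
  prob <- pmin(pmax(prob, 1e-9), 1 - 1e-9)
  sum(dbinom(Dt, size = Dc, prob = prob, log = TRUE))
}

# Made-up toy data: 2 loci, 2 clones, 3 samples.
Mu <- cbind(c(0, 0), c(1, 0))
P  <- rbind(c(1, 0.6, 0.2),
            c(0, 0.4, 0.8))
Dc <- matrix(c(50, 40, 60, 45, 55, 50), nrow = 2)
Dt <- matrix(c(0, 0, 12, 0, 22, 0), nrow = 2)

ll <- binom_loglik(Dt, Dc, Mu, P)
```

Each additional clone adds one column to Mu and one row to P, which is consistent with the search space growing linearly with C.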
Returns a list containing an entry called models, which is a list of length binomTryNum in which each element is a trained model. For each trained model, Mu models the matrix of genotypes, where rows and columns correspond to genomic loci and clones, respectively. P is the matrix of clonal frequencies, where rows and columns correspond to clones and samples, respectively. The first column of P corresponds to the normal clone. The history of Mu, P, and the log-likelihood over iterations is saved in the lists Mus, Ps, and Likelihoods, respectively.
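To make the shape of the returned value concrete, the sketch below builds a mock object with the structure described above and picks the model whose final log-likelihood is highest. The numbers are made up, only the Likelihoods entry is filled in, and the selection logic is an assumption for illustration; in practice choose.best() should be used for this step.

```r
# Mock result mimicking the described structure; all values are made up.
mock_result <- list(models = list(
  list(Likelihoods = list(-120.5, -101.2, -99.8)),
  list(Likelihoods = list(-130.0, -98.4, -97.1))
))

# Final log-likelihood of each trained model (last entry of its history).
final_ll <- sapply(mock_result$models, function(m) {
  ll <- unlist(m$Likelihoods)
  ll[length(ll)]
})

# Index of the model with the highest final log-likelihood.
best_index <- which.max(final_ll)
```

The same pattern, applied to the Likelihoods history of a real model, can be used to check whether a run reached the maxIt limit before its log-likelihood levelled off.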
The parallel mode works only in Linux, and when qsub and qstat commands are available on a cluster.
Habil Zare
Inferring clonal composition from multiple sections of a breast cancer, Zare et al., Submitted.
Clomial, choose.best, Clomial.iterate, compute.bic, breastCancer
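set.seed(1)
# Load the example data set and extract the count matrices.
data(breastCancer)
Dc <- breastCancer$Dc
Dt <- breastCancer$Dt
# Train 2 models from different initializations, assuming 4 clones.
ClomialResult <- Clomial(Dc=Dc, Dt=Dt, maxIt=20, C=4, binomTryNum=2)
# Select the model with the best likelihood.
chosen <- choose.best(models=ClomialResult$models)
M1 <- chosen$bestModel
print("Genotypes:")
round(M1$Mu)
print("Clone frequencies:")
M1$P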