

plotExplainedVariance    R Documentation

Plot the explained variance as a function of the number of signatures

Description

'plotExplainedVariance()' plots the explained variance of a single tumor genome's mutation patterns as a function of the number of signatures (increasing subsets of signatures) used for decomposition. For each number K of signatures, the plot shows the highest explained variance achieved by any evaluated subset of K signatures (found by full or greedy search, see below). This helps to choose a minimum threshold for the explained variance when decomposing tumor genomes with the function decomposeTumorGenomes.
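To make the plotted quantity more concrete, the following is a minimal sketch of one common way to define explained variance, namely 1 - RSS/TSS between the observed mutation frequencies and the frequencies reconstructed from a set of signatures. The helper name explainedVarianceSketch and the toy numbers are hypothetical, and the exact formula used by the package (see computeExplainedVariance) may differ in detail.

```r
# Hypothetical helper, not part of decompTumor2Sig: explained variance
# taken as 1 - RSS/TSS (an assumption of this sketch).
explainedVarianceSketch <- function(observed, reconstructed) {
    rss <- sum((observed - reconstructed)^2)     # residual sum of squares
    tss <- sum((observed - mean(observed))^2)    # total sum of squares
    1 - rss / tss
}

# Toy mutation frequencies, made up for illustration only
observed      <- c(0.40, 0.30, 0.20, 0.10)
reconstructed <- c(0.38, 0.31, 0.22, 0.09)

ev <- explainedVarianceSketch(observed, reconstructed)
print(ev)
```

Under this definition, a perfect reconstruction yields 1, and values closer to 1 mean that the chosen signatures account for more of the genome's mutation pattern.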

Usage

plotExplainedVariance(genome, signatures, minExplainedVariance=NULL,
minNumSignatures=2, maxNumSignatures=NULL, greedySearch=FALSE)

Arguments

genome

(Mandatory) The mutation load of a single genome in Alexandrov or Shiraishi format, i.e. as a vector or a matrix, respectively. The format must be the same as the one used for the signatures (see below).

signatures

(Mandatory) The list of signatures (vectors, data frames or matrices) which are to be evaluated. Each of the list objects represents one mutational signature. Vectors are used for Alexandrov signatures, data frames or matrices for Shiraishi signatures.

minExplainedVariance

(Optional) If a numeric value between 0 and 1 is specified, the plot highlights the smallest subset of signatures that is sufficient to explain at least the specified fraction of the variance of the genome's mutation patterns. If, for example, minExplainedVariance is 0.99, the smallest subset of signatures that explains at least 99% of the variance will be highlighted.

minNumSignatures

(Optional) The plot will be generated only for K>=minNumSignatures.

maxNumSignatures

(Optional) The plot will be generated only for K<=maxNumSignatures.

greedySearch

(Optional) If greedySearch is set to TRUE, not all possible combinations of minNumSignatures to maxNumSignatures signatures will be checked. Instead, all possible combinations of exactly minNumSignatures signatures are checked first to select the best starting set; then the next best signature (the one giving the maximum increase in explained variance) is added iteratively until maxNumSignatures is reached. NOTE: while this is only an approximation, it is highly recommended for large sets of signatures (>15)!
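The greedy strategy described for greedySearch can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the helpers scoreSubset and greedySketch are hypothetical, signatures are represented as columns of a plain numeric matrix, and an ordinary least-squares fit with a 1 - RSS/TSS score stands in for the actual decomposition and explained-variance computation.

```r
# Illustrative sketch only; none of these helpers belong to decompTumor2Sig.

# Score a subset of signatures (columns of sigMatrix) by 1 - RSS/TSS,
# using an ordinary least-squares fit as a stand-in for the decomposition.
scoreSubset <- function(genome, sigMatrix, subset) {
    fitted <- qr.fitted(qr(sigMatrix[, subset, drop = FALSE]), genome)
    1 - sum((genome - fitted)^2) / sum((genome - mean(genome))^2)
}

greedySketch <- function(genome, sigMatrix, minNum, maxNum) {
    maxNum <- min(maxNum, ncol(sigMatrix))

    # Step 1: exhaustive search over all subsets of exactly minNum signatures
    starts <- combn(ncol(sigMatrix), minNum, simplify = FALSE)
    startScores <- vapply(starts,
                          function(s) scoreSubset(genome, sigMatrix, s),
                          numeric(1))
    chosen <- starts[[which.max(startScores)]]
    scores <- max(startScores)

    # Step 2: repeatedly add the single signature with the largest gain
    while (length(chosen) < maxNum) {
        candidates <- setdiff(seq_len(ncol(sigMatrix)), chosen)
        candScores <- vapply(candidates,
                             function(j) scoreSubset(genome, sigMatrix,
                                                     c(chosen, j)),
                             numeric(1))
        chosen <- c(chosen, candidates[which.max(candScores)])
        scores <- c(scores, max(candScores))
    }

    names(scores) <- seq(minNum, maxNum)   # label each score with its K
    list(subset = chosen, explainedVariance = scores)
}

# Toy data, made up for illustration: 8 random "signatures" over 96 features
set.seed(1)
sigMatrix <- matrix(runif(96 * 8), nrow = 96)
sigMatrix <- sweep(sigMatrix, 2, colSums(sigMatrix), "/")
genome <- 0.5 * sigMatrix[, 2] + 0.3 * sigMatrix[, 5] + 0.2 * sigMatrix[, 7]

res <- greedySketch(genome, sigMatrix, minNum = 2, maxNum = 4)
print(res$subset)
print(res$explainedVariance)
```

The saving grows quickly with the number of signatures. For 30 signatures and K from 2 to 30, a full search evaluates sum(choose(30, 2:30)) = 1,073,741,793 subsets, whereas the greedy scheme sketched here evaluates choose(30, 2) + sum(28:1) = 841.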

Value

Returns (or draws) a plot of the explained variance as a function of the number of signatures.
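As a rough illustration of what the minExplainedVariance highlighting corresponds to, the following sketch picks the smallest number of signatures K whose best explained variance reaches a given threshold. The values in bestExplVar are made up, and the variable names are hypothetical; the function itself derives such values from the genome and the signatures.

```r
# Made-up best explained variance for K = 2..6 signatures (illustration only)
bestExplVar <- c("2" = 0.910, "3" = 0.962, "4" = 0.985, "5" = 0.991, "6" = 0.993)
minExplainedVariance <- 0.98

# Smallest K whose best subset reaches the threshold (NA if none does)
reached <- bestExplVar >= minExplainedVariance
smallestK <- if (any(reached)) {
    as.integer(names(bestExplVar)[which(reached)[1]])
} else {
    NA_integer_
}
print(smallestK)
```

With these toy values, K = 4 is the first point at or above the threshold, so that is the subset size that would be highlighted.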

Author(s)

Rosario M. Piro
Politecnico di Milano
Maintainer: Rosario M. Piro
E-Mail: <rmpiro@gmail.com> or <rosariomichael.piro@polimi.it>

References

http://rmpiro.net/decompTumor2Sig/
Krueger, Piro (2019) decompTumor2Sig: Identification of mutational signatures active in individual tumors. BMC Bioinformatics 20(Suppl 4):152.

See Also

decompTumor2Sig
decomposeTumorGenomes
computeExplainedVariance

Examples


### get 15 pre-processed Shiraishi signatures (object 'signatures'),
### computed from 435 tumor genomes of Alexandrov et al (PMID: 23945592)
### using the pmsignature package
sfile <- system.file("extdata",
         "Alexandrov_PMID_23945592_435_tumors-pmsignature-15sig.Rdata", 
         package="decompTumor2Sig")
load(sfile)

### load preprocessed breast cancer genomes (object 'genomes') from
### Nik-Zainal et al (PMID: 22608084) 
gfile <- system.file("extdata",
         "Nik-Zainal_PMID_22608084-genomes-Shiraishi_5bases_trDir.Rdata", 
         package="decompTumor2Sig")
load(gfile)

### plot the explained variance for 2 to 6 signatures of the first genome
plotExplainedVariance(genomes[[1]], signatures,
         minExplainedVariance=0.98, minNumSignatures=2, maxNumSignatures=6)


rmpiro/decompTumor2Sig documentation built on May 15, 2022, 3:27 a.m.