pagoda.gene.clusters | R Documentation |
Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.
pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
n.clusters = 150, n.samples = 60, cor.method = "p",
n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
method = "ward.D", secondary.correlation = FALSE,
n.cells = ncol(varinfo$mat), old.results = NULL)
varinfo |
adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect()) |
trim |
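additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). The value is a fraction of cells, as in the default 3.1/ncol(varinfo$mat). Use higher numerators (5-15, e.g. 7.1/ncol(varinfo$mat)) if the resulting clusters group outlier patterns |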
n.clusters |
number of clusters to be determined (recommended range is 100-200) |
n.samples |
number of randomly generated matrix samples to test the background distribution of lambda1 on |
cor.method |
correlation method ("pearson", "spearman") to be used as a distance measure for clustering |
n.internal.shuffles |
number of internal shuffles to perform (only if interested in set coherence, which is quite high for clusters by definition, disabled by default; set to 10-30 shuffles to estimate) |
n.starts |
number of wPCA EM algorithm starts at each iteration |
n.cores |
number of cores to use |
verbose |
verbosity level |
plot |
whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit) |
show.random |
whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation |
n.components |
number of principal components (PCs) to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns, which is rarely the case) |
method |
clustering method to be used in determining gene clusters |
secondary.correlation |
whether clustering should be performed on the correlation of the correlation matrix instead |
n.cells |
number of cells to use for the randomly generated cluster lambda1 model |
old.results |
optionally, pass old results just to plot the model without recalculating the stats |
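The interplay of trim, cor.method, method, secondary.correlation and n.clusters can be illustrated with a minimal base R sketch. It is not the scde implementation: the toy matrix, the helper winsorize.row() and the cluster count of 2 are assumptions made for illustration, and the exact Winsorization rule used inside pagoda.gene.clusters() may differ.

```r
# Toy data (hypothetical): 20 genes x 30 cells of random noise.
set.seed(1)
mat <- matrix(rnorm(20 * 30), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:30)))
# Give the first 10 genes a shared pattern across cells so they co-vary.
pattern <- rnorm(30)
mat[1:10, ] <- mat[1:10, ] + 3 * rep(pattern, each = 10)

# Assumed Winsorization rule: clamp the k most extreme values at each end
# of a gene's profile, where k = round(trim * number of cells).
winsorize.row <- function(x, trim) {
  k <- round(trim * length(x))
  if (k < 1) return(x)
  s <- sort(x)
  pmin(pmax(x, s[k + 1]), s[length(x) - k])
}

trim <- 3.1 / ncol(mat)  # same form as the default trim argument
wmat <- t(apply(mat, 1, winsorize.row, trim = trim))

# Gene-gene correlation used as a similarity (analogue of cor.method = "p").
gcor <- cor(t(wmat), method = "pearson")
# Analogue of secondary.correlation = TRUE would be: gcor <- cor(gcor)

# Hierarchical clustering on 1 - correlation (analogue of method = "ward.D"),
# cut into a fixed number of clusters (analogue of n.clusters).
hc <- hclust(as.dist(1 - gcor), method = "ward.D")
cl <- cutree(hc, k = 2)
clusters <- split(names(cl), cl)  # list of gene names per cluster
```

Raising trim clamps more cells per gene, which weakens correlations driven by a handful of outlier cells before the genes are clustered.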
a list containing the following fields:
clusters a list of the genes in each cluster
xf extreme value distribution fit for the standardized lambda1 of a randomly generated pattern
tci index of the top cluster in each random iteration
cl.goc weighted PCA info for each real gene cluster
varm standardized lambda1 values for each randomly generated matrix cluster
clvlm a linear model describing dependency of the cluster lambda1 on a Tracy-Widom lambda1 expectation
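The idea behind comparing a cluster's lambda1 with a random-matrix background can be sketched in base R. This is a simplified illustration under stated assumptions: it uses ordinary (unweighted) PCA and an empirical background only, whereas the function documented here uses weighted PCA, an extreme value distribution fit and a Tracy-Widom expectation. The helper lambda1() and all sizes are hypothetical.

```r
set.seed(2)
n.cells <- 40
n.genes <- 15

# lambda1: largest eigenvalue of the gene-gene covariance matrix
# (m is a genes x cells matrix).
lambda1 <- function(m) {
  eigen(cov(t(m)), symmetric = TRUE, only.values = TRUE)$values[1]
}

# Background: lambda1 of 60 random matrices of the same size
# (analogue of n.samples = 60).
random.l1 <- replicate(60,
  lambda1(matrix(rnorm(n.genes * n.cells), nrow = n.genes)))

# A "real" cluster: every gene carries the same pattern across cells.
shared <- rnorm(n.cells)
cluster.mat <- matrix(rnorm(n.genes * n.cells), nrow = n.genes) +
  2 * rep(shared, each = n.genes)
obs.l1 <- lambda1(cluster.mat)

# Standardize the observed lambda1 against the random background.
z <- (obs.l1 - mean(random.l1)) / sd(random.l1)
```

A coherent cluster concentrates variance in its first component, so its lambda1 lies far above the values obtained from random matrices of matching dimensions.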
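library(scde)
data(pollen)
cd <- clean.counts(pollen)
# fit k-nearest-neighbor error models for each cell
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
# normalize gene expression variance relative to the transcriptome-wide expectation
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# determine de-novo gene clusters, using a higher trim value than the default
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)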