pagoda.gene.clusters: Determine de-novo gene clusters and associated overdispersion...
In hms-dbmi/scde: Single Cell Differential Expression

pagoda.gene.clusters

R Documentation

Determine de-novo gene clusters and associated overdispersion info

Description

Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.
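
To make "lambda1" and "random matrix expectation" concrete, here is a minimal base-R sketch on simulated data. It is an illustration under stated assumptions, not scde's implementation: it uses ordinary (unweighted) PCA, Gaussian noise as the random background, and hypothetical names (`cluster.mat`, `lambda1`, `observed`, `random`, `z`) that are not part of the package.

```r
# Sketch only: unweighted PCA on simulated data, not scde's weighted PCA.
set.seed(1)
n.genes <- 20
n.cells <- 50

# Hypothetical gene cluster: all genes follow one shared pattern across cells, plus noise
pattern <- rnorm(n.cells)
cluster.mat <- outer(runif(n.genes, 0.5, 1), pattern) +
  matrix(rnorm(n.genes * n.cells), n.genes, n.cells)

# lambda1 = largest eigenvalue of the gene-gene covariance matrix
lambda1 <- function(m) {
  m <- m - rowMeans(m)  # center each gene (row)
  eigen(tcrossprod(m) / (ncol(m) - 1), symmetric = TRUE,
        only.values = TRUE)$values[1]
}

observed <- lambda1(cluster.mat)

# Background: lambda1 of pure-noise matrices with the same dimensions
random <- replicate(60, lambda1(matrix(rnorm(n.genes * n.cells), n.genes, n.cells)))

# Standardized excess of the observed lambda1 over the random background
z <- (observed - mean(random)) / sd(random)
```

A coherent cluster yields a lambda1 well above the random background, which is the kind of excess the function's overdispersion statistics are meant to capture.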

Usage

pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)

Arguments

`varinfo`	adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())
`trim`	additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). If the resulting clusters group outlier patterns, use higher values (5-15 in place of the default 3.1, e.g. 7.1/ncol(varinfo$mat))
`n.clusters`	number of clusters to be determined (recommended range is 100-200)
`n.samples`	number of randomly generated matrix samples used to estimate the background distribution of lambda1
`cor.method`	correlation method ("pearson", "spearman") to be used as a distance measure for clustering; the default "p" abbreviates "pearson"
`n.internal.shuffles`	number of internal shuffles to perform (only if interested in set coherence, which is quite high for clusters by definition, disabled by default; set to 10-30 shuffles to estimate)
`n.starts`	number of wPCA EM algorithm starts at each iteration
`n.cores`	number of cores to use
`verbose`	verbosity level
`plot`	whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)
`show.random`	whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation
`n.components`	number of principal components (PCs) to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns, which is rarely the case)
`method`	clustering method to be used in determining gene clusters
`secondary.correlation`	whether clustering should be performed on the correlation of the correlation matrix instead
`n.cells`	number of cells to use for the randomly generated cluster lambda1 model
`old.results`	optionally, pass old results just to plot the model without recalculating the stats
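
The roles of `trim`, `cor.method`, `method`, `n.clusters` and `secondary.correlation` can be illustrated with a minimal base-R sketch on simulated data. This is an assumption-laden illustration, not the package's code: `winsorize.rows` and `cluster.genes` are hypothetical helpers, and scde's own Winsorization and clustering internals may differ.

```r
# Sketch only: base R on simulated data; scde's internals may differ.
set.seed(1)
n.genes <- 60
n.cells <- 40
mat <- matrix(rnorm(n.genes * n.cells), n.genes, n.cells,
              dimnames = list(paste0("gene", seq_len(n.genes)), NULL))

# Plant three hypothetical groups of 20 co-varying genes each
for (g in 1:3) {
  rows <- ((g - 1) * 20 + 1):(g * 20)
  mat[rows, ] <- mat[rows, ] + rep(3 * rnorm(n.cells), each = 20)
}

# Winsorize each gene: clip the k most extreme values on each side,
# where k is the trim fraction times the number of cells
winsorize.rows <- function(m, trim) {
  k <- round(trim * ncol(m))
  if (k < 1) return(m)
  t(apply(m, 1, function(x) {
    s <- sort(x)
    pmin(pmax(x, s[k + 1]), s[length(x) - k])
  }))
}

# Cluster genes on a correlation-based distance and cut the tree into n.clusters
cluster.genes <- function(m, n.clusters, cor.method = "pearson",
                          method = "ward.D", secondary.correlation = FALSE) {
  gc <- cor(t(m), method = cor.method)  # gene-gene correlation
  if (secondary.correlation) {
    gc <- cor(gc, method = cor.method)  # correlation of the correlation matrix
  }
  hc <- hclust(as.dist(1 - gc), method = method)
  split(rownames(m), cutree(hc, k = n.clusters))
}

trimmed <- winsorize.rows(mat, trim = 3.1 / ncol(mat))
clusters <- cluster.genes(trimmed, n.clusters = 3)
```

Here `clusters` is a list of gene-name vectors, one per cluster, analogous in shape to the `clusters` field the function returns. Raising `trim` clips more extreme values per gene, which reduces the chance that a few outlier cells drive a cluster.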

Value

A list containing the following fields:

`clusters`	a list of genes in each cluster
`xf`	extreme value distribution fit for the standardized lambda1 of a randomly generated pattern
`tci`	index of the top cluster in each random iteration
`cl.goc`	weighted PCA info for each real gene cluster
`varm`	standardized lambda1 values for each randomly generated matrix cluster
`clvlm`	a linear model describing the dependency of the cluster lambda1 on the Tracy-Widom lambda1 expectation

Examples

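library(scde)

# Load the example dataset and filter the count matrix
data(pollen)
cd <- clean.counts(pollen)

# Fit k-nearest-neighbor error models, then normalize the variance
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# Determine de novo gene clusters, using a larger Winsorization trim than the default
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)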

hms-dbmi/scde documentation built on April 19, 2023, 10:21 p.m.