testNhoods | R Documentation |
This will perform differential neighbourhood abundance testing after cell counting.
x |
A |
design |
A |
design.df |
A |
kinship |
(optional) An n X n |
min.mean |
A scalar used to threshold neighbourhoods on the minimum average cell counts across samples. |
model.contrasts |
A string vector that defines the contrasts used to perform
DA testing. For a specific comparison we recommend a single contrast be passed to
|
fdr.weighting |
The spatial FDR weighting scheme to use. Choice from max,
neighbour-distance, graph-overlap or k-distance (default). If |
robust |
If robust=TRUE then this is passed to edgeR and limma which use a robust
estimation for the global quasilikelihood dispersion distribution. See |
norm.method |
A character scalar, either |
cell.sizes |
A named numeric vector of cell numbers per experimental samples. Names should correspond
to the columns of |
reduced.dim |
A character scalar referring to the reduced dimensional slot used to compute distances for the spatial FDR. This should be the same as used for graph building. |
REML |
A logical scalar that controls the variance component behaviour to use either restricted maximum likelihood (REML) or maximum likelihood (ML). The former is recommened to account for the bias in the ML variance estimates. |
glmm.solver |
A character scalar that determines which GLMM solver is applied. Must be one of: Fisher, HE or HE-NNLS. HE or HE-NNLS are recommended when supplying a user-defined covariance matrix. |
max.iters |
A scalar that determines the maximum number of iterations to run the GLMM solver if it does not reach the convergence tolerance threshold. |
max.tol |
A scalar that deterimines the GLMM solver convergence tolerance. It is recommended to keep this number small to provide some confidence that the parameter estimates are at least in a feasible region and close to a local optimum |
subset.nhoods |
A character, numeric or logical vector that will subset the analysis to the specific nhoods. If
a character vector these should correspond to row names of |
intercept.type |
A character scalar, either fixed or random that sets the type of the global
intercept variable in the model. This only applies to the GLMM case where additional random effects variables are
already included. Setting |
fail.on.error |
A logical scalar the determines the behaviour of the error reporting. Used for debugging only. |
BPPARAM |
A BiocParallelParam object specifying the arguments for parallelisation. By default
this will evaluate using |
force |
A logical scalar that overrides the default behaviour to nicely error when N < 50 and using a mixed effect model. This is because model parameter estimation may be unstable with these sample sizes, and hence the fixed effect GLM is recommended instead. If used with the LMM, a warning will be produced. |
This function wraps up several steps of differential abundance testing using
the edgeR
functions. These could be performed separately for users
who want to exercise more contol over their DA testing. By default this
function sets the lib.sizes
to the colSums(x), and uses the
Quasi-Likelihood F-test in glmQLFTest
for DA testing. FDR correction
is performed separately as the default multiple-testing correction is
inappropriate for neighbourhoods with overlapping cells.
The GLMM testing cannot be performed using edgeR
, however, a separate
function fitGLMM
can be used to fit a mixed effect model to each
nhood (see fitGLMM
docs for details).
Parallelisation is currently only enabled for the NB-GLMM and uses the BiocParallel paradigm at
the level of R, and OpenMP to allow multi-threading of RCpp code. In general the GLM implementation
in glmQLFit
is sufficiently fast that it does not require
parallelisation. Parallelisation requires the user to pass a BiocParallelParam object
with the parallelisation arguments contained therein. This relies on the user specifying how to
parallelise - for details see the BiocParallel
package.
model.contrasts
are used to define specific comparisons for DA testing. Currently,
testNhoods
will take the last formula variable for comparisons, however, contrasts
need this to be the first variable. A future update will harmonise these behaviours for
consistency. While it is strictly feasible to compute multiple contrasts at once, the
recommendation, for ease of interpretability, is to compute one at a time.
If using the GLMM option, i.e. including a random effect variable in the design
formula, then testNhoods
will check for the sample size of the analysis. If this is
less than 60 it will stop and produce an error. It is strongly recommended that
the GLMM is not used with relatively small sample sizes, i.e. N<60, and even up to N~100
may have unstable parameter estimates across nhoods. This behaviour can be overriden by
setting force=TRUE
, but also be aware that parameter estimates may not be
accurate. A warning will be produced to alert you to this fact.
A data.frame
of model results, which contain:
logFC
:Numeric, the log fold change between conditions, or for an ordered/continous variable the per-unit change in (normalized) cell counts per unit-change in experimental variable.
logCPM
:Numeric, the log counts per million (CPM), which equates to the average log normalized cell counts across all samples.
F
:Numeric, the F-test statistic from the quali-likelihood F-test
implemented in edgeR
.
PValue
:Numeric, the unadjusted p-value from the quasi-likelihood F-test.
FDR
:Numeric, the Benjamini & Hochberg false discovery weight
computed from p.adjust
.
Nhood
:Numeric, a unique identifier corresponding to the specific graph neighbourhood.
SpatialFDR
:Numeric, the weighted FDR, computed to adjust for spatial graph overlaps between neighbourhoods. For details see graphSpatialFDR.
Mike Morgan
library(SingleCellExperiment)
ux.1 <- matrix(rpois(12000, 5), ncol=400)
ux.2 <- matrix(rpois(12000, 4), ncol=400)
ux <- rbind(ux.1, ux.2)
vx <- log2(ux + 1)
pca <- prcomp(t(vx))
sce <- SingleCellExperiment(assays=list(counts=ux, logcounts=vx),
reducedDims=SimpleList(PCA=pca$x))
milo <- Milo(sce)
milo <- buildGraph(milo, k=20, d=10, transposed=TRUE)
milo <- makeNhoods(milo, k=20, d=10, prop=0.3)
milo <- calcNhoodDistance(milo, d=10)
cond <- rep("A", ncol(milo))
cond.a <- sample(1:ncol(milo), size=floor(ncol(milo)*0.25))
cond.b <- setdiff(1:ncol(milo), cond.a)
cond[cond.b] <- "B"
meta.df <- data.frame(Condition=cond, Replicate=c(rep("R1", 132), rep("R2", 132), rep("R3", 136)))
meta.df$SampID <- paste(meta.df$Condition, meta.df$Replicate, sep="_")
milo <- countCells(milo, meta.data=meta.df, samples="SampID")
test.meta <- data.frame("Condition"=c(rep("A", 3), rep("B", 3)), "Replicate"=rep(c("R1", "R2", "R3"), 2))
test.meta$Sample <- paste(test.meta$Condition, test.meta$Replicate, sep="_")
rownames(test.meta) <- test.meta$Sample
da.res <- testNhoods(milo, design=~Condition, design.df=test.meta[colnames(nhoodCounts(milo)), ], norm.method="TMM")
da.res
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.