testNhoods: Perform differential neighbourhood abundance testing

View source: R/testNhoods.R

testNhoodsR Documentation

Perform differential neighbourhood abundance testing

Description

This will perform differential neighbourhood abundance testing after cell counting.

Arguments

x

A Milo object with a non-empty nhoodCounts slot.

design

A formula or model.matrix object describing the experimental design for differential abundance testing. The last component of the formula or last column of the model matrix are by default the test variable. This behaviour can be overridden by setting the model.contrasts argument

design.df

A data.frame containing meta-data to which design refers to

kinship

(optional) An n X n matrix containing pair-wise relationships between observations, such as expected relationships or computed from SNPs/SNVs/other genetic variants. Row names and column names should correspond to the column names of nhoods(x) and rownames of design.df.

min.mean

A scalar used to threshold neighbourhoods on the minimum average cell counts across samples.

model.contrasts

A string vector that defines the contrasts used to perform DA testing. For a specific comparison we recommend a single contrast be passed to testNhoods. More details can be found in the vignette milo_contrasts.

fdr.weighting

The spatial FDR weighting scheme to use. Choice from max, neighbour-distance, graph-overlap or k-distance (default). If none is passed no spatial FDR correction is performed and returns a vector of NAs.

robust

If robust=TRUE then this is passed to edgeR and limma which use a robust estimation for the global quasilikelihood dispersion distribution. See edgeR and Phipson et al, 2013 for details.

norm.method

A character scalar, either "logMS", "TMM" or "RLE". The "logMS" method normalises the counts across samples using the log columns sums of the count matrix as a model offset. "TMM" uses the trimmed mean of M-values normalisation as described in Robinson & Oshlack, 2010, whilst "RLE" uses the relative log expression method by Anders & Huber, 2010, to compute normalisation factors relative to a reference computed from the geometric mean across samples. The latter methods provides a degree of robustness against false positives when there are very large compositional differences between samples.

cell.sizes

A named numeric vector of cell numbers per experimental samples. Names should correspond to the columns of nhoodCounts. This can be used to define the model normalisation factors based on a set of numbers instead of the colSums(nhoodCounts(x)). The example use-case is when performing an analysis of a subset of nhoods while retaining the need to normalisation based on the numbers of cells collected for each experimental sample to avoid compositional biases. Infinite or NA values will give an error.

reduced.dim

A character scalar referring to the reduced dimensional slot used to compute distances for the spatial FDR. This should be the same as used for graph building.

REML

A logical scalar that controls the variance component behaviour to use either restricted maximum likelihood (REML) or maximum likelihood (ML). The former is recommened to account for the bias in the ML variance estimates.

glmm.solver

A character scalar that determines which GLMM solver is applied. Must be one of: Fisher, HE or HE-NNLS. HE or HE-NNLS are recommended when supplying a user-defined covariance matrix.

max.iters

A scalar that determines the maximum number of iterations to run the GLMM solver if it does not reach the convergence tolerance threshold.

max.tol

A scalar that deterimines the GLMM solver convergence tolerance. It is recommended to keep this number small to provide some confidence that the parameter estimates are at least in a feasible region and close to a local optimum

subset.nhoods

A character, numeric or logical vector that will subset the analysis to the specific nhoods. If a character vector these should correspond to row names of nhoodCounts. If a logical vector then these should have the same length as nrow of nhoodCounts. If numeric, then these are assumed to correspond to indices of nhoodCounts - if the maximal index is greater than nrow(nhoodCounts(x)) an error will be produced.

intercept.type

A character scalar, either fixed or random that sets the type of the global intercept variable in the model. This only applies to the GLMM case where additional random effects variables are already included. Setting intercept.type="fixed" or intercept.type="random" will require the user to test their model for failures with each. In the case of using a kinship matrix, intercept.type="fixed" is set automatically.

fail.on.error

A logical scalar the determines the behaviour of the error reporting. Used for debugging only.

BPPARAM

A BiocParallelParam object specifying the arguments for parallelisation. By default this will evaluate using SerialParam(). See detailson how to use parallelisation in testNhoods.

force

A logical scalar that overrides the default behaviour to nicely error when N < 50 and using a mixed effect model. This is because model parameter estimation may be unstable with these sample sizes, and hence the fixed effect GLM is recommended instead. If used with the LMM, a warning will be produced.

Details

This function wraps up several steps of differential abundance testing using the edgeR functions. These could be performed separately for users who want to exercise more contol over their DA testing. By default this function sets the lib.sizes to the colSums(x), and uses the Quasi-Likelihood F-test in glmQLFTest for DA testing. FDR correction is performed separately as the default multiple-testing correction is inappropriate for neighbourhoods with overlapping cells. The GLMM testing cannot be performed using edgeR, however, a separate function fitGLMM can be used to fit a mixed effect model to each nhood (see fitGLMM docs for details).

Parallelisation is currently only enabled for the NB-GLMM and uses the BiocParallel paradigm at the level of R, and OpenMP to allow multi-threading of RCpp code. In general the GLM implementation in glmQLFit is sufficiently fast that it does not require parallelisation. Parallelisation requires the user to pass a BiocParallelParam object with the parallelisation arguments contained therein. This relies on the user specifying how to parallelise - for details see the BiocParallel package.

model.contrasts are used to define specific comparisons for DA testing. Currently, testNhoods will take the last formula variable for comparisons, however, contrasts need this to be the first variable. A future update will harmonise these behaviours for consistency. While it is strictly feasible to compute multiple contrasts at once, the recommendation, for ease of interpretability, is to compute one at a time.

If using the GLMM option, i.e. including a random effect variable in the design formula, then testNhoods will check for the sample size of the analysis. If this is less than 60 it will stop and produce an error. It is strongly recommended that the GLMM is not used with relatively small sample sizes, i.e. N<60, and even up to N~100 may have unstable parameter estimates across nhoods. This behaviour can be overriden by setting force=TRUE, but also be aware that parameter estimates may not be accurate. A warning will be produced to alert you to this fact.

Value

A data.frame of model results, which contain:

logFC:

Numeric, the log fold change between conditions, or for an ordered/continous variable the per-unit change in (normalized) cell counts per unit-change in experimental variable.

logCPM:

Numeric, the log counts per million (CPM), which equates to the average log normalized cell counts across all samples.

F:

Numeric, the F-test statistic from the quali-likelihood F-test implemented in edgeR.

PValue:

Numeric, the unadjusted p-value from the quasi-likelihood F-test.

FDR:

Numeric, the Benjamini & Hochberg false discovery weight computed from p.adjust.

Nhood:

Numeric, a unique identifier corresponding to the specific graph neighbourhood.

SpatialFDR:

Numeric, the weighted FDR, computed to adjust for spatial graph overlaps between neighbourhoods. For details see graphSpatialFDR.

Author(s)

Mike Morgan

Examples

library(SingleCellExperiment)
ux.1 <- matrix(rpois(12000, 5), ncol=400)
ux.2 <- matrix(rpois(12000, 4), ncol=400)
ux <- rbind(ux.1, ux.2)
vx <- log2(ux + 1)
pca <- prcomp(t(vx))

sce <- SingleCellExperiment(assays=list(counts=ux, logcounts=vx),
                            reducedDims=SimpleList(PCA=pca$x))

milo <- Milo(sce)
milo <- buildGraph(milo, k=20, d=10, transposed=TRUE)
milo <- makeNhoods(milo, k=20, d=10, prop=0.3)
milo <- calcNhoodDistance(milo, d=10)

cond <- rep("A", ncol(milo))
cond.a <- sample(1:ncol(milo), size=floor(ncol(milo)*0.25))
cond.b <- setdiff(1:ncol(milo), cond.a)
cond[cond.b] <- "B"
meta.df <- data.frame(Condition=cond, Replicate=c(rep("R1", 132), rep("R2", 132), rep("R3", 136)))
meta.df$SampID <- paste(meta.df$Condition, meta.df$Replicate, sep="_")
milo <- countCells(milo, meta.data=meta.df, samples="SampID")

test.meta <- data.frame("Condition"=c(rep("A", 3), rep("B", 3)), "Replicate"=rep(c("R1", "R2", "R3"), 2))
test.meta$Sample <- paste(test.meta$Condition, test.meta$Replicate, sep="_")
rownames(test.meta) <- test.meta$Sample
da.res <- testNhoods(milo, design=~Condition, design.df=test.meta[colnames(nhoodCounts(milo)), ], norm.method="TMM")
da.res


MikeDMorgan/miloR documentation built on Oct. 19, 2024, 8:39 p.m.