knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SpatialDecon")
This vignette demonstrates the use of the SpatialDecon package to estimate cell abundance in spatial gene expression studies.
We'll analyze a small GeoMx dataset from a lung tumor, looking for abundance of different immune cell types. This dataset has 30 ROIs. In each ROI, Tumor and Microenvironment segments have been profiled separately.
First, we load the package:
library(SpatialDecon)
Now let's load our example data and examine it:
data("mini_geomx_dataset") norm = mini_geomx_dataset$normalized raw = mini_geomx_dataset$raw annot = mini_geomx_dataset$annot dim(raw) head(annot) raw[seq_len(5), seq_len(5)] # better segment names: colnames(norm) = colnames(raw) = rownames(annot) = paste0(annot$ROI, annot$AOI.name)
The spatialdecon function takes 3 arguments of expression data:
We estimate each data point's expected background from the negative control probes from its corresponding observation:
# use the NegProbe to estimate per-observation background per.observation.mean.neg = norm["NegProbe", ] # and define a background matrix in which each column (observation) is the # appropriate value of per-observation background: bg = sweep(norm * 0, 2, per.observation.mean.neg, "+") dim(bg)
A note for background estimation: in studies with two probesets, the genes from each probeset will have distinct background values, and the above code should be run separately for each probeset using its corresponding NegProbe value. Alternatively, the "derive_GeoMx_background" can do this automatically:
bg2 = derive_GeoMx_background(norm = norm, probepool = rep(1, nrow(norm)), negnames = "NegProbe")
A "cell profile matrix" is a pre-defined matrix that specifies the expected expression profiles of each cell type in the experiment. The SpatialDecon library comes with one such matrix pre-loaded, the "SafeTME" matrix, designed for estimation of immune and stroma cells in the tumor microenvironment. (This matrix was designed to avoid genes commonly expressed by cancer cells; see the SpatialDecon manuscript for details.)
Let's take a glance at the safeTME matrix:
data("safeTME") data("safeTME.matches") signif(safeTME[seq_len(3), seq_len(3)], 2) heatmap(sweep(safeTME, 1, apply(safeTME, 1, max), "/"), labRow = NA, margins = c(10, 5))
For studies of other tissue types, we have provided a library of cell profile matrices, available on Github and downloadable with the "download_profile_matrix" function.
For a complete list of matrices, see ?download_profile_matrix.
Below we download a matrix of cell profiles derived from scRNA-seq of a mouse brain.
mousebrain = download_profile_matrix(matrixname = "Mouse_Brain") dim(mousebrain) heatmap(sweep(mousebrain, 1, apply(mousebrain, 1, max), "/"), labRow = NA, margins = c(12, 5), cexCol = 0.7)
Now our data is ready for deconvolution. First we'll show how to use spatialdecon under the basic settings, omitting optional bells and whistles.
res = spatialdecon(norm = norm, bg = bg, X = safeTME, align_genes = TRUE) str(res)
We're most interested in "beta", the matrix of estimated cell abundances.
heatmap(res$beta, cexCol = 0.5, cexRow = 0.7, margins = c(10,7))
spatialdecon has several abilities beyond basic deconvolution:
Let's take a look at an example cell matching object:
str(safeTME.matches)
Now let's run spatialdecon:
# vector identifying pure tumor segments: annot$istumor = (annot$AOI.name == "Tumor") # run spatialdecon with all the bells and whistles: restils = spatialdecon(norm = norm, # normalized data raw = raw, # raw data, used to down-weight low-count observations bg = bg, # expected background counts for every data point in norm X = safeTME, # safeTME matrix, used by default cellmerges = safeTME.matches, # safeTME.matches object, used by default cell_counts = annot$nuclei, # nuclei counts, used to estimate total cells is_pure_tumor = annot$istumor, # identities of the Tumor segments/observations n_tumor_clusters = 5) # how many distinct tumor profiles to append to safeTME str(restils)
There are quite a few readouts here. Let's review the important ones:
To illustrate the derivation of tumor profiles, let's look at the cell profile matrix output by spatialdecon:
heatmap(sweep(restils$X, 1, apply(restils$X, 1, max), "/"), labRow = NA, margins = c(10, 5))
Note the new tumor-specific columns.
Finally, let's compare deconvolution results from basic vs. the advanced setting with tumor profiles appended (just for a few cell types):
par(mfrow = c(2, 3)) par(mar = c(5,7,2,1)) for (i in seq_len(6)) { cell = rownames(res$beta)[i] plot(res$beta[cell, ], restils$beta.granular[cell, ], xlab = paste0(cell, " score under basic setting"), ylab = paste0(cell, " score when tumor\ncells are modelled"), pch = 16, col = c(rgb(0,0,1,0.5), rgb(1,0,0,0.5))[1 + annot$istumor], xlim = range(c(res$beta[cell, ], restils$beta.granular[cell, ])), ylim = range(c(res$beta[cell, ], restils$beta.granular[cell, ]))) abline(0,1) if (i == 1) { legend("topleft", pch = 16, col = c(rgb(0,0,1,0.5), rgb(1,0,0,0.5)), legend = c("microenv.", "tumor")) } }
So the impact of modelling tumor is two-fold:
The SpatialDecon package contains two specialized plotting functions, and a default color palette for the safeTME matrix.
The first function is "TIL_barplot", which is just a convenient way of drawing barplots of cell type abundance.
# For reference, show the TILs color data object used by the plotting functions # when safeTME has been used: data("cellcols") cellcols # show just the TME segments, since that's where the immune cells are: layout(mat = (matrix(c(1, 2), 1)), widths = c(7, 3)) TIL_barplot(restils$cell.counts$cell.counts, draw_legend = TRUE, cex.names = 0.5) # or the proportions of cells: TIL_barplot(restils$prop_of_nontumor[, annot$AOI.name == "TME"], draw_legend = TRUE, cex.names = 0.75)
The second function is "florets", used for plotting cell abundances atop some 2-D projection. Here, we'll plot cell abundances atop the first 2 principal components of the data:
# PCA of the normalized data: pc = prcomp(t(log2(pmax(norm, 1))))$x[, c(1, 2)] # run florets function: par(mar = c(5,5,1,1)) layout(mat = (matrix(c(1, 2), 1)), widths = c(6, 2)) florets(x = pc[, 1], y = pc[, 2], b = restils$beta, cex = 2, xlab = "PC1", ylab = "PC2") par(mar = c(0,0,0,0)) frame() legend("center", fill = cellcols[rownames(restils$beta)], legend = rownames(restils$beta), cex = 0.7)
So we can see that PC1 roughly tracks many vs. few immune cells, and PC2 tracks the relative abundance of lymphoid/myeloid populations.
The SpatialDecon library includes several helpful functions for further analysis/fine-tuning of deconvolution results.
When two cell types are too similar, the estimation of their abundances becomes unstable. However, their sum can still be estimated easily. The function "collapseCellTypes" takes a deconvolution results object and collapses any colsely-related cell types you tell it to:
matching = list() matching$myeloid = c( "macrophages", "monocytes", "mDCs") matching$T.NK = c("CD4.T.cells","CD8.T.cells", "Treg", "NK") matching$B = c("B") matching$mast = c("mast") matching$neutrophils = c("neutrophils") matching$stroma = c("endothelial.cells", "fibroblasts") collapsed = collapseCellTypes(fit = restils, matching = matching) heatmap(collapsed$beta, cexRow = 0.85, cexCol = 0.75)
Sometimes a cell profile matrix will omit a cell type you know to be present in a tissue. If your data includes any regions that are purely this unmodelled cell type - e.g. because you've used the GeoMx platform's segmentation capability to specifically select them - then you can infer a profile for that cell type and merge it with your cell profile matrix. The algorithm clusters all the observations you designate as purely the unmodelled cell type, and collapses those clusters into as many profiles of that cell type as you wish. For cancer cell, it may be appropriate to specify 10 or more clusters; for highly regular healthy cells, one cluster may suffice.
(Note: this functionality can also be run within the spatialdecon function, as is demonstrated further above.)
pure.tumor.ids = annot$AOI.name == "Tumor" safeTME.with.tumor = mergeTumorIntoX(norm = norm, bg = bg, pure_tumor_ids = pure.tumor.ids, X = safeTME, K = 3) heatmap(sweep(safeTME.with.tumor, 1, apply(safeTME.with.tumor, 1, max), "/"), labRow = NA, margins = c(10, 5))
Once cell type abundance has been estimated, we can flip the deconvolution around, modelling the expression data as a function of cell abundances, and thereby deriving:
The function "reversedecon" runs this model.
rdecon = reverseDecon(norm = norm, beta = res$beta) str(rdecon) # look at the residuals: heatmap(pmax(pmin(rdecon$resids, 2), -2))
# look at the two metrics of goodness-of-fit: plot(rdecon$cors, rdecon$resid.sd, col = 0) showgenes = c("CXCL14", "LYZ", "NKG7") text(rdecon$cors[setdiff(names(rdecon$cors), showgenes)], rdecon$resid.sd[setdiff(names(rdecon$cors), showgenes)], setdiff(names(rdecon$cors), showgenes), cex = 0.5) text(rdecon$cors[showgenes], rdecon$resid.sd[showgenes], showgenes, cex = 0.75, col = 2)
From the above plot, we can see that genes like CXCL14 vary independently of cell mixing, genes like LYZ are correlated with cell mixing but still have variable expression, and genes like NKG7 serve as nothing but obtuse readouts of cell mixing.
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.