knitr::opts_chunk$set( collapse = TRUE, comment = "#>", out.width="100%", fig.width=7, fig.height=5, dpi=300, fig.path="figures/BayesSpace-", message=FALSE, warning=FALSE, error=FALSE )
library(SingleCellExperiment) library(ggplot2) library(BayesSpace)
BayesSpace supports three ways of loading a SingleCellExperiment
for analysis.
Visium datasets processed with Space
Ranger
can be loaded directly via the readVisium()
function. This function takes only
the path to the Space Ranger output directory (containing the spatial/
and
filtered_feature_bc_matrix/
subdirectories) and returns a
SingleCellExperiment
.
sce <- readVisium("path/to/spaceranger/outs/")
Second, all datasets analyzed for the BayesSpace manuscript are readily
accessible via the getRDS()
function. This function takes two arguments - the
name of the dataset, and the name of the sample in the dataset.
melanoma <- getRDS(dataset="2018_thrane_melanoma", sample="ST_mel1_rep2")
Finally, SingleCellExperiment
objects can be constructed manually from a
counts matrix and tables of row and column data. BayesSpace only requires that
spot array coordinates be provided as columns named row
and col
in
colData
. (Note that enhancement of Visium datasets additionally requires the
pixel coordinates of each spot in the tissue image, but in this case the dataset
should be loaded with readVisium()
, which loads these data automatically.)
library(Matrix) rowData <- read.csv("path/to/rowData.csv", stringsAsFactors=FALSE) colData <- read.csv("path/to/colData.csv", stringsAsFactors=FALSE, row.names=1) counts <- read.csv("path/to/counts.csv.gz", row.names=1, check.names=F, stringsAsFactors=FALSE)) sce <- SingleCellExperiment(assays=list(counts=as(counts, "dgCMatrix")), rowData=rowData, colData=colData)
We'll continue with the melanoma sample from the 2018 Spatial Transcriptomics paper for the remaining examples in this vignette.
BayesSpace requires minimal data pre-processing, but we provide a helper function to automate it.
spatialPreprocess()
log-normalizes the count matrix and performs PCA on the
top n.HVGs
highly variable genes, keeping the top n.PCs
principal
components. Additionally, the spatial sequencing platform is added as metadata
in the SingleCellExperiment
for downstream analyses. If you do not wish to
rerun PCA, running spatialPreprocess()
with the flag skip.PCA=TRUE
will only
add the metadata BayesSpace requires.
Here, we omit log-normalization as all datasets available through getRDS()
already include log-normalized counts.
set.seed(102) melanoma <- spatialPreprocess(melanoma, platform="ST", n.PCs=7, n.HVGs=2000, log.normalize=FALSE)
We can use the qTune()
and qPlot()
functions to help choose q
, the number
of clusters to use in our analysis.
qTune()
runs the BayesSpace clustering algorithm for multiple specified
values of q
(by default, 3 through 7) and computes their average
pseudo-log-likelihood. It accepts any arguments to spatialCluster()
.qPlot()
plots the pseudo-log-likelihood as a function of q
; we suggest
choosing a q
around the elbow of this plot.melanoma <- qTune(melanoma, qs=seq(2, 10), platform="ST", d=7) qPlot(melanoma)
The spatialCluster()
function clusters the spots, and adds the predicted
cluster labels to the SingleCellExperiment
. Typically, as we did for the
analyses in the paper, we suggest running with at least 10,000 iterations
(nrep=10000
), but we use 1,000 iteration in this demonstration for the sake of
runtime. (Note that a random seed must be set in order for the results to be
reproducible.)
set.seed(149) melanoma <- spatialCluster(melanoma, q=4, platform="ST", d=7, init.method="mclust", model="t", gamma=2, nrep=1000, burn.in=100, save.chain=TRUE)
Both the mclust initialization (cluster.init
) and the BayesSpace cluster
assignments (spatial.cluster
) are now available in the SingleCellExperiment's
colData
.
head(colData(melanoma))
We can plot the cluster assignments over the spatial locations of the spots with
clusterPlot()
.
clusterPlot(melanoma)
As clusterPlot()
returns a ggplot
object, it can be customized by composing
with familiar ggplot2
functions. Additionally, the argument palette
sets the
colors used for each cluster, and clusterPlot()
takes additional arguments to
geom_polygon()
such as size
or color
to control the aesthetics of the spot
borders.
clusterPlot(melanoma, palette=c("purple", "red", "blue", "yellow"), color="black") + theme_bw() + xlab("Column") + ylab("Row") + labs(fill="BayesSpace\ncluster", title="Spatial clustering of ST_mel1_rep2")
The spatialEnhance()
function will enhance the resolution of the principal
components, and add these PCs as well as predicted cluster labels at subspot
resolution to a new SingleCellExperiment
. As with our demonstration of
spatialCluster()
above, we are using fewer iterations for the purpose of this
example (nrep=1000
) than we recommend in practice (nrep=100000
or greater).
melanoma.enhanced <- spatialEnhance(melanoma, q=4, platform="ST", d=7, model="t", gamma=2, jitter_prior=0.3, jitter_scale=3.5, nrep=1000, burn.in=100, save.chain=TRUE)
The enhanced SingleCellExperiment
includes an index to the parent spot in the
original sce
(spot.idx
), along with an index to the subspot. It adds the
offsets to the original spot coordinates, and provides the enhanced cluster
label (spatial.cluster
).
head(colData(melanoma.enhanced))
We can plot the enhanced cluster assignments as above.
clusterPlot(melanoma.enhanced)
BayesSpace operates on the principal components of the gene expression matrix,
and spatialEnhance()
therefore computes enhanced resolution PC vectors.
Enhanced gene expression is not computed directly, and is instead imputed using
a regression algorithm. For each gene, a model using the PC vectors of each spot
is trained to predict the spot-level gene expression, and the fitted model is
used to predict subspot expression from the subspot PCs.
Gene expression enhancement is implemented in the enhanceFeatures()
function.
BayesSpace predicts expression with
xgboost
by default, but linear
and Dirichlet regression are also available via the model
argument. When using
xgboost
, we suggest automatically tuning the nrounds
parameter by setting it
to 0, although this comes at the cost of increased runtime (~4x slower than a
pre-specified nrounds
in practice).
enhanceFeatures()
can be used to impute subspot-level expression for all
genes, or for a subset of genes of interest. Here, we'll demonstrate by
enhancing the expression of four marker genes: PMEL (melanoma), CD2 (T-cells),
CD19 (B-cells), and COL1A1 (fibroblasts).
markers <- c("PMEL", "CD2", "CD19", "COL1A1") melanoma.enhanced <- enhanceFeatures(melanoma.enhanced, melanoma, feature_names=markers, nrounds=0)
By default, log-normalized expression (logcounts(sce)
) is imputed, although
other assays or arbitrary feature matrices can be specified.
logcounts(melanoma.enhanced)[markers, 1:5]
Diagnostic measures from each predictive model, such as rmse
when using
xgboost
, are added to the rowData
of the enhanced dataset.
rowData(melanoma.enhanced)[markers, ]
Spatial gene expression is visualized with featurePlot()
.
featurePlot(melanoma.enhanced, "PMEL")
Here, we compare the spatial expression of the imputed marker genes.
enhanced.plots <- purrr::map(markers, function(x) featurePlot(melanoma.enhanced, x)) patchwork::wrap_plots(enhanced.plots, ncol=2)
And we can compare to the spot-level expression.
spot.plots <- purrr::map(markers, function(x) featurePlot(melanoma, x)) patchwork::wrap_plots(c(enhanced.plots, spot.plots), ncol=4)
If save.chain
is set to TRUE
in either spatialCluster()
or
spatialEnhance()
, the chain associated with the respective MCMC run is
preserved to disk as an HDF5 file. The path to this file is stored in the
SingleCellExperiment's metadata at metadata(sce)$h5.chain
, and can be read
directly using mcmcChain()
.
The chain is provided as a coda::mcmc
object, which can be analyzed with
TidyBayes or as a matrix. The object has
one row per iteration, with the values of the parameters concatenated across the
row. Columns are named with the parameter name and index (if any).
chain <- mcmcChain(melanoma) chain[1:5, 1:5]
To remove the HDF5 file from disk and remove its path from the metadata, use
removeChain()
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.