``` {r options, include=F, cache=F, results='hide', message=F}
knitr::opts_chunk$set(fig.align="center", cache=FALSE,error=FALSE, fig.width=6,fig.height=6,autodep=TRUE, out.width="600px", out.height="600px", results="markup", echo=TRUE, eval=TRUE)
options(getClass.msg=FALSE)
set.seed(6473) ## for reproducibility
```

# Introduction

To accommodate datasets that exceed available memory and CPU resources, VISION creates pooled cells using a custom micropooling algorithm. In the standard use case, the user provides the entire expression matrix and VISION creates pooled cells according to the `cellsPerPartition` argument supplied to the VISION object constructor. Analysis then proceeds on the pooled data matrix, in which each pooled cell contains, on average, `cellsPerPartition` cells. This significantly reduces the run time, producing results from 500K cells in around an hour. We also find that local autocorrelation scores computed in the latent space from the pooled cells are consistent with those computed from the entire dataset.

# The Algorithm

The micropooling algorithm relies on iterative clustering of the latent space. Specifically, given an expression matrix and a `cellsPerPartition` target size, we use the following procedure:

- Form a latent space using the first 30 components from PCA
    - Alternately, use the user-provided latent space or latent trajectory model
- Compute a KNN graph, where edge weights are determined by applying a Gaussian kernel to the Euclidean distances in the latent model
- Perform Louvain clustering on the KNN graph
- Iteratively perform K-means clustering within the Louvain clusters until we have about `cellsPerPartition` cells per cluster
- Collapse cells into pooled cells by computing the gene-wise mean for all cells in each cluster

By default, VISION does not constrain micropools to fall within discrete classes (e.g. cell type or batch), as we find that unbiased clusterings are the most sensible and reflective of the data. Instead, in the output report, each pooled cell reports the percentage of its cells belonging to each value of every discrete metadata item (e.g. a pooled cell is 80% CD4+ T cell and 20% CD8+ T cell). In this way, users may develop an intuition for the phenotype of each pooled cell.
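To make the collapsing step and the percentage summary more concrete, here is a minimal base R sketch on toy data. It is not VISION's implementation: the toy matrix, the hand-assigned `pools` list, and the helper names `pool_means()` and `pool_composition()` are all hypothetical, and the pools are fixed by hand rather than derived from Louvain and K-means clustering.

```r
# Toy expression matrix: 3 genes x 6 cells (made-up values, filled by column)
expr <- matrix(
  c(1, 2, 3,
    3, 2, 1,
    5, 5, 5,
    7, 5, 3,
    0, 1, 2,
    2, 1, 0),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical discrete metadata item for each cell
cell_type <- c(cell1 = "CD4", cell2 = "CD4", cell3 = "CD4",
               cell4 = "CD4", cell5 = "CD8", cell6 = "CD4")

# Hand-assigned pools (assumption: pool name -> vector of cell names)
pools <- list(
  pool1 = c("cell1", "cell2"),
  pool2 = c("cell3", "cell4", "cell5", "cell6")
)

# Collapse each pool into one pooled cell: gene-wise mean over its cells
pool_means <- function(expr, pools) {
  vapply(pools,
         function(cells) rowMeans(expr[, cells, drop = FALSE]),
         numeric(nrow(expr)))
}

# Percentage of each pool's cells that carry each label
pool_composition <- function(labels, pools) {
  lv <- sort(unique(labels))
  comp <- lapply(pools, function(cells) {
    counts <- table(factor(labels[cells], levels = lv))
    100 * as.numeric(counts) / length(cells)
  })
  out <- do.call(rbind, comp)
  colnames(out) <- lv
  out
}

pooled <- pool_means(expr, pools)                  # genes x pools
composition <- pool_composition(cell_type, pools)  # pools x labels
```

In this toy case, `pool2` is made up of three CD4 cells and one CD8 cell, so its row in `composition` reads 75% CD4 and 25% CD8, which is the kind of summary described above.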
Note that users may also perform clustering outside of the VISION analysis pipeline and provide the result to the VISION object constructor as pooled data. This allows users to perform microclustering within discrete classes if they so wish.

# Micropooling Example

## Data

In this vignette, we'll be analyzing a set of ~5,000 cells during haematopoiesis ([Tusi et al, Nature 2018](https://www.nature.com/articles/nature25741)).

## Workflow

Here, we'll demonstrate a common use case of micropooling on the data:

```r
library(VISION)

# read in the raw counts (genes as rows, cells as columns)
counts = as.matrix(read.table("data/hemato_counts.csv.gz", sep=',', header=TRUE, row.names=1))

# compute scaled counts: normalize each cell to the median library size
scale.factor = median(colSums(counts))
scaled.counts = t(t(counts) / colSums(counts)) * scale.factor

# read in meta data and align it with the cells in the expression matrix
meta = read.table("data/hemato_covariates.txt.gz", sep='\t', header=TRUE, row.names=1)
meta = meta[colnames(scaled.counts), -1]

# pool = TRUE turns on micropooling with ~5 cells per pooled cell
vis <- Vision(scaled.counts,
              meta = meta,
              signatures = c("data/h.all.v5.2.symbols.gmt"),
              pool = TRUE, cellsPerPartition = 5)

vis <- analyze(vis)

viewResults(vis)
```
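For readers unfamiliar with the scaling expression used in the workflow, the following small sketch applies the same formula to a toy count matrix. The toy values are made up for illustration; only the two scaling lines mirror the vignette code.

```r
# Toy counts: 3 genes x 3 cells with library sizes 40, 20 and 4
counts <- matrix(
  c(10, 0, 30,
     5, 5, 10,
     1, 2,  1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Same scaling as in the workflow: divide each cell (column) by its
# library size, then multiply by the median library size
scale.factor <- median(colSums(counts))
scaled.counts <- t(t(counts) / colSums(counts)) * scale.factor

# After scaling, every cell has the same total
colSums(scaled.counts)
```

The double transpose is needed because R divides a matrix by a vector along rows first; transposing makes each cell's library size line up with that cell's counts.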
There may be reasons to micropool outside of the main VISION `analyze()` function. For example, there may be discrete classes that you do not wish to pool across (e.g. not wanting to merge disease and control cells). Or, you may want to use the micropooling function independently of the rest of the pipeline. This can be accomplished using the `applyMicroClustering()` utility function, as demonstrated here:
```r
## Create scaled.counts and meta as shown above

pools <- applyMicroClustering(scaled.counts, cellsPerPartition = 5)

# Create pooled versions of expression matrix and meta-data
pooledExpression <- poolMatrixCols(scaled.counts, pools)
pooledMeta <- poolMetaData(meta, pools)

# These could then be fed into the VISION constructor
vis <- Vision(pooledExpression,
              meta = pooledMeta,
              signatures = c("data/h.all.v5.2.symbols.gmt"))
```
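If the goal is to keep classes such as disease and control from being merged, one option is to build the pools separately inside each class and then combine them. The sketch below illustrates the idea in base R only. The helper `pool_within_class()`, the toy latent space, and the `condition` labels are hypothetical, and plain K-means stands in for VISION's own clustering procedure.

```r
set.seed(1)

# Toy latent space: 40 cells x 2 components (random, for illustration only)
latent <- matrix(rnorm(80), ncol = 2,
                 dimnames = list(paste0("cell", 1:40), c("PC1", "PC2")))

# Hypothetical discrete class for each cell
condition <- setNames(rep(c("disease", "control"), each = 20),
                      rownames(latent))

# Hypothetical helper: cluster cells separately inside each class so that
# no pool mixes classes. Assumes each class has more cells than pools.
pool_within_class <- function(latent, classes, cellsPerPartition) {
  pools <- list()
  for (cls in unique(classes)) {
    cells <- names(classes)[classes == cls]
    # Aim for roughly cellsPerPartition cells per pool
    k <- max(1, round(length(cells) / cellsPerPartition))
    km <- kmeans(latent[cells, , drop = FALSE], centers = k)
    cls_pools <- split(cells, km$cluster)
    names(cls_pools) <- paste0(cls, "_pool", seq_along(cls_pools))
    pools <- c(pools, cls_pools)
  }
  pools
}

pools <- pool_within_class(latent, condition, cellsPerPartition = 5)

# Every pool contains cells from a single class only
sapply(pools, function(p) unique(condition[p]))
```

The result is a named list mapping each pool to its cell names. Assuming this matches the structure returned by `applyMicroClustering()`, such a list could be handed to `poolMatrixCols()` and `poolMetaData()` in the same way as above.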
Alternately, if you already have a latent space computed and wish to use it (instead of PCA) when pooling outside of the normal VISION pipeline, you can provide it when calling `applyMicroClustering()` and later pool the latent space. This is demonstrated below:
```r
## Create scaled.counts and meta as shown above
## latentSpace is your precomputed latent space (cells as rows)

pools <- applyMicroClustering(scaled.counts,
                              cellsPerPartition = 5,
                              latentSpace = latentSpace)

# Create pooled versions of expression matrix and meta-data
pooledExpression <- poolMatrixCols(scaled.counts, pools)
pooledMeta <- poolMetaData(meta, pools)
pooledLatentSpace <- poolMatrixRows(latentSpace, pools)

# These could then be fed into the VISION constructor
vis <- Vision(pooledExpression,
              meta = pooledMeta,
              signatures = c("data/h.all.v5.2.symbols.gmt"),
              latentSpace = pooledLatentSpace)
```