r Sys.Date()
Rapid advances in flow cytometry and other single-cell technologies have enabled high-dimensional, multi-parameter, high-throughput measurements of individual cells. Numerous questions about cell population heterogeneity can now be addressed, as these novel technologies permit single-cell analysis of antigen-specific T-cells. Unfortunately, there is a lack of computational tools to take full advantage of these complex data. COMPASS is a statistical framework that enables unbiased analysis of antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework to model all observed cell-subsets and select the most likely to be antigen-specific while regularizing the small cell counts that often arise in multi-parameter space. The model provides a posterior probability of specificity for each cell subset and each sample, which can be used to profile a subject’s immune response to external stimuli such as infection or vaccination.
We will use a simulated dataset to illustrate the use of the COMPASS
package.
First, we will outline the components used to construct the COMPASSContainer
,
the data structure used to hold data from an ICS experiment. We will first
initialize some parameters used in generating the simulated data.
library(COMPASS) set.seed(123) n <- 100 ## number of samples k <- 6 ## number of markers sid_vec <- paste0("sid_", 1:n) ## sample ids; unique names used to denote samples iid_vec <- rep_len( paste0("iid_", 1:(n/10) ), n ) ## individual ids
The COMPASSContainer
is built out of three main components: the data,
the total counts, and the metadata. We will describe the R structure
of these components next.
This is a list of matrices, with each matrix representing a population of cells drawn from a particular sample. Each row is an individual cell, each column is a marker (cytokine), and each matrix element is an intensity measure associated with a particular cell and marker. Only cells that express at least one marker are included.
data <- replicate(n, { nrow <- round(runif(1) * 1E4 + 1000) ncol <- k vals <- rexp( nrow * ncol, runif(1, 1E-5, 1E-3) ) vals[ vals < 2000 ] <- 0 output <- matrix(vals, nrow, ncol) output <- output[ apply(output, 1, sum) > 0, ] colnames(output) <- paste0("M", 1:k) return(output) }) names(data) <- sid_vec head( data[[1]] )
This is a named integer vector, denoting the total number of cells that were
recovered from each sample in data
.
counts <- sapply(data, nrow) + round( rnorm(n, 1E4, 1E3) ) counts <- setNames( as.integer(counts), names(counts) ) head(counts)
This is a data.frame
that associates metadata information with each sample
available in data
/ counts
. We will suppose that each sample was
subject to one of two treatments named Control
and Treatment
.
meta <- data.frame( sid=sid_vec, iid=iid_vec, trt=sample( c("Control", "Treatment"), n, TRUE ) ) head(meta)
Once we have these components, we can construct the COMPASSContainer
:
CC <- COMPASSContainer( data=data, counts=counts, meta=meta, individual_id="iid", sample_id="sid" )
We can see some basic information about our COMPASSContainer
:
CC
summary(CC)
Fitting the COMPASS
model is very easy once the data has been inserted into
the COMPASSContainer
object. To specify the model, the user needs to specify
the criteria that identify samples that received a positive stimulation, and
samples that received a negative stimulation. In our data, we can specify it
as follows. (Because MCMC is used to sample the posterior and that is a slow
process, we limit ourselves to a small number of iterations for the purposes
of this vignette.)
COMPASS
is designed to give verbose output with the model fitting statement,
in order for users to compare expectations in their data to what the COMPASS
model fitting function is doing, in order to minimize potential errors.
fit <- COMPASS( CC, treatment=trt == "Treatment", control=trt == "Control", iterations=100 )
After fitting the COMPASS
model, we can examine the output in many ways:
## Extract the functionality, polyfunctionality scores as described ## within the COMPASS paper -- these are measures of the overall level ## of 'functionality' of a cell, which has shown to be correlated with ## a cell's affinity in immune response FS <- FunctionalityScore(fit) PFS <- PolyfunctionalityScore(fit) ## Obtain the posterior difference, posterior log ratio from a COMPASSResult post <- Posterior(fit) ## Plot a heatmap of the mean probability of response, to visualize differences ## in expression for each category plot(fit) ## Visualize the posterior difference, log difference with a heatmap plot(fit, measure=PosteriorDiff(fit), threshold=0) plot(fit, measure=PosteriorLogDiff(fit), threshold=0)
COMPASS
also packages a Shiny application for interactive visualization of
the fits generated by a COMPASS
call. These can be easily generated through
a call to the shinyCOMPASS
function.
shinyCOMPASS(fit, stimulated="Treatment", unstimulated="Control")
flowWorkspace
is a package used for managing and generating gates for data
obtained from flow cytometry experiments. Combined with openCyto
, users are
able to automatically gate data through flexibly-defined gating templates.
COMPASS
comes with a utility function for extracting data from a
flowWorkspace
GatingSet
object, providing a seamless workflow between
both data management and analysis for polyfunctionality cytokine data.
Data can be extracted using COMPASSContainerFromGatingSet
, with appropriate
documentation available in ?COMPASSContainerFromGatingSet
. As an example,
a researcher might first gate his data in order to find CD4+
cells, and then
gate a number of markers, or cytokines, for these CD4+
cells, in order to
identify cells expressing different combinations of markers.
Users who have gated their data with flowJo
and are interested in analyzing
their data with COMPASS
can do so by first loading and parsing their
workspace with flowWorkspace
, and next generating the COMPASSContainer
through COMPASSContainerFromGatingSet
.
cite_package <- function(...) { tryCatch({ args <- unlist( list(...) ) cites <- lapply(args, citation) txt <- sapply(cites, function(x) { attr( unclass(x)[[1]], "textVersion" ) }) return(txt[order(txt)]) }, error=function(e) NULL ) } citations <- cite_package("COMPASS", "flowWorkspace", "openCyto", "base") citations <- c("Lin, L. Finak, G. Ushey, K. Seshadri C. et al. COMPASS identifies T-cell subsets correlated with clinical outcomes. Nature biotechnology (2015). doi:10.1038/nbt.3187", citations[1:2],"Greg Finak, Andrew McDavid, Pratip Chattopadhyay, Maria Dominguez, Steve De Rosa, Mario Roederer, Raphael Gottardo. Mixture models for single-cell assays with applications to vaccine studies. Biostatistics 2014 Jan;15(1):87-101",citations[3:4]) invisible(lapply(citations, function(x) cat(x, "\n\n")))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.