COMPASS - Combinatorial Polyfunctionality Analysis of Single Cells

Kevin Ushey, Lynn Lin, and Greg Finak

Introduction

Rapid advances in flow cytometry and other single-cell technologies have enabled high-dimensional, multi-parameter, high-throughput measurements of individual cells. Numerous questions about cell population heterogeneity can now be addressed, as these novel technologies permit single-cell analysis of antigen-specific T-cells. Unfortunately, there is a lack of computational tools that take full advantage of these complex data. COMPASS is a statistical framework that enables unbiased analysis of antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework to model all observed cell subsets and select those most likely to be antigen-specific, while regularizing the small cell counts that often arise in multi-parameter space. The model provides a posterior probability of specificity for each cell subset and each sample, which can be used to profile a subject’s immune response to external stimuli such as infection or vaccination.

Example

We will use a simulated dataset to illustrate the use of the COMPASS package.

We begin by outlining the components used to construct the COMPASSContainer, the data structure that holds data from an intracellular cytokine staining (ICS) experiment. To start, we initialize some parameters used in generating the simulated data.

library(COMPASS)
set.seed(123)
n <- 100 ## number of samples
k <- 6 ## number of markers

sid_vec <- paste0("sid_", 1:n) ## sample ids; unique names used to denote samples
iid_vec <- rep_len( paste0("iid_", 1:(n/10) ), n ) ## individual ids

The COMPASSContainer is built out of three main components: the data, the total counts, and the metadata. We will describe the R structure of these components next.

data

This is a list of matrices, with each matrix representing a population of cells drawn from a particular sample. Each row is an individual cell, each column is a marker (cytokine), and each matrix element is an intensity measure associated with a particular cell and marker. Only cells that express at least one marker are included.

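data <- replicate(n, {
  nrow <- round(runif(1) * 1E4 + 1000)  ## between 1,000 and 11,000 cells
  ncol <- k
  vals <- rexp( nrow * ncol, runif(1, 1E-5, 1E-3) )
  vals[ vals < 2000 ] <- 0  ## intensities below 2000 count as unexpressed
  output <- matrix(vals, nrow, ncol)
  ## keep only cells that express at least one marker
  output <- output[ rowSums(output) > 0, , drop=FALSE ]
  colnames(output) <- paste0("M", 1:k)
  return(output)
}, simplify=FALSE)  ## always return a list of matrices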
names(data) <- sid_vec
head( data[[1]] )
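COMPASS models cell subsets defined by which markers each cell expresses. As a rough base-R illustration of that idea (not the package's internal code), the sketch below turns a small hypothetical intensity matrix into expression patterns and tabulates them. The matrix `toy` and the helper `subset_counts` are made up for this example, and a marker is assumed to be expressed when its intensity is greater than zero.

```r
## Hypothetical 5-cell, 3-marker intensity matrix (0 = not expressed)
toy <- matrix(
  c(2500,    0,    0,
    2500, 3100,    0,
       0, 3100,    0,
    4000, 2200,    0,
       0,    0, 5000),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, paste0("M", 1:3))
)

## Helper (an assumption for illustration): label each cell by the set of
## markers it expresses, then count the cells carrying each label
subset_counts <- function(m) {
  expressed <- m > 0
  labels <- apply(expressed, 1, function(row) {
    paste(colnames(m)[row], collapse = "&")
  })
  table(labels)
}

subset_counts(toy)
```

Each label in the resulting table corresponds to one combination of expressed markers, which is the kind of cell subset the model reasons about.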

counts

This is a named integer vector giving the total number of cells recovered from each sample in data, including cells that express none of the markers. Its names are the sample ids.

counts <- sapply(data, nrow) + round( rnorm(n, 1E4, 1E3) )
counts <- setNames( as.integer(counts), names(counts) )
head(counts)
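Because the total count for a sample includes cells that express no marker, it should never be smaller than the number of rows in that sample's matrix. The following base-R sketch checks this on toy inputs; `toy_data`, `toy_counts`, and the helper `check_counts` are hypothetical names introduced here and are not part of COMPASS.

```r
## Toy stand-ins for `data` and `counts` (hypothetical values)
toy_data <- list(
  sid_1 = matrix(1, nrow = 3, ncol = 2),
  sid_2 = matrix(1, nrow = 5, ncol = 2)
)
toy_counts <- c(sid_1 = 100L, sid_2 = 250L)

## Helper (an assumption for illustration): TRUE when `counts` is a named
## integer vector that covers every sample and is at least as large as
## the number of marker-expressing cells kept for that sample
check_counts <- function(data, counts) {
  is.integer(counts) &&
    setequal(names(counts), names(data)) &&
    all(counts[names(data)] >= vapply(data, nrow, integer(1)))
}

check_counts(toy_data, toy_counts)
```

The same helper could be applied to the simulated `data` and `counts` objects built above before they are passed on.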

metadata

This is a data.frame that associates metadata with each sample available in data / counts. We will suppose that each sample was subject to one of two treatments, named Control and Treatment.

meta <- data.frame(
  sid=sid_vec,
  iid=iid_vec,
  trt=sample( c("Control", "Treatment"), n, TRUE )
)
head(meta)
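Since the model contrasts stimulated and unstimulated samples, it can be useful to confirm that sample ids are unique and to see which individuals have samples under each treatment. The sketch below does this in base R on a small hypothetical metadata table (`toy_meta` is an illustrative name, not an object from the package).

```r
## Hypothetical metadata for six samples from three individuals
toy_meta <- data.frame(
  sid = paste0("sid_", 1:6),
  iid = rep(paste0("iid_", 1:3), times = 2),
  trt = rep(c("Control", "Treatment"), each = 3)
)

## Every sample id should appear exactly once
stopifnot(!anyDuplicated(toy_meta$sid))

## Cross-tabulate individuals against treatments; a zero cell flags an
## individual with no sample for that treatment
pairing <- table(toy_meta$iid, toy_meta$trt)
pairing
all(pairing > 0)
```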

Once we have these components, we can construct the COMPASSContainer:

CC <- COMPASSContainer(
  data=data,
  counts=counts,
  meta=meta,
  individual_id="iid",
  sample_id="sid"
)

We can see some basic information about our COMPASSContainer:

CC
summary(CC)

Once the data are in a COMPASSContainer, fitting the COMPASS model is straightforward. The user supplies two criteria: one identifying the samples that received a positive stimulation, and one identifying the samples that received a negative stimulation. (Because the posterior is sampled by MCMC, which is slow, we limit ourselves to a small number of iterations for the purposes of this vignette.)

COMPASS prints verbose output during model fitting so that users can compare what the fitting function is doing against what they expect from their data, which helps catch errors early. For our data, the model can be specified as follows.

fit <- COMPASS( CC,
  treatment=trt == "Treatment",
  control=trt == "Control",
  iterations=100
)
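The treatment and control arguments are written as unquoted expressions over metadata columns. As a rough illustration of how such an expression can select samples, here is a base-R sketch that evaluates a condition inside a metadata table. The helper `select_samples` and the table `toy_meta` are hypothetical, and this is not a reproduction of what COMPASS does internally.

```r
## Hypothetical metadata, mirroring the columns used above
toy_meta <- data.frame(
  sid = paste0("sid_", 1:4),
  trt = c("Control", "Treatment", "Treatment", "Control")
)

## Helper (an assumption for illustration): evaluate an unquoted
## condition inside the metadata and return the matching sample ids
select_samples <- function(meta, condition) {
  keep <- eval(substitute(condition), meta, parent.frame())
  as.character(meta$sid[keep])
}

select_samples(toy_meta, trt == "Treatment")
select_samples(toy_meta, trt == "Control")
```

Running the two conditions on the real metadata before fitting is a quick way to check that they pick out the intended, non-overlapping sets of samples.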

After fitting the COMPASS model, we can examine the output in many ways:

## Extract the functionality and polyfunctionality scores as described
## within the COMPASS paper -- these are measures of the overall level
## of 'functionality' of a cell, which has been shown to be correlated
## with a cell's affinity in immune response
FS <- FunctionalityScore(fit)
PFS <- PolyfunctionalityScore(fit)

## Obtain the posterior difference, posterior log ratio from a COMPASSResult
post <- Posterior(fit)

## Plot a heatmap of the mean probability of response, to visualize differences 
## in expression for each category
plot(fit)

## Visualize the posterior difference, log difference with a heatmap
plot(fit, measure=PosteriorDiff(fit), threshold=0)
plot(fit, measure=PosteriorLogDiff(fit), threshold=0)
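To give a feel for what a functionality score summarizes, the sketch below computes one by hand from a hypothetical matrix of posterior response probabilities. It assumes the definition described in the COMPASS paper, namely the proportion of antigen-specific subsets among all possible ones: the sum of response probabilities over marker subsets divided by 2^M - 1 for M markers. The object names and values are invented for illustration; for real fits, use FunctionalityScore as shown above.

```r
## Hypothetical posterior response probabilities: rows are individuals,
## columns are the non-empty marker subsets (2 markers here)
mean_gamma <- matrix(
  c(0.9, 0.6, 0.3,
    0.1, 0.0, 0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("iid_1", "iid_2"), c("M1", "M2", "M1&M2"))
)
n_markers <- 2

## Assumed definition: total response probability divided by the number
## of possible non-empty marker subsets, 2^M - 1
toy_fs <- rowSums(mean_gamma) / (2^n_markers - 1)
toy_fs
```

Under this assumed definition, an individual responding in every subset would score 1 and an individual responding in none would score 0.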

COMPASS also packages a Shiny application for interactive visualization of the fits generated by a COMPASS call. The application can be launched with a call to the shinyCOMPASS function.

shinyCOMPASS(fit, stimulated="Treatment", unstimulated="Control")

Interoperation with flowWorkspace

flowWorkspace is a package used for managing and generating gates for data obtained from flow cytometry experiments. Combined with openCyto, users are able to automatically gate data through flexibly-defined gating templates.

COMPASS comes with a utility function for extracting data from a flowWorkspace GatingSet object, providing a seamless workflow between data management and analysis for polyfunctional cytokine data. Data can be extracted using COMPASSContainerFromGatingSet, which is documented in ?COMPASSContainerFromGatingSet. As an example, a researcher might first gate their data to find CD4+ cells, and then gate a number of markers, or cytokines, within these CD4+ cells to identify cells expressing different combinations of markers.

Users who have gated their data with FlowJo and want to analyze it with COMPASS can do so by first loading and parsing their workspace with flowWorkspace, and then generating the COMPASSContainer through COMPASSContainerFromGatingSet.

Citations

cite_package <- function(...) {
  tryCatch({
    args <- unlist( list(...) )
    cites <- lapply(args, citation)
    txt <- sapply(cites, function(x) {
      attr( unclass(x)[[1]], "textVersion" )
    })
    return(txt[order(txt)])
  }, error=function(e) NULL
  )
}
citations <- cite_package("COMPASS", "flowWorkspace", "openCyto", "base")
citations <- c(
  "Lin, L., Finak, G., Ushey, K., Seshadri, C., et al. COMPASS identifies T-cell subsets correlated with clinical outcomes. Nature Biotechnology (2015). doi:10.1038/nbt.3187",
  citations[1:2],
  "Greg Finak, Andrew McDavid, Pratip Chattopadhyay, Maria Dominguez, Steve De Rosa, Mario Roederer, Raphael Gottardo. Mixture models for single-cell assays with applications to vaccine studies. Biostatistics 2014 Jan;15(1):87-101",
  citations[3:4]
)
invisible(lapply(citations, function(x) cat(x, "\n\n")))