options(tinytex.verbose = TRUE)
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette describes how to use the SimpleCOMPASS
interface in the COMPASS
package to fit a polyfunctionality response model to tabular ICS data.
In contrast to COMPASS()
which is tied more closely to the GatingSet
data structures available in the core BioConductor flow cytometry infrastructure, SimpleCOMPASS()
allows the user to used tabular cell count data (exported from FlowJo, for example) directly for COMPASS
model fitting.
This approach requires a bit more care on the user's part, to ensure data are formatted and ordered correctly, but can can be less intimidating than working with the core flow infrastructure.
We'll assume that the data are provided as an excel
document. We'll use some of the data from the paper: @Rakshit2017-go. The data are available in the COMPASS package as of version 1.17.2.
We need a good package to load the excel file, we'll use the tidyverse pacakge readxl
. If it's not installed, you can install it with the command
library(BiocManager) # install.packages("readxl")
library(COMPASS) # library(readxl)
Let's get started.
The data used in this vignette are installed along with the COMPASS
package, and can be read into R as follows:
# retrieve the path to the data file in the package installation directory example_data_path = system.file("extdata", "SimpleCOMPASSExamples.csv", package = "COMPASS")
With the path to the data in hand we load the file.
compass_data = read.csv(example_data_path, check.names=FALSE)
Let's start by looking at the input data:
dim(compass_data) colnames(compass_data)[1:5]
We see we have a table of 42 rows by 260 columns. The rows correspond to subjects identified by the PATIENT ID
column under different Stimulation
conditions from different treatment GROUP
s. There are 256 ($2^8$) different boolean cell combinations defined by 8 different cytokine markers.
A peek at the data shows us that the first three columns correspond to metadata
, while the remaining columns are integer
cell counts (NOTE: Not proportions, COMPASS
and SimpleCOMPASS
require cell counts).
We'll need to do several things to prepare the data to fit the model.
metadata
rows to count data rows.This seems like a lot of work, but it's relatively straightforward. When using the COMPASS
interface, this is all handled for the user.
We assign the first three columns of the data set to a new metadata
variable. These are the PATIENT ID
, Stimulation
and Group
variables. We also want metadata
to be a data.frame
rather than a tibble
.
metadata = compass_data[,2:4] metadata = as.data.frame(metadata)
# Take the remaining columns counts = as.matrix(compass_data[,5:261]) dim(counts) # NOTE that we still have too many columns. # We should have 256, not 257. # We'll fix this. The first column is the total. We'd need this for COMPASS, but not for SimpleCOMPASS. # drop the total count counts = counts[,-1L] dim(counts)
Read the code comments above, there are some imporant details there.
Next we split the counts matrix, but we notice that we have a typo in our metadata Stimulation
variable, we'll fix that and proceed.
We assign the stimulated counts to the new n_s
variable and the unstimulated counts to the n_u
variable.
unique(metadata$Stimulation) # Which entries have the typos typos = which(metadata$Stimulation=="UnStimulated") #Correct them metadata$Stimulation[typos] = "Unstimulated" # Better unique(metadata$Stimulation) # old school R n_u = subset(counts, metadata$Stimulation == "Unstimulated") n_s = subset(counts, metadata$Stimulation == "ESAT6/CFP10")
In this case, this is simple as each subject corresponds to a unique sample. In more complex cases where there are multiple timepoints, we would construct a unique identifier by concatenating the visit and the subject ids.
metadata$unique_id = metadata$`PATIENT ID`
The rows of metadata
, n_s
and n_u
are all consistent, although metadata
has twice as many rows as n_s
or n_u
since we split the counts matrix.
# assign consistent row names to n_u and n_s rownames(n_u) = subset(metadata$unique_id, metadata$Stimulation=="Unstimulated") rownames(n_s) = subset(metadata$unique_id, metadata$Stimulation=="ESAT6/CFP10") # Now all matrices have the same dimensions and appropriate rownames metadata = subset(metadata, metadata$Stimulation=="ESAT6/CFP10")
We have subset metadata to include only the stimulated rows. We have named the columns of the split count matrices according to the unique ids of the metadata matrix entries for the stimulated or non-stimulated entries, as appropriate.
Let's look at the current population names.
colnames(n_s)[1]
There is a gating path prefix (we won't need that), and a trailing ",Count" string, presumably to specify that these are count values as oppposed to proportions. We won't need that either.
The cytokine names are in a short form (i.e. 17F corresponds to IL17F), and they use the "+/-" naming scheme to indicate cells that are positive or negative for a particular cytokine. This is not the format required by COMPASS
.
We need to drop the path prefix, and use boolean operators to specify the cell combinations. For example "A+/B+C-" would translate to "A&B&!C", read as "A and B and NOT C".
COMPASS provides a convenience function to do this for you translate_marker_names()
.
# remove the path nms = basename(colnames(n_s)) # translate the marker names to a COMPASS compatible format. nms = COMPASS:::translate_marker_names(nms) sample(nms,2) colnames(n_s) = nms nms = basename(colnames(n_u)) # translate the marker names to a COMPASS compatible format. nms = COMPASS:::translate_marker_names(nms) sample(nms,2) colnames(n_u) = nms
If we want, we can rename the cytokines to familiar form.
#2 to IL2 colnames(n_s) = gsub("([&!])2([&!])","\\1IL2\\2",colnames(n_s)) #10 to IL10 colnames(n_s) = gsub("([&!])10([&!])","\\1IL10\\2",colnames(n_s)) # 17A to IL17A colnames(n_s) = gsub("([&!])17A([&!])","\\1IL17A\\2",colnames(n_s)) # 17F to IL17F colnames(n_s) = gsub("([&!])17F([&!])","\\1IL17F\\2",colnames(n_s)) # 22 to IL22 colnames(n_s) = gsub("([&!])22([&!])","\\1IL22\\2",colnames(n_s))
We can put the above in a function:
rename_cytokines = function(nms){ #2 to IL2 nms = gsub("([&!])2([&!])", "\\1IL2\\2", nms) #10 to IL10 nms = gsub("([&!])10([&!])", "\\1IL10\\2", nms) # 17A to IL17A nms = gsub("([&!])17A([&!])", "\\1IL17A\\2", nms) # 17F to IL17F nms = gsub("([&!])17F([&!])", "\\1IL17F\\2", nms) # 22 to IL22 nms = gsub("([&!])22([&!])", "\\1IL22\\2", nms) }
And apply it as so:
colnames(n_s) = rename_cytokines(colnames(n_s)) colnames(n_u) = rename_cytokines(colnames(n_u))
The last column of the count matrices should be the all-negative cell subset. Let's confirm that is is.
colnames(n_s)[ncol(n_s)] colnames(n_u)[ncol(n_u)]
The arguments required by SimpleCOMPASS
are:
args(COMPASS::SimpleCOMPASS)
We have already defined n_s
, n_u
, and meta
(metadata). The individual_id
is a character name of the variable for the unique_id, in our case "unique_id".
We'll limit the number of iterations and replications for the purpose of the vignette, so we won't have a good fit, but it's only for demonstration.
If you see a message about reordering the rows of n_s and n_u, double check your work, it likely means that the rownames of the count matrices don't match the unique_id
variable, or are not consistent between each other.
fit = COMPASS::SimpleCOMPASS(n_s = n_s, n_u = n_u, meta = metadata, individual_id = "unique_id", iterations = 1000, replications = 3)
For a complete run, iterations should be 400000, and replications should be 8 (the defaults for COMPASS
).
We can extract polyfunctionality scores with the scores()
function. We look at the distribution of the functionality score (FS) across groups.
library(ggplot2) library(dplyr) scores(fit) %>% ggplot() + geom_boxplot(outlier.colour = NA) + geom_jitter() + aes(y = FS, x = GROUP)
A heatmap can be drawn using plot()
.
plot(fit,"GROUP")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.