knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "75%" ) # Bioconductor vignette links link_sce <- paste0( "https://bioconductor.org/packages/release/bioc/", "vignettes/SingleCellExperiment/inst/doc/intro.html" ) link_se <- paste0( "https://bioconductor.org/packages/release/bioc/", "vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html" ) link_DESeq2 <- paste0( "https://bioconductor.org/packages/devel/bioc/", "vignettes/DESeq2/inst/doc/DESeq2.html" )
Single cell RNA sequencing (scRNA-seq) studies allow gene expression
quantification at the level of individual cells.
These studies introduce multiple layers of biological complexity, including
variations in gene expression between cell states within a sample
(e.g., T cells versus macrophages), between samples within a population
(e.g., biological or technical replicates), and between populations
(e.g., healthy versus diseased individuals). Many early scRNA-seq studies
involved analysis of gene expression within cells from a single sample.
For single cell RNA-seq data collected from more than one subject,
aggregateBioVar
provides tools to summarize summarize single cell gene
expression profiles at the level of samples (i.e., subjects) or
populations. Given an input SingleCellExperiment
object
[@SingleCellExperiment2020] with pre-defined cell states, aggregateBioVar()
stratifies data as a list of SummarizedExperiment
objects
[@R-SummarizedExperiment].
For each cell type, gene counts are aggregated by subject into a
gene-by-subject count matrix, and column metadata are summarized to retain
inter-subject variation for downstream analysis with bulk RNA-seq tools.
Install the development version of aggregateBioVar
from
GitHub with:
# install.packages("devtools") devtools::install_github("jasonratcliff/aggregateBioVar", build_vignettes=TRUE)
library(aggregateBioVar) # Bioconductor Packages library(SummarizedExperiment, quietly = TRUE) library(SingleCellExperiment, quietly = TRUE) library(DESeq2, quietly = TRUE) # Data analysis and visualization library(dplyr, quietly = TRUE) library(magrittr, quietly = TRUE) library(ggplot2, quietly = TRUE) library(ggtext, quietly = TRUE)
sce_samples <- sort(unique(small_airway$orig.ident)) sce_genotypes <- list( "Wildtype" = grep("WT", sce_samples, value = TRUE), "CFTRKO" = grep("CF", sce_samples, value = TRUE) )
To illustrate the utility of biological replication for scRNA-seq
sequencing experiments, consider a SingleCellExperiment
object with
scRNA-seq data from 7 subjects (r sce_samples
) in the context of a cystic
fibrosis phenotype. Samples were collected from small airway epithelium of
newborn Sus scrofa with genotypes from wild type
(CFTR+/+, n=r length(sce_genotypes$Wildtype)
) and CFTR-knockout
(CFTR-/-, n=r length(sce_genotypes$CFTRKO)
) individuals.
Note the dimensions of this object, with r nrow(small_airway)
genes from
r ncol(small_airway)
cells:
small_airway
The primary function aggregateBioVar()
takes a SingleCellExperiment
object with column metadata variables indicating subject identity
(e.g., biological sample; subjectVar
) and assigned cell type (cellVar
).
The column metadata of a SingleCellExperiment
object can be obtained by
SummarizedExperiment::colData()
.
Here, the metadata variable orig.ident
indicates the biological sample
identifier and celltype
the inferred cell type.
# Perform aggregation of counts and metadata by subject and cell type. aggregate_counts <- aggregateBioVar( scExp = small_airway, subjectVar = "orig.ident", cellVar = "celltype" )
Each element of the returned list contains a SummarizedExperiment
object with
aggregated counts from cells in the assigned cell type (indicated by cellVar
).
aggregate_counts
For each cell type subset, within-subject gene counts are aggregated and column
metadata are summarized to exclude variables with intercellular variation.
This effectively retains subject metadata and can be used for downstream
analysis with bulk RNA-seq tools. After aggregation, the number of columns
in the SingleCellExperiment
object matches the number of unique values in
the subject metadata variable indicated by subjectVar
.
assay(aggregate_counts$`Immune cell`, "counts")
colData(aggregate_counts$`Immune cell`)
The aggregate gene-by-subject matrix and subject metadata can be used as inputs
for bulk RNA-seq tools to investigate gene expression. Here, an example is
provided using DESeq2
[@DESeq2].
A DESeqDataSet
can be constructed from the aggregate gene-by-subject
count matrix and summarized column metadata.
subj_dds_dataset <- DESeqDataSetFromMatrix( countData = assay(aggregate_counts$`Secretory cell`, "counts"), colData = colData(aggregate_counts$`Secretory cell`), design = ~ Genotype ) subj_dds <- DESeq(subj_dds_dataset) subj_dds_results <- results(subj_dds, contrast = c("Genotype", "WT", "CFTRKO"))
Add negative log~10~ adjusted P-values, then plot against log~2~ fold change. Genes with adjusted P-values < 0.05 and fold-change absolute values > 1.0 are highlighted in red and labeled by feature.
subj_dds_transf <- as.data.frame(subj_dds_results) %>% bind_cols(feature = rownames(subj_dds_results)) %>% mutate(log_padj = - log(.data$padj, base = 10)) ggplot(data = subj_dds_transf) + geom_point(aes(x = log2FoldChange, y = log_padj), na.rm = TRUE) + geom_point( data = filter( .data = subj_dds_transf, abs(.data$log2FoldChange) > 1, .data$padj < 0.05 ), aes(x = log2FoldChange, y = log_padj), color = "red" ) + geom_label( data = filter( .data = subj_dds_transf, abs(.data$log2FoldChange) > 1, .data$padj < 0.05 ), aes(x = log2FoldChange, y = log_padj + 0.4, label = feature) ) + theme_classic() + labs( x = "log<sub>2</sub> (fold change)", y = "-log<sub>10</sub> (p<sub>adj</sub>)" ) + theme( axis.title.x = element_markdown(), axis.title.y = element_markdown())
For a detailed workflow and description of package components, see the package vignette:
vignette("multi-subject-scRNA-seq", package = "aggregateBioVar")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.