knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE)
The `r Biocpkg("MouseThymusAgeing")` package provides convenient access to the single-cell RNA sequencing (scRNA-seq) datasets from @baran-gale_ageing_2020. The study used single-cell transcriptomic profiling to resolve how the epithelial composition of the mouse thymus changes with ageing. The datasets from the paper are provided as count matrices with relevant sample-level and feature-level meta-data. All data are provided after processing and quality control (QC). The raw sequencing data can be obtained directly from ArrayExpress using accessions E-MTAB-8560 and E-MTAB-8737.
The package can be installed from Bioconductor. Bioconductor packages can be accessed using the `r CRANpkg("BiocManager")` package.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MouseThymusAgeing")
To use the package, load it in the typical way.
library(MouseThymusAgeing)
Detailed experimental protocols are available in the manuscript and analytical details are provided in the accompanying GitHub repo.
This data package contains 2 single-cell data sets from the paper. The first details the initial transcriptomic profiling of defined TEC populations using the plate-based SMART-seq2 chemistry. These cells were sorted from mice at 1, 4, 16, 32 and 52 weeks of age using the following flow cytometry phenotypes:
In each case cells were sorted from 5 separate mice at each age into a 384 well plate containing lysis buffer, with cells from different ages and days block sorted into different areas of each plate to minimise the confounding between batch effects, mouse age and sorted subpopulation. The single-cell libraries were prepared according to the SMART-seq2 protocol and sequenced on an Illumina NovaSeq 6000.
The computational processing involved the following steps:
`computeSumFactors()` function from `r Biocpkg("scran")` [@l._lun_pooling_2016].
`r CRANpkg("igraph")` and cells were clustered using the Walktrap community detection algorithm [@pons_computing_2005]. Clusters were manually annotated based on inspecting the expression of marker genes.

The second dataset contains cells that were profiled from TEC at 8, 20 and 36 weeks of age, derived from a transgenic model system that can also lineage-trace cells descended from those that express the thymoproteasomal gene, $\beta$-5t. When this gene is expressed it drives the expression of a fluorescent reporter gene, ZsGreen (ZsG). The mouse is denoted $\mbox{3xtg}^{\beta5t}$. Each mouse (3 replicates per age) first had its transgene induced using doxycycline, and 4 weeks later the TEC were collected by flow cytometry in separate ZsG+ and ZsG- groups. Within each of these groups cells were FAC-sorted into mTEC (Cd45+EpCam+MHCII+Ly51-UEA1+) and cTEC (Cd45+EpCam+Ly51+UEA1+) populations. For this experiment we made use of recent developments in multiplexing with hashtag oligos (HTO; cell-hashing) [@stoeckius_cell_2018]. Consequently, the cells were super-loaded onto the 10X Genomics Chromium chips before library prep and sequencing on an Illumina NovaSeq 6000.
The computational processing for these data is different to above. Specifically:
`emptyDrops()` from the `r Biocpkg("DropletUtils")` package [@lun_emptydrops_2019].
`computeSumFactors()` function from `r Biocpkg("scran")` [@l._lun_pooling_2016], and used for normalization with a log(X + 1) transformation.
`r CRANpkg("igraph")` as above, and cells were also clustered using the Walktrap community detection algorithm [@pons_computing_2005]. These clusters were annotated with concordant labels from the above data set, except that many more clusters were identified, and thus each cluster was suffixed with a number to uniquely identify it.

The SMART-seq2 data are stored in subsets according to the sorting day (numbered 1-5). The droplet data can be accessed according to the specific multiplexed samples (6 in total). For the SMART-seq2 data, the exported object `SMARTseqMetadata` provides the relevant metadata for each sorting day; the equivalent object `DropletMetadata` contains the relevant information for each separate sample. Specific descriptions of each column can be accessed using `?SMARTseqMetadata` and `?DropletMetadata`.
head(SMARTseqMetadata, n = 5)
All of the data access functions allow you to select the particular samples or sorting days that you would like to access for the relevant data set. By loading only the samples or sorting days that you are interested in for your particular analysis, you will save time when downloading and loading the data, and also reduce memory consumption on your machine.
Droplet single-cell experiments tend to be much larger owing to the ability to encapsulate and process many more cells than in either 96- or 384-well plates. The droplet scRNA-seq made use of hashtag oligonucleotides to multiplex samples, allowing for replicated experimental design without breaking the bank.
head(DropletMetadata, n = 5)
Package data are provided as `SingleCellExperiment` objects, an extension of the Bioconductor `SummarizedExperiment` object for high-throughput omics experiment data. The `SingleCellExperiment` object uses memory-efficient storage and sparse matrices to store the single-cell experiment data, whilst allowing the layering of additional feature- and cell-wise meta-data to facilitate single-cell analyses. This section details how to access and interact with these objects from the `MouseThymusAgeing` package.
smart.sce <- MouseSMARTseqData(samples="day2")
smart.sce
The gene counts are stored as the `counts` entry of the `assays` slot (`assay(sce, "counts")`), which can be accessed using the convenience function `counts()`. The gene counts are stored in a memory-efficient sparse matrix class from the `r CRANpkg("Matrix")` package.
head(counts(smart.sce)[, 1:10])
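To illustrate why a sparse matrix class suits single-cell counts, here is a minimal sketch on a simulated matrix (not the package data). The dimensions, the proportion of non-zero entries and the object names `dense` and `sparse` are assumptions chosen purely for illustration.

```r
library(Matrix)

set.seed(1)

# Simulate a mostly-zero count matrix: 1000 genes x 100 cells (toy sizes).
dense <- matrix(0L, nrow = 1000, ncol = 100)
nonzero <- sample(length(dense), 2000)      # roughly 2% of entries are non-zero
dense[nonzero] <- rpois(2000, lambda = 5) + 1L

# Convert to a compressed sparse column representation.
sparse <- Matrix(dense, sparse = TRUE)

# Compare the memory footprint of the two representations.
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")

# The stored values are unchanged by the conversion.
stopifnot(all(as.matrix(sparse) == dense))
```

Only the non-zero entries and their positions are stored, so the saving grows with the proportion of zeros in the data.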
The normalisation factors per cell can be accessed using the `sizeFactors()` function.
head(sizeFactors(smart.sce))
These are used to normalise the data. To generate log-transformed, normalised single-cell expression values, we can apply the `logNormCounts()` function from the `r Biocpkg("scuttle")` package. This will add the `logcounts` entry to the `assays` slot in our object.
library(scuttle)
smart.sce <- logNormCounts(smart.sce)
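As a rough illustration of what this transformation computes, the sketch below applies the same arithmetic by hand to a toy matrix in base R: each cell's counts are divided by its size factor, a pseudo-count of 1 is added, and the result is log2-transformed. The matrix, the gene and cell names, and the size factors are invented for illustration, and the sketch assumes size factors that already average to 1; it is not a replacement for `logNormCounts()`.

```r
# Toy counts: 2 genes x 3 cells (hypothetical values).
toy.counts <- matrix(c(0, 2, 4,
                       10, 0, 6),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"),
                                     c("cell1", "cell2", "cell3")))

# Hypothetical per-cell size factors with mean 1.
toy.sf <- c(cell1 = 0.5, cell2 = 1, cell3 = 1.5)

# Divide each column (cell) by its size factor.
toy.norm <- sweep(toy.counts, 2, toy.sf, "/")

# Add a pseudo-count of 1 and take log2.
toy.logcounts <- log2(toy.norm + 1)
toy.logcounts
```

Zero counts remain zero after the transformation, which preserves sparsity in the log-expression matrix.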
With these normalised counts we can perform our standard down-stream analytical tasks, such as identifying highly variable genes, projecting cells into a reduced dimensional space and clustering using a nearest-neighbour graph. You can further inspect the cell-wise meta-data attached to each dataset, stored in the `colData` for each `r Biocpkg("SingleCellExperiment")` object.
head(colData(smart.sce))
Details of what information is stored can be found in the documentation using `?DropletMetadata` and `?SMARTseqMetadata`. In each object we also have the pre-computed reduced dimensions, which can be accessed using `reducedDim(<sce>, "PCA")`.
sessionInfo()