Zanotelli_2020_Spheroids: Obtain the Zanotelli_2020_Spheroids dataset

Zanotelli_2020_Spheroids    R Documentation

Obtain the Zanotelli_2020_Spheroids dataset

Description

Obtain the Zanotelli_2020_Spheroids dataset, which consists of three data objects: single cell data, multichannel images and cell segmentation masks. The data were obtained by imaging mass cytometry (IMC) of sections of 3D spheroids generated from different cell lines.

Usage

Zanotelli_2020_Spheroids(
  data_type = c("sce", "spe", "images", "masks"),
  version = "latest",
  metadata = FALSE,
  on_disk = FALSE,
  h5FilesPath = NULL,
  force = FALSE
)

Arguments

data_type

type of object to load, 'images' for multichannel images or 'masks' for cell segmentation masks. Single cell data are retrieved using either 'sce' for the SingleCellExperiment format or 'spe' for the SpatialExperiment format.

version

dataset version. By default, the latest version is returned.

metadata

if FALSE (default), the data object selected in data_type is returned. If TRUE, only the metadata associated with this object are returned.

on_disk

logical indicating whether images should be stored on disk as HDF5Array objects (.h5 files) rather than in memory. This setting applies when downloading images and masks.

h5FilesPath

path to where the .h5 files for on disk representation are stored. This path needs to be defined when on_disk = TRUE. When files should only temporarily be stored on disk, please set h5FilesPath = getHDF5DumpDir().

force

logical indicating if images should be overwritten when files with the same name already exist on disk.

Details

This is an Imaging Mass Cytometry (IMC) dataset from Zanotelli et al. (2020), consisting of three data objects:

  • images contains 517 multichannel images, each containing 51 channels, in the form of a CytoImageList class object.

  • masks contains the cell segmentation masks associated with the images, in the form of a CytoImageList class object.

  • sce contains the single cell data extracted from the multichannel images using the cell segmentation masks, as well as the associated metadata, in the form of a SingleCellExperiment. This represents a total of 229,047 cells x 51 channels.

  • spe same single cell data as for sce, but in the SpatialExperiment format.

All data are downloaded from ExperimentHub and cached for local re-use.

Mapping between the three data objects is performed via variables located in their metadata columns: mcols() for the CytoImageList objects and colData() for the SingleCellExperiment and SpatialExperiment objects. Mapping at the image level can be performed with the image_name or image_number variables. Mapping between cell segmentation masks and single cell data is performed with the cell_number variable, the values of which correspond to the intensity values of the masks object. For practical examples, please refer to the "Accessing IMC datasets" vignette.
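
The mapping logic described above can be sketched on toy data in base R. Everything in this block is a hypothetical stand-in (a 3 x 3 matrix instead of a real mask, a small data frame instead of colData), and it assumes that an intensity of 0 in a mask denotes background; it is meant to illustrate the idea, not to reproduce the package's own code.

```r
# Toy stand-in for one segmentation mask (hypothetical values).
# Each non-zero intensity is assumed to be the cell_number of a cell;
# 0 is assumed to be background.
mask <- matrix(c(0, 1, 1,
                 0, 2, 2,
                 3, 3, 0), nrow = 3, byrow = TRUE)

# Toy stand-in for the cell metadata (colData) of two images
cells <- data.frame(
  image_number = c(1, 1, 1, 2),
  cell_number  = c(1, 2, 3, 1)
)

# Image-level mapping: keep the cells that belong to image 1
cells_img1 <- cells[cells$image_number == 1, ]

# Cell-level mapping: mask intensities are matched to cell_number
mask_ids <- sort(unique(mask[mask > 0]))
idx <- match(mask_ids, cells_img1$cell_number)

# Number of mask pixels per cell, attached to the matching rows
pixel_counts <- table(mask[mask > 0])
cells_img1$n_pixels <- as.integer(pixel_counts[as.character(cells_img1$cell_number)])
print(cells_img1)
```

With the real objects, the same join would use the image_number (or image_name) values found in mcols() of the CytoImageList and in colData() of the single cell object.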

This dataset was obtained as follows (the names of the experimental variables, located in the colData of the SingleCellExperiment and SpatialExperiment objects, are indicated in parentheses):

  i) Cells from four different cell lines (cell_line) were seeded at three different densities (treatment_concentration, relative densities) and grown for either 72 or 96 hours (treatment_time_point, duration in hours). In the appropriate experimental conditions (see the paper for details), the cells aggregate into 3D spheroids.

  ii) Cells were harvested and pooled into 60-well barcoding plates.

  iii) A pellet of each spheroid pool was generated and cut into several 6 um-thick sections.

  iv) A subset of these sections (site_id) was stained with an IMC panel and acquired as one or more acquisitions (acquisition_id) containing multiple spheres each.

  v) Spheres in these acquisitions were identified by computer vision and cropped into individual images (image_number).

Other relevant cell metadata include:

  • treatment_name: experimental conditions in the format: "Cell line name"_c"seeding density"_tp"time point".

  • cell_x/cell_y: cell centroid position in the image.

  • cell_area: area of the cell (um^2).

  • distance_rim: estimated distance to spheroid border.

  • distance_sphere: distance to spheroid section border.

  • distance_other_sphere: distance to the closest of the other spheroid sections in the same image (if there is any).

  • distance_background: distance to background pixels.
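
As an illustration of the treatment_name format listed above, the following base R sketch splits such a string into its three parts. The example values ("LineA_c1_tp72" and so on) are hypothetical, and the regular expression assumes the literal layout name_c<density>_tp<time>; the actual values in the dataset should be checked before relying on it.

```r
# Hypothetical treatment_name values following the documented layout:
# "Cell line name"_c"seeding density"_tp"time point"
treatment_name <- c("LineA_c1_tp72", "LineB_c4_tp96")

# Anchor on the trailing "_c..._tp..." so that a cell line name
# containing underscores would still be captured as a whole
pattern <- "^(.*)_c([^_]+)_tp([^_]+)$"

parsed <- data.frame(
  cell_line       = sub(pattern, "\\1", treatment_name),
  seeding_density = sub(pattern, "\\2", treatment_name),
  time_point      = as.numeric(sub(pattern, "\\3", treatment_name))
)
print(parsed)
```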

For a full description of the other experimental variables, please refer to the publication (https://doi.org/10.15252/msb.20209798) and to the original dataset repository (https://doi.org/10.5281/zenodo.4271910).

The marker-associated metadata, including antibody information and metal tags, are stored in the rowData of the SingleCellExperiment and SpatialExperiment objects. Channels with names starting with "BC_" are those used for barcoding. Post-translational modifications of the protein targets are indicated in brackets.

The assay slots of the SingleCellExperiment and SpatialExperiment objects contain three assays:

  • counts contains raw mean ion counts per cell.

  • exprs contains arsinh-transformed counts, with cofactor 1.

  • quant_norm contains counts censored at the 99th percentile and scaled 0-1.
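
The relationship between the three assays can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the code used to build the dataset: the counts matrix is random, and the quant_norm step assumes censoring and min-max scaling are applied per channel, which the description above does not specify.

```r
set.seed(1)

# Simulated raw counts: 5 hypothetical channels (rows) x 200 cells (columns)
counts <- matrix(rexp(5 * 200, rate = 0.5), nrow = 5,
                 dimnames = list(paste0("ch", 1:5), NULL))

# exprs: inverse hyperbolic sine transform with cofactor 1
cofactor <- 1
exprs <- asinh(counts / cofactor)

# quant_norm (assumed per channel): cap values at the 99th percentile,
# then rescale to the 0-1 range
quant_norm <- t(apply(counts, 1, function(x) {
  cap <- quantile(x, 0.99)
  x <- pmin(x, cap)
  (x - min(x)) / (max(x) - min(x))
}))

print(range(quant_norm))
```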

In addition, the altExp slot of the SingleCellExperiment object contains another SingleCellExperiment object where the counts matrix represents raw mean ion counts for cells neighboring the current cell.

Neighborhood information, defined here as cells that are localized next to each other, is stored as a SelfHits object in the colPairs slot of the SingleCellExperiment and SpatialExperiment objects. Cells in the SelfHits object are represented by unique integers that map to the cell_number_absolute column of colData(sce).
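
How such a list of neighbor pairs can be used is sketched below in base R. The two integer vectors are hypothetical stand-ins for the "from" and "to" sides of a SelfHits object, and the sketch assumes that each integer identifies one cell; in this toy example the row position and cell_number_absolute coincide.

```r
# Toy cell identifiers (hypothetical): four cells
cell_number_absolute <- 1:4

# Toy neighbor pairs: each (from[i], to[i]) means "to[i] is a neighbor of from[i]"
from <- c(1L, 1L, 2L, 3L)
to   <- c(2L, 3L, 1L, 1L)

# Number of neighbors recorded for each cell
n_neighbors <- tabulate(from, nbins = length(cell_number_absolute))
names(n_neighbors) <- cell_number_absolute

# Identifiers of the cells neighboring cell 1
neighbors_of_1 <- cell_number_absolute[to[from == 1L]]

print(n_neighbors)
print(neighbors_of_1)
```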

Dataset versions: a version argument can be passed to the function to specify which dataset version should be retrieved.

  • `v0`: original version (Bioconductor <= 3.15).

  • `v1`: consistent object formatting across datasets.

File sizes:

  • `images`: size in memory = 21.2 GB, size on disk = 860 MB.

  • `masks`: size in memory = 426 MB, size on disk = 12 MB.

  • `sce`: size in memory = 564 MB, size on disk = 319 MB.

  • `spe`: size in memory = 596 MB, size on disk = 320 MB.

When images are stored on disk, they must first be read fully into memory before being written to disk. Downloading the data is therefore slower than keeping them directly in memory. In return, downstream analysis has a reduced memory footprint.

Original source: Zanotelli et al. (2020): https://doi.org/10.15252/msb.20209798

Original link to raw data, also containing the entire dataset: https://doi.org/10.5281/zenodo.4271910

Value

A SingleCellExperiment object with single cell data, a SpatialExperiment object with single cell data, a CytoImageList object containing multichannel images, or a CytoImageList object containing cell segmentation masks.

Author(s)

Nicolas Damond

References

Zanotelli VRT et al. (2020). A quantitative analysis of the interplay of environment, neighborhood, and cell state in 3D spheroids. Mol Syst Biol 16(12), e9798.

Examples

# Load single cell data
sce <- Zanotelli_2020_Spheroids(data_type = "sce")
print(sce)

# Display metadata
Zanotelli_2020_Spheroids(data_type = "sce", metadata = TRUE)

# Load masks on disk (temporary storage in the HDF5 dump directory)
library(HDF5Array)
masks <- Zanotelli_2020_Spheroids(
  data_type = "masks",
  on_disk = TRUE,
  h5FilesPath = getHDF5DumpDir()
)
print(head(masks))


BodenmillerGroup/imcdatasets documentation built on March 20, 2024, 9:24 a.m.