#' Obtain the IMMUcan_2022_CancerExample dataset
#'
#' Obtain the IMMUcan_2022_CancerExample dataset, which consists of three data
#' objects: single cell data, multichannel images and cell segmentation masks.
#' Data were obtained by imaging mass cytometry (IMC) of sections of 4 patients
#' with different tumor indications.
#'
#' @param data_type type of object to load, `images` for multichannel images or
#' `masks` for cell segmentation masks. Single cell data are retrieved using
#' either `sce` for the \code{SingleCellExperiment} format or `spe` for the
#' \code{SpatialExperiment} format.
#' @param version dataset version. By default, the latest version is returned.
#' @param metadata if FALSE (default), the data object selected in
#' \code{data_type} is returned. If TRUE, only the metadata associated with this
#' object is returned.
#' @param on_disk logical indicating if images in the form of
#' \linkS4class{HDF5Array} objects (as .h5 files) should be stored on disk
#' rather than in memory. This setting is only valid when downloading
#' \code{images} and \code{masks}.
#' @param h5FilesPath path to where the .h5 files for on disk representation
#' are stored. This path needs to be defined when \code{on_disk = TRUE}. When
#' files should only temporarily be stored on disk, please set
#' \code{h5FilesPath = getHDF5DumpDir()}.
#' @param force logical indicating if images should be overwritten when files
#' with the same name already exist on disk.
#'
#' @details This is an Imaging Mass Cytometry (IMC) dataset used in the
#' \href{https://bodenmillergroup.github.io/IMCDataAnalysis/}{IMC data analysis
#' book}: \itemize{ \item \code{images} contains 14 multichannel images, each
#' containing 40 channels, in the form of a \linkS4class{CytoImageList} class
#' object. \item \code{masks} contains the cell segmentation masks associated
#' with the images, in the form of a \linkS4class{CytoImageList} class object.
#' \item \code{sce} contains the single cell data extracted from the
#' multichannel images using the cell segmentation masks, as well as the
#' associated metadata, in the form of a \linkS4class{SingleCellExperiment}
#' object. Single cell data can also be retrieved as a
#' \linkS4class{SpatialExperiment} object. This represents a total of 46,825
#' cells x 40 channels. }
#'
#' All data are downloaded from ExperimentHub and cached for local re-use.
#'
#' Mapping between the three data objects is performed via variables located in
#' their metadata columns: \code{mcols()} for the \linkS4class{CytoImageList}
#' objects and \code{colData()} for the \linkS4class{SingleCellExperiment}
#' object. Mapping at the image level can be performed with the
#' \code{sample_id} or \code{image_name} variables. Mapping between cell
#' segmentation masks and single cell data is performed with the
#' \code{cell_number} variable, the values of which correspond to the
#' intensity values of the \code{masks} object. For practical examples, please
#' refer to the "Accessing IMC datasets" vignette.
#'
#' This imaging mass cytometry dataset serves as an example to demonstrate
#' downstream analysis tools, including spatial data analysis. The data were
#' generated as part of the Integrated iMMUnoprofiling of large adaptive CANcer
#' patient cohorts (IMMUcan) project (\href{https://immucan.eu}{immucan.eu})
#' using the Hyperion imaging system.
#'
#' Relevant entries to the \code{colData} slot are as follows:
#'
#' \itemize{
#' \item \code{sample_id} image name.
#' \item \code{cell_number} cell identifier.
#' \item \code{width_px} width of the image.
#' \item \code{height_px} height of the image.
#' \item \code{patient_id} patient identifier.
#' \item \code{ROI} region of interest identifier.
#' \item \code{indication} cancer type.
#' \item \code{cell_labels} labels of manually labelled cells.
#' \item \code{cell_type} cell type as defined by classification.
#' \item \code{spatial_community} identifiers of each spatial tumor or
#' non-tumor community.
#' \item \code{cn_celltypes} cellular neighborhoods as defined by
#' clustering cells based on the frequency of neighboring cell types.
#' \item \code{cn_expression} cellular neighborhoods as defined by
#' clustering cells based on the mean expression of neighboring cells.
#' \item \code{lisa_clusters} cellular neighborhoods as detected by the
#' lisaClust package.
#' \item \code{spatial_context} spatial contexts defined based on
#' \code{cn_celltypes}.
#' \item \code{spatial_context_filtered} filtered spatial context
#' identifiers.
#' \item \code{patch_id} identifier of the spatial tumor patch.
#' \item \code{cell_x} spatial x coordinate.
#' \item \code{cell_y} spatial y coordinate.
#'}
#'
#' The marker-associated metadata, including antibody information and metal
#' tags are stored in the \code{rowData} of the
#' \linkS4class{SingleCellExperiment} object.
#'
#' The \code{assay} slot of the \linkS4class{SingleCellExperiment} object
#' contains two assays:
#'
#' \itemize{
#' \item \code{counts}: mean ion counts per cell
#' \item \code{exprs}: arsinh-transformed counts per cell, with cofactor 1.
#' }
#'
#' The \code{colPair} slot of the \linkS4class{SingleCellExperiment} object
#' contains the following spatial object graphs:
#'
#' \itemize{
#' \item \code{neighborhood} steinbock generated graph.
#' \item \code{knn_interaction_graph} 20-nearest neighbor graph.
#' \item \code{expansion_interaction_graph} expansion graph using a threshold
#' of 20.
#' \item \code{delaunay_interaction_graph} interaction graph constructed by
#' Delaunay triangulation.
#' \item \code{knn_spatialcontext_graph} 40-nearest neighbor graph.
#' }
#'
#' File sizes:
#' \itemize{
#' \item \code{`images`}: size in memory = 1.5 GB, size on disk = 786 MB.
#' \item \code{`masks`}: size in memory = 19 MB, size on disk = 1.2 MB.
#' \item \code{`sce`}: size in memory = 182 MB, size on disk = 82 MB.
#' \item \code{`spe`}: size in memory = 183 MB, size on disk = 81 MB.
#' }
#'
#' When storing images on disk, each image must first be read fully into
#' memory before it is written to disk. Downloading the data is therefore
#' slower than keeping the images in memory. In return, downstream analysis
#' needs less memory because the images are read from disk on demand.
#'
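#' @return A \linkS4class{SingleCellExperiment} or
#' \linkS4class{SpatialExperiment} object with single cell data, a
#' \linkS4class{CytoImageList} object containing multichannel images, or a
#' \linkS4class{CytoImageList} object containing cell segmentation masks. If
#' \code{metadata = TRUE}, only the metadata associated with the selected
#' object is returned.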
#'
#' @author Nils Eling
#'
#' @examples
#' # Load single cell data
#' sce <- IMMUcan_2022_CancerExample(data_type = "sce")
#' print(sce)
#'
#' # Display metadata
#' IMMUcan_2022_CancerExample(data_type = "sce", metadata = TRUE)
#'
#' # Load masks on disk
#' library(HDF5Array)
#' masks <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
#' h5FilesPath = getHDF5DumpDir())
#' print(head(masks))
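#'
#' # A minimal sketch of image-level mapping, assuming `masks` carries the
#' # `sample_id` variable in its mcols() as described in the details: select
#' # the cells of the first image by matching `sample_id` between the masks
#' # and the single cell data.
#' first_id <- S4Vectors::mcols(masks)$sample_id[1]
#' sce_first <- sce[, sce$sample_id == first_id]
#' ncol(sce_first)
#'
#' # List the assays and spatial object graphs stored in the object
#' SummarizedExperiment::assayNames(sce)
#' SingleCellExperiment::colPairNames(sce)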
#'
#' @import cytomapper
#' @import methods
#' @importFrom utils download.file
#' @importFrom utils read.csv
#' @importFrom ExperimentHub ExperimentHub
#' @importFrom SingleCellExperiment SingleCellExperiment
#' @importFrom SpatialExperiment SpatialExperiment
#' @importFrom HDF5Array writeHDF5Array
#' @importFrom DelayedArray DelayedArray
#'
#' @export
IMMUcan_2022_CancerExample <- function (
data_type = c("sce", "spe", "images", "masks"),
version = "latest",
metadata = FALSE,
on_disk = FALSE,
h5FilesPath = NULL,
force = FALSE
) {
available_versions <- c("v1")
dataset_name <- "IMMUcan_2022_CancerExample"
dataset_version <- ifelse(version == "latest",
utils::tail(available_versions, n=1), version)
full_dataset <- FALSE
.checkArguments(data_type, metadata, dataset_version, available_versions,
full_dataset, on_disk, h5FilesPath, force)
cur_dat <- .loadDataObject(data_type, metadata, dataset_name,
dataset_version, full_dataset, on_disk, h5FilesPath, force)
return(cur_dat)
}