dittoHeatmap: Outputs a heatmap of given genes

dittoHeatmap    R Documentation

Description

Given a set of genes, cells/samples, and metadata names for column annotations, this function will retrieve the expression data for those genes and cells, and the annotation data for those cells. It will then utilize these data to make a heatmap using the pheatmap function of either the pheatmap (default) or ComplexHeatmap package.

Usage

dittoHeatmap(
  object,
  genes = getGenes(object, assay),
  metas = NULL,
  cells.use = NULL,
  annot.by = NULL,
  order.by = .default_order(object, annot.by),
  main = NA,
  cell.names.meta = NULL,
  assay = .default_assay(object),
  slot = .default_slot(object),
  swap.rownames = NULL,
  heatmap.colors = colorRampPalette(c("blue", "white", "red"))(50),
  scaled.to.max = FALSE,
  heatmap.colors.max.scaled = colorRampPalette(c("white", "red"))(25),
  annot.colors = c(dittoColors(), dittoColors(1)[seq_len(7)]),
  annotation_col = NULL,
  annotation_colors = NULL,
  data.out = FALSE,
  highlight.features = NULL,
  show_colnames = isBulk(object),
  show_rownames = TRUE,
  scale = "row",
  cluster_cols = isBulk(object),
  border_color = NA,
  legend_breaks = NA,
  drop_levels = FALSE,
  breaks = NA,
  complex = FALSE,
  ...
)

Arguments

object

A Seurat, SingleCellExperiment, or SummarizedExperiment object.

genes

String vector, c("gene1","gene2","gene3",...) = the list of genes to put in the heatmap. If not provided, defaults to all genes of the object / assay.

metas

String vector, c("meta1","meta2","meta3",...) = the list of metadata variables to put in the heatmap.

cells.use

String vector of cells'/samples' names OR an integer vector specifying the indices of cells/samples which should be included.

Alternatively, a Logical vector, the same length as the number of cells in the object, which sets which cells to include.

annot.by

String vector naming the metadata slots by which the cells/samples should be annotated.

order.by

Single string, string vector, or numeric vector which sets how cells/samples (columns) will be ordered when cluster_cols = FALSE.

Strings should be the name of a gene, or metadata slot, but can also be multiple such values in order of priority.

Alternatively, can be a numeric vector which gives the column index order directly.

main

String that sets the title for the heatmap.

cell.names.meta

quoted "name" of a meta.data slot to use for naming the columns instead of using the raw cell/sample names.

assay, slot

single strings or integers (SCEs and SEs) or an optionally named vector of such values that set which expression data to use. See GeneTargeting for specifics and examples – Seurat and SingleCellExperiment objects deal with these differently, and functionality additions in dittoSeq have led to some minimal divergence from the native methodologies.

swap.rownames

optionally named string or string vector. For SummarizedExperiment or SingleCellExperiment objects, its value(s) specifies the column name of rowData(object) to be used to identify features instead of rownames(object). When targeting multiple modalities (alternative experiments), names can be used to specify which level / alternative experiment (use 'main' for the top-level) individual values should be used for. See GeneTargeting for more specifics and examples.

heatmap.colors

the colors to use within the heatmap when (default setting) scaled.to.max is set to FALSE. Default is a ramp from blue to white to red with 50 slices.

scaled.to.max

Logical, FALSE by default, which sets whether expression should be scaled to the range [0, 1]. This is recommended for single-cell datasets as they are generally enriched in 0s.

heatmap.colors.max.scaled

the colors to use within the heatmap when scaled.to.max is set to TRUE. Default is a ramp from white to red with 25 slices.

annot.colors

String (color) vector where each color will be assigned to an individual annotation in the generated annotation bars.

data.out

Logical. When set to TRUE, changes the output from the heatmap itself to a list containing all arguments that would have been passed to pheatmap for heatmap generation. (Can be useful for troubleshooting or customization.)

highlight.features

String vector of genes/metadata whose names you would like to show. Only these genes/metadata will be named in the resulting heatmap.

show_colnames, show_rownames, scale, annotation_col, annotation_colors

arguments passed to pheatmap that are over-ruled by certain dittoHeatmap functionality:

  • show_colnames (& labels_col): if cell.names.meta is provided, pheatmap's labels_col is utilized to show these names and the show_colnames parameter is set to TRUE.

  • show_rownames (& labels_row): if feature names are provided to highlight.features, pheatmap's labels_row is utilized to show just these features' names and the show_rownames parameter is set to TRUE.

  • scale: when parameter scaled.to.max is set to TRUE, pheatmap's scale is set to "none" and the max scaling is performed prior to the pheatmap call.

  • annotation_col: Can be provided as normal by the user and any metadata given to annot.by will then be appended.

  • annotation_colors: dittoHeatmap fills this complicated-to-produce input in automatically by pulling from the colors given to annot.colors, but it is possible to set all or some manually. dittoSeq will just fill any left out annotations. Format is a named (annotation_col & annotation_row colnames) character vector list where individual color values can also be named.

cluster_cols, border_color, legend_breaks, breaks, drop_levels, ...

other arguments passed directly to pheatmap::pheatmap (or to ComplexHeatmap::pheatmap if complex = TRUE).

complex

Logical which sets whether the heatmap should be generated with ComplexHeatmap (TRUE) versus pheatmap (FALSE, default).

Details

This function serves as a wrapper for creating heatmaps from bulk or single-cell RNAseq data with pheatmap::pheatmap, by essentially automating the data extraction and annotation building steps. (Or alternatively with ComplexHeatmap::pheatmap if complex is set to TRUE.)

The function will extract the expression matrix for a set of genes and/or an optional subset of cells / samples to use via cells.use. This matrix is either left as is by default (for scaling within the ultimate call to pheatmap) or, if scaled.to.max = TRUE, is scaled by dividing each row by its maximum value.
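As a rough illustration of the two scaling routes described above, the sketch below uses only base R on a made-up toy matrix (the gene and cell names are hypothetical). It is not dittoSeq's internal code, just the same arithmetic: dividing each row by its maximum versus row z-scoring, which is what scale = "row" amounts to.

```r
# Toy expression matrix: 3 genes (rows) x 4 cells (columns), with many zeros
mat <- matrix(
    c(0, 0, 2, 4,
      0, 5, 0, 10,
      1, 0, 0, 0),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Max scaling: divide each row by its own maximum so values land in [0, 1].
# (A row of all zeros would give NaN here; this toy matrix has none.)
max_scaled <- mat / apply(mat, 1, max)

# Row z-scoring: center and scale each gene across cells
z_scored <- t(scale(t(mat)))

range(max_scaled)             # zeros stay at 0, each row's maximum becomes 1
all(z_scored[mat == 0] < 0)   # zeros end up below 0 after z-scoring
```

With max scaling, zeros keep a natural meaning (no expression), which is the motivation given for scaled.to.max with zero-enriched single-cell data.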

When provided with a set of metadata slot names to use for building annotations (with the annot.by input), the relevant metadata is retrieved from the object and compiled into a pheatmap-ready annotation_col input. The input annot.colors is used to establish the set of colors that should be used for building a pheatmap-ready annotation_colors input as well, unless such an input has been provided by the user. See below for further details.

Value

A pheatmap object.

Alternatively, if complex is set to TRUE, a Heatmap object.

Alternatively, if data.out is set to TRUE, a list containing all arguments that would have been passed to pheatmap to generate such a heatmap.
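A minimal sketch of the data.out route, assuming dittoSeq and pheatmap are installed and using the same example data as the Examples section. The object name hm_args is made up for illustration, and the exact element names in the returned list are best checked with names() rather than assumed.

```r
if (requireNamespace("dittoSeq", quietly = TRUE)) {
    library(dittoSeq)
    example(importDittoBulk, echo = FALSE)  # creates 'myRNA', as in Examples
    genes <- getGenes(myRNA)[1:30]

    # Ask for the assembled argument list instead of the plot
    hm_args <- dittoHeatmap(myRNA, genes,
        annot.by = "clustering",
        data.out = TRUE)
    print(names(hm_args))  # inspect what would have been handed to pheatmap

    # Adjust any element, then make the pheatmap call yourself
    hm_args$main <- "Edited before plotting"
    do.call(pheatmap::pheatmap, hm_args)
}
```

This can be handy when a customization needs to touch the matrix or annotations after dittoHeatmap has built them.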

Many additional characteristics of the plot can be adjusted using discrete inputs

  • The cells can be ordered in a set way using the order.by input.

    Such ordering happens by default for single-cell RNAseq data when any metadata are provided to annot.by as it is often unfeasible to cluster thousands of cells.

  • A plot title can be added with main.

  • Gene or cell/sample names can be hidden with show_rownames and show_colnames, respectively, or...

    • Particular features can also be selected for labeling using the highlight.features input.

    • Names of all cells/samples can be replaced with the contents of a metadata slot using the cell.names.meta input.

  • Additional tweaks are possible through use of pheatmap inputs which will be directly passed through. Some examples of useful pheatmap parameters are:

    • cluster_cols and cluster_rows for controlling clustering. Note: cluster_cols will always be over-written to be FALSE when the input order.by is used above.

    • treeheight_row and treeheight_col for setting how large the trees on the side/top should be drawn.

    • cutree_cols and cutree_rows for splitting the heatmap into a set number of groups based on the hierarchical clustering.

  • When complex is set to TRUE, additional inputs for the Heatmap function can be given as well. Some examples:

    • use_raster to have the heatmap rasterized/flattened to pixels which can make working with large heatmaps in a figure editor, like Illustrator, simpler.

    • name to give the heatmap color scale a custom title.
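For the order.by input mentioned in the list above, ordering by several values "in order of priority" can be pictured with base R's order() and multiple keys. The sketch below is only a conceptual stand-in on a hypothetical metadata table, not dittoSeq's own code; the resulting index vector has the same shape as the numeric form that order.by accepts.

```r
# Hypothetical per-cell metadata, for illustration only
meta <- data.frame(
    clustering = c("2", "1", "2", "1"),
    timepoint  = c("d7", "d7", "d0", "d0"),
    row.names  = paste0("cell", 1:4))

# First key has priority; the second key only breaks ties within the first
ord <- order(meta$clustering, meta$timepoint)

ord                   # column indices in plotting order
rownames(meta)[ord]   # the cells in that order
```

A vector like ord could then be supplied as order.by directly when the built-in ordering by metadata or gene names is not flexible enough.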

Customized annotations

In typical operation, dittoHeatmap pulls metadata annotations given to annot.by to build a pheatmap-annotation_col input, then it uses the colors provided to annot.colors to create the pheatmap-annotation_colors input which sets the annotation coloring. Specifically...

  • colors for the values of discrete metadata are pulled from the start of the annot.colors vector, in the order that they are given to annot.by

  • colors for the values of continuous metadata are pulled from the end of the annot.colors vector, in the order that they are given to annot.by

To customize colors or add additional column or row annotations, users can also provide annotation_colors, annotation_col, or annotation_row pheatmap-inputs directly. General structure is described below, but see pheatmap for additional details and examples.

  • annotation_col = a data.frame with rownames of the barcodes/names of all cells/samples in the dataset & columns representing annotations. Names of columns are used as the annotation titles. Note: dittoSeq will append any annot.by annotations to this data.frame.

  • annotation_row = a data.frame with rownames of the genes/features of the dataset & columns representing annotations. Names of columns are used as the annotation titles.

  • annotation_colors = a named list of string (color) vectors. Vectors must be named by the row or column annotation title that they are associated with. Optionally, individual colors can be named with the values that they should be associated with.

    Partial annotation_colors lists (containing vectors for only certain annotations) will have colors for left out annotations filled in automatically. For such filling, annot.colors are pulled for column annotations first, then for row annotations.
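The structure of a partial annotation_colors list can be sketched as follows. The helper name_colors is hypothetical (it is not part of dittoSeq), and the second half assumes dittoSeq is installed and that the example data carries "clustering" and "groups" metadata, as used in the Examples section.

```r
# Hypothetical helper: pair each level with a color, naming colors by level
name_colors <- function(levels, colors) {
    setNames(colors[seq_along(levels)], levels)
}

# Shape of the input: a named list with one named color vector per annotation
str(list(batch = name_colors(c("A", "B"), c("grey30", "orange"))))

if (requireNamespace("dittoSeq", quietly = TRUE)) {
    library(dittoSeq)
    example(importDittoBulk, echo = FALSE)  # creates 'myRNA', as in Examples

    # Set colors for "clustering" only; "groups" is left out on purpose so
    # that its colors get filled in from annot.colors
    clust_cols <- name_colors(metaLevels("clustering", myRNA), dittoColors())
    dittoHeatmap(myRNA, getGenes(myRNA)[1:30],
        annot.by = c("clustering", "groups"),
        annotation_colors = list(clustering = clust_cols))
}
```

Naming the colors by level keeps the mapping stable even if some levels are absent from the cells being plotted.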

Author(s)

Daniel Bunis and Jared Andrews

See Also

pheatmap::pheatmap, for how to add additional heatmap tweaks, or ComplexHeatmap::pheatmap and Heatmap for when you want to turn on rasterization or any additional customizations offered by this fantastic package.

metaLevels for helping to create manual annotation_colors inputs. This function universally checks the options/levels of a string, factor (filled only by default), or numerical metadata.

Examples

example(importDittoBulk, echo = FALSE)
scRNA <- setBulk(myRNA, FALSE)

# We now have two SCEs for our example purposes:
  # 'myRNA' will be treated as a bulk RNAseq dataset
  # 'scRNA' will be treated as a single-cell RNAseq dataset

# Pick a set of genes
genes <- getGenes(myRNA)[1:30]

# Make a heatmap with cells/samples annotated by their clusters
dittoHeatmap(myRNA, genes,
    annot.by = "clustering")

# For single-cell data, you will typically have more cells than can be
# clustered quickly. Thus, cell clustering is turned off by default for
# single-cell data.
dittoHeatmap(scRNA, genes,
    annot.by = "clustering")

# Using the 'order.by' input:
#   Ordering by a useful metadata or gene is often helpful.
#   For single-cell data, order.by defaults to the first element given to
#     annot.by.
#   For bulk data, order.by must be set separately.
dittoHeatmap(myRNA, genes,
    annot.by = "clustering",
    order.by = "clustering",
    cluster_cols = FALSE)
# 'order.by' can be multiple metadata/genes, or a vector of indexes directly 
dittoHeatmap(scRNA, genes,
    annot.by = "clustering",
    order.by = c("clustering", "timepoint"))
dittoHeatmap(scRNA, genes,
    annot.by = "clustering",
    order.by = ncol(scRNA):1)

# When there are many cells, showing names becomes less useful.
#   Names can be turned off with the 'show_colnames' parameter.
dittoHeatmap(scRNA, genes,
    annot.by = "groups",
    show_colnames = FALSE)

# When there are very many cells & genes, rasterization can be super useful
# as well.
#   Rasterization, or flattening of the distinct color objects to a matrix of
#   pixels, is the default for large heatmaps in the ComplexHeatmap package,
#   and you can have the heatmap rendered with this package (rather than the
#   pheatmap package) by setting 'complex = TRUE'.
#   Our data here is too small to hit that defaulting switch, so let's give
#   the direct input, 'use_raster' as well:
if (requireNamespace("ComplexHeatmap")) { # Checks if you have the package.
    dittoHeatmap(scRNA, genes, annot.by = "groups", show_colnames = FALSE,
        complex = TRUE,
        use_raster = TRUE)
}

# Additionally, it is recommended for single-cell data that the parameter
#   scaled.to.max be set to TRUE, or scale be "none" and turned off altogether,
#   because these data are generally enriched for zeros that otherwise get
#   scaled to a negative value.
dittoHeatmap(myRNA, genes, annot.by = "groups",
    order.by = "groups", show_colnames = FALSE,
    scaled.to.max = TRUE)



dtm2451/dittoSeq documentation built on May 5, 2024, 11:19 a.m.