# Chunk opts
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  warning    = FALSE,
  message    = FALSE,
  fig.width  = 7,
  fig.height = 4
)
This vignette provides detailed examples for plotting V(D)J data imported into a single-cell object using djvdj. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.
library(djvdj)
library(Seurat)
library(ggplot2)

# Load GEX data
data_dir <- system.file("extdata/splen", package = "djvdj")

gex_dirs <- c(
  BL6 = file.path(data_dir, "BL6_GEX/filtered_feature_bc_matrix"),
  MD4 = file.path(data_dir, "MD4_GEX/filtered_feature_bc_matrix")
)

so <- gex_dirs |>
  Read10X() |>
  CreateSeuratObject() |>
  AddMetaData(splen_meta)

# Add V(D)J data to object
vdj_dirs <- c(
  BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
  MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)

so <- so |>
  import_vdj(
    vdj_dirs,
    define_clonotypes = "cdr3_gene",
    include_mutations = TRUE
  )
The plot_histogram() function can be used to plot numerical V(D)J data present in the object. By default, this will plot the values present in the data_col column for every cell. If data_col contains per-chain data (i.e. the column contains the character specified by sep, ';' by default), the values are summarized for each cell, using the mean by default. Cells that lack V(D)J data are removed before plotting, so the number of cells plotted in this example is given by nrow(na.omit(so@meta.data)). The trans argument can be used to specify an x-axis transformation.
In the example below, the mean number of UMIs per cell is plotted.
so |> plot_histogram( data_col = "umis", trans = "log10" )
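The per-cell summary described above can be pictured with a small base R sketch. This is only an illustration on made-up data, not djvdj's internal implementation: the data frame, its values, and the helper summarize_chains() are hypothetical.

```r
# Toy metadata (made-up values): one row per cell, per-chain UMI counts
# joined by ";". Cell c3 has no V(D)J data.
meta <- data.frame(
  cell = c("c1", "c2", "c3"),
  umis = c("10;30", "8", NA),
  stringsAsFactors = FALSE
)

# Split each string on the separator and convert the pieces to numeric
per_chain <- strsplit(meta$umis, ";", fixed = TRUE)
per_chain <- lapply(per_chain, as.numeric)

# Hypothetical helper: summarize the per-chain values for one cell
summarize_chains <- function(x, fn = mean) {
  if (all(is.na(x))) return(NA_real_)
  fn(x)
}

meta$mean_umis <- vapply(per_chain, summarize_chains, numeric(1))
meta$max_umis  <- vapply(per_chain, summarize_chains, numeric(1), fn = max)

# Cells without V(D)J data are dropped before plotting
plot_data <- meta[!is.na(meta$mean_umis), ]
```

Swapping mean for another function, as in the max_umis column, is the same idea as changing the summary function used for plotting.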
To color cells based on an additional variable present in the object, pass the column name to the cluster_col argument.
so |> plot_histogram( data_col = "umis", cluster_col = "orig.ident", trans = "log10" )
The function used to summarize per-chain values can be changed using the summary_fn argument. The per-chain values can also be plotted separately using the per_chain argument.
In the example below, the value for each chain is plotted.
so |> plot_histogram( data_col = "umis", cluster_col = "orig.ident", per_chain = TRUE, trans = "log10" )
To only plot values for a specific chain, the chain argument can be used. In this example the mean number of UMIs is shown for each IGK chain.
so |> plot_histogram( data_col = "umis", cluster_col = "orig.ident", chain = "IGK", trans = "log10" )
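To make the chain filter concrete, here is a minimal base R sketch on made-up data. It assumes chain names and values are stored in matching order within each ';'-separated string, and it averages values when a cell has several chains of the requested type. Both points are assumptions for illustration, and summarize_chain() is a hypothetical helper, not a djvdj function.

```r
# Toy per-cell strings (made-up values): chain names and UMI counts are
# stored in matching order, separated by ";"
chains <- c(c1 = "IGH;IGK", c2 = "IGH;IGL", c3 = "IGH;IGK;IGK")
umis   <- c(c1 = "12;40",   c2 = "9;15",    c3 = "20;6;10")

# Hypothetical helper: keep only the values that belong to the requested
# chain, then summarize them
summarize_chain <- function(chain_str, value_str, chain, fn = mean) {
  nms  <- strsplit(chain_str, ";", fixed = TRUE)[[1]]
  vals <- as.numeric(strsplit(value_str, ";", fixed = TRUE)[[1]])
  vals <- vals[nms == chain]

  # Cells without the requested chain get NA
  if (length(vals) == 0) return(NA_real_)
  fn(vals)
}

igk_umis <- mapply(
  summarize_chain, chains, umis,
  MoreArgs = list(chain = "IGK")
)
```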
The plot_violin() function works similarly to plot_histogram() but generates violin plots or boxplots. For all djvdj plotting functions, additional arguments can be passed directly to ggplot2 to further modify plot aesthetics. By default, clusters are arranged highest to lowest based on the values in data_col. This ordering can be modified using the plot_lvls argument.
vln <- so |>
  plot_violin(
    data_col       = "nCount_RNA",
    cluster_col    = "sample",
    trans          = "log10",
    draw_quantiles = c(0.25, 0.75)  # parameter for ggplot2::geom_violin()
  )

bx <- so |>
  plot_violin(
    data_col    = "nFeature_RNA",
    cluster_col = "sample",
    method      = "boxplot"
  )

vln + bx
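The default highest-to-lowest ordering, and how a custom ordering replaces it, can be sketched with factor levels in base R. The data are made up, and ranking clusters by their mean is an assumption for illustration; the statistic djvdj uses for ranking is not stated here.

```r
# Toy data (made-up values): three clusters with four cells each
dat <- data.frame(
  cluster = rep(c("a", "b", "c"), each = 4),
  value   = c(1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8)
)

# Rank clusters by their mean value, highest first (assumed statistic)
cluster_means <- tapply(dat$value, dat$cluster, mean)
default_lvls  <- names(sort(cluster_means, decreasing = TRUE))

dat$cluster <- factor(dat$cluster, levels = default_lvls)

# A custom ordering, analogous to passing a character vector of levels
custom_lvls <- c("c", "a", "b")
dat$cluster_custom <- factor(dat$cluster, levels = custom_lvls)
```

Because ggplot2 arranges discrete axes by factor level, setting the levels is enough to control the order in which groups appear.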
To plot V(D)J information on UMAP projections, the plot_scatter() function can be used. By default this function will plot the 'UMAP_1' and 'UMAP_2' columns on the x- and y-axis. The function used to summarize per-chain values for each cell is specified with the summary_fn argument; by default this is mean() for numeric data. Cells that lack V(D)J data are stored as NAs in the object and are shown on the plot in light grey. The color used for plotting NAs can be modified with the na_color argument.
In the example below, the mean number of mismatch mutations for each cell is shown.
so |> plot_scatter( data_col = "all_mis", na_color = "lightgrey" )
Instead of summarizing the per-chain values for each cell, we can also specify a chain to use for plotting. In this example we are plotting the number of reads for IGK chains. If a cell does not have an IGK chain or has multiple IGK chains, it will be plotted as an NA. Like other plotting functions, values can be transformed using the trans argument.
so |> plot_scatter( data_col = "reads", chain = "IGK", trans = "log10" )
Cell clusters can be outlined by setting the outline argument. This can help visualize points that are lightly colored.
u1 <- so |>
  plot_scatter(
    data_col    = "reads",
    plot_colors = c("white", "lightblue", "red"),
    trans       = "log10",
    size        = 2,
    outline     = TRUE
  )

u2 <- so |>
  plot_scatter(
    data_col    = "orig.ident",
    plot_colors = c("orange", "lightyellow"),
    size        = 2,
    outline     = TRUE
  )

u1 + u2
To only plot the top clusters, the number of clusters to include can be specified with the top argument. For plot_violin(), clusters are ranked based on values in the data_col column. For plot_scatter(), clusters are ranked based on number of cells. Clusters can also be specified by passing a vector of cluster names. The remaining cells are labeled using the other_label argument.
# Show the top 3 clusters with highest values for nFeature_RNA
bx <- so |>
  plot_violin(
    data_col    = "nFeature_RNA",
    cluster_col = "seurat_clusters",
    method      = "boxplot",
    top         = 3
  )

# Select top clusters by passing a vector of names
u <- so |>
  plot_scatter(
    data_col = "seurat_clusters",
    top      = c("4", "1", "3")
  )

bx + u
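The idea of keeping the top clusters and lumping the rest under a single label can be sketched in base R. The cluster assignments are made up, label_top() is a hypothetical helper, and the "other" label is an assumed default used only for illustration.

```r
# Toy cluster assignments (made-up values)
clusters <- c("1", "1", "1", "1", "2", "3", "3", "3", "4", "4", "5")

# Rank clusters by number of cells and keep the three largest
n_cells <- sort(table(clusters), decreasing = TRUE)
top_n   <- names(n_cells)[seq_len(3)]

# Hypothetical helper: keep the selected clusters and relabel the rest
label_top <- function(x, top, other_label = "other") {
  out <- ifelse(x %in% top, x, other_label)
  factor(out, levels = c(top, other_label))
}

by_rank <- label_top(clusters, top_n)              # selected by cell count
by_name <- label_top(clusters, c("4", "1", "3"))   # selected by name
```

Passing names directly also fixes the order of the resulting levels, which is why the by-name version lists cluster 4 first.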
Plots can be split into separate panels based on an additional grouping variable using the group_col argument. The arrangement of plot panels and the axis scales used for each panel can be adjusted using the panel_nrow and panel_scales arguments.
so |> plot_histogram( data_col = "reads", cluster_col = "orig.ident", group_col = "seurat_clusters", trans = "log10", panel_nrow = 2, panel_scales = "free_x" )
Plot colors can be specified using the plot_colors argument. This should be a vector of colors. To specify colors by cluster name, a named vector can be passed. By default clusters are ordered with the most abundant clusters on top. To modify this ordering, a character vector can be passed to the plot_lvls argument.
so |> plot_scatter( data_col = "orig.ident", plot_colors = c(MD4 = "red", BL6 = "lightblue") )
To only modify the color of a few clusters, a vector containing only the clusters of interest can be passed. The same is true for the plot_lvls argument: the name of a single cluster can be passed to plot it on top.
In the example below we keep all the default colors except for clusters 3 and 4 and we specifically plot these clusters on top.
so |> plot_scatter( data_col = "seurat_clusters", plot_colors = c("4" = "darkred", "3" = "red"), plot_lvls = c("3", "4") )
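Overriding only some colors and moving selected clusters to the top can be pictured with named vectors in base R. This is a sketch on made-up cluster names; the default palette and the assumption that later-drawn clusters end up on top are illustrative and may not match how djvdj handles these arguments internally.

```r
# Toy cluster names and a stand-in default palette (assumed, not djvdj's)
clusters     <- as.character(0:5)
default_cols <- setNames(
  grDevices::hcl.colors(length(clusters), "Dark 3"),
  clusters
)

# Named overrides replace only the matching entries
custom_cols <- c("4" = "darkred", "3" = "red")
plot_cols   <- default_cols
plot_cols[names(custom_cols)] <- custom_cols

# Move the selected clusters to the end so they are drawn last
on_top     <- c("3", "4")
draw_order <- c(setdiff(clusters, on_top), on_top)
```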
By default the number of data points plotted will be shown in the top right corner, plot legend, and/or x-axis. The location of the label can be specified using the n_label argument. Label appearance can be modified by passing a named list of aesthetic parameters to the label_params argument.
so |> plot_scatter( data_col = "seurat_clusters", n_label = "corner", label_params = list(color = "darkred", fontface = "bold") )
sessionInfo()