Here we just have faceting and stratification by other colData
variables.
object <- HermesData(summarized_experiment) assay_name = "counts" x_var = "SEX" y_var = genes(object)[19] facet_var = "RACE" color_var = "AGE18" assay_matrix <- assay(object, assay_name) col_data <- colData(object) df <- data.frame( x = col_data[, x_var], y = assay_matrix[y_var, ] ) df$color <- col_data[[color_var]] df$facet <- col_data[[facet_var]] ggplot(df, aes(x = .data$x, y = .data$y)) + geom_boxplot() + stat_boxplot(geom = "errorbar") + labs(x = x_var, y = y_var) + geom_point(aes(color = .data$color)) + labs(color = color_var) + facet_wrap(~facet)
We could do a for loop for y_var
and keep x_var
and facet_var
to allow the user to be able to see distribution across different genes specified in the y_var
by x_var
and facet_var
.
y_vars = genes(object)[1:2] facet_vars = genes(object)[1:2] for (thisyvar in y_vars) { df <- data.frame( x = col_data[, x_var], y = assay_matrix[thisyvar, ] ) df$color <- col_data[[color_var]] df$facet <- col_data[[facet_var]] print(ggplot(df, aes(x = .data$x, y = .data$y)) + geom_boxplot() + stat_boxplot(geom = "errorbar") + labs(x = x_var, y = thisyvar) + geom_point(aes(color = .data$color)) + labs(color = color_var) + facet_wrap(~facet) ) } # Test that this approach works when only one gene is specified for (thisyvar in y_var) { df <- data.frame( x = col_data[, x_var], y = assay_matrix[thisyvar, ] ) df$color <- col_data[[color_var]] df$facet <- col_data[[facet_var]] print(ggplot(df, aes(x = .data$x, y = .data$y)) + geom_boxplot() + stat_boxplot(geom = "errorbar") + labs(x = x_var, y = thisyvar) + geom_point(aes(color = .data$color)) + labs(color = color_var) + facet_wrap(~facet) ) }
However, the problem here is that it will be difficult to compare within a facet the different gene distributions as it is too far away visually. Also for multiple genes there will be too many plots on the page.
fill
aesthetic for the genesThe idea here is that we can add an additional aesthetic on top of the existing ones.
Here the fill
aesthetic is a natural choice as it allows to plot the gene specific boxes next to each other.
In order to achieve this we need to use a long format representation of the different gene expression values.
object <- HermesData(summarized_experiment) assay_name <- "counts" x_var <- "SEX" y_var <- genes(object)[19] y_vars <- genes(object)[1:2] facet_var <- "RACE" color_var <- "AGE18" assay_matrix <- assay(object, assay_name) col_data <- colData(object) df2 <- data.frame( x = col_data[, x_var], color = col_data[, color_var], facet = col_data[, facet_var], y = as.numeric(t(assay_matrix[y_vars, , drop = FALSE])), fill = factor(rep(y_vars, each = ncol(assay_matrix))) )
Note that in the data.frame()
call above, the three colData
variables are repeated for each gene (as specified in y_vars
). Therefore we need to transpose the subset of the assay matrix such that each gene has a column and the rows represent the samples. When we then coerce to a numeric vector the columns are bound together, we have all expression values for the first gene, then for the second gene, etc. This then matches the colData
repetitions correctly.
Given the long format it is almost as simple as before to plot the boxplots, as we just add the additional aesthetic.
ggplot(df2, aes(x = .data$x, y = .data$y, fill = .data$fill)) + geom_boxplot() + stat_boxplot(geom = "errorbar") + geom_point( position = position_jitterdodge(jitter.width = 0), aes(color = .data$color, group = .data$fill) ) + labs(x = x_var, y = assay_name, color = color_var, fill = "Gene") + facet_wrap(~ facet)
We just need to make sure the points position uses the position_jitterdodge()
function, see https://ggplot2.tidyverse.org/reference/position_jitterdodge.html.
Depending on whether we want to jitter the points or not we can set jitter_width
to 0 or leave at default (which uses jittering).
For the y-axis label we can show the assay name, since we show the gene ID(s) now in the legend of the plot.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.