Objective

Update and replace the draw_boxplot() function from rnaseqTools to use ggplot2 and use an HermesData object as input.

Idea

In order to plot the number of non zero expression genes, we need to access the counts data in the HermesData object.

#Loading the example HermesData object.
result <- hermes_data

#Accessing the counts data.
count_results <- counts(result)

Then, for each column of the count matrix, we need to count the number of no zero genes and then create a dataframe. The dataframe will contain two columns and n rows (corrisponding to the number of columns in the counts matrix) containing one the counts and the other the name of the corrisponding counts column matrix. The derived data has to be a dataframe because ggplot requires a data frame object.

To count the non zero cells it is better to use the function sum instead of count .

#Calculating the number  of non zero gene per counts.
no_na_count <- apply(count_results, 
              MARGIN =  2,
              FUN = function(x) sum(x != 0))

# this could also be simplified as 
no_na_count <- colSums(counts(result) != 0)

#Coercing into a data frame because ggplot requires a data frame object.
df <- data.frame(no_na = no_na_count, x = "Sample")

# and then plotting it 
ggplot(df) +
     geom_boxplot(aes(y = no_na, x= x)) +
     stat_boxplot(geom = "errorbar", aes(y = no_na, x= x)) +
     ggtitle("Distribution of non-zero expression genes") +
     xlab("Library") +
     ylab("Number of non-zero genes")



#Putting the final function together.
library(EnvStats)

draw_nonzero_boxplot <- function(object, 
                                 jitter = 0.2,
                                 alpha = 1/4, ...){
  no_na_count <- colSums(counts(object) != 0)

  df <- data.frame(no_na = no_na_count, x = "Sample")

  ggplot(df, aes(y = .data$no_na, x = .data$x)) +
    geom_boxplot(outlier.shape = NA) +
    stat_boxplot(geom = "errorbar") +
    geom_point(position = position_jitter(width = jitter),
               alpha = alpha) +
    stat_n_text(text.box = TRUE) +
    ggtitle("Distribution of non-zero expression genes") +
    xlab("Library") +
    ylab("Number of non-zero genes")
}

#Running the function.
draw_nonzero_boxplot(result)


insightsengineering/hermes documentation built on Dec. 15, 2024, 8:07 a.m.