R package that accompanies our paper 'Comparison of transformations for single-cell RNA-seq data' (https://www.nature.com/articles/s41592-023-01814-1).
transformGamPoi provides methods to stabilize the variance of single-cell count data: the acosh transformation, the shifted logarithm, and Pearson and randomized quantile residuals.
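As background, here is why the acosh transformation stabilizes the variance. This is the standard delta-method argument rather than a description of the package's code. It assumes counts $Y$ follow a gamma-Poisson distribution with mean $\mu$ and overdispersion $\alpha$, so that $\operatorname{Var}(Y) = \mu + \alpha\mu^2$. Size factors are ignored.

```latex
\operatorname{Var}\big(g(Y)\big) \approx g'(\mu)^2 \operatorname{Var}(Y)
  = g'(\mu)^2 \left(\mu + \alpha\mu^2\right) \overset{!}{=} 1
\quad\Longrightarrow\quad
g(y) = \int_0^{y} \frac{dm}{\sqrt{m + \alpha m^2}}
     = \frac{2}{\sqrt{\alpha}} \operatorname{asinh}\!\big(\sqrt{\alpha y}\big)
     = \frac{1}{\sqrt{\alpha}} \operatorname{acosh}(2\alpha y + 1).
```

The last equality follows from $\cosh(2t) = 1 + 2\sinh^2(t)$ with $\sinh(t) = \sqrt{\alpha y}$. Because $g(0) = 0$, a count of zero stays at zero after the transformation.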
You can install the current development version of transformGamPoi by typing the following into the R console:
# install.packages("devtools")
devtools::install_github("const-ae/transformGamPoi")
The installation should only take a few seconds and work across all major operating systems (MacOS, Linux, Windows).
Let's compare the different variance-stabilizing transformations.
We start by loading the transformGamPoi package and setting a seed to make sure the results are reproducible.
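library(transformGamPoi)
library(SingleCellExperiment)  # counts(), assay()
library(MatrixGenerics)        # rowSums2(), rowMeans2(), rowVars()
set.seed(1)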
We then load some example data, which we subset to 1000 random genes with at least one count and 500 random cells:
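# Load the pbmc4k dataset as a SingleCellExperiment object
sce <- TENxPBMCData::TENxPBMCData("pbmc4k")
# Keep 1000 random genes with at least one count and 500 random cells
sce_red <- sce[sample(which(rowSums2(counts(sce)) > 0), 1000), sample(ncol(sce), 500)]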
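Downloading pbmc4k requires the TENxPBMCData package and an internet connection. The toy version below shows what the subsetting line does without that dependency. It runs the same logic on a simulated count matrix. The matrix, its dimensions, and the gene means are hypothetical. Base `rowSums()` stands in for `rowSums2()` because the toy data is an ordinary matrix.

```r
# Hypothetical stand-in for counts(sce): a simulated gamma-Poisson count matrix
set.seed(1)
n_genes <- 2000
n_cells <- 800
gene_means <- rexp(n_genes, rate = 2)  # hypothetical per-gene means
counts_mat <- matrix(
  # matrix() fills column-wise, so repeating gene_means once per cell
  # gives gene i the mean gene_means[i] in every column
  rnbinom(n_genes * n_cells, mu = rep(gene_means, times = n_cells), size = 10),
  nrow = n_genes, ncol = n_cells
)

# Same logic as above: keep genes with at least one count in the full data,
# then draw 1000 of them together with 500 random cells
expressed <- which(rowSums(counts_mat) > 0)
counts_red <- counts_mat[sample(expressed, 1000), sample(ncol(counts_mat), 500)]
dim(counts_red)
```

The gene filter is applied to the full matrix before subsetting. Some of the 1000 selected genes can therefore still be all zero within the 500 sampled cells.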
We calculate the different variance-stabilizing transformations. We can either use the generic transformGamPoi() method and specify the transformation, or use the low-level functions acosh_transform(), shifted_log_transform(), and residual_transform(), which provide more settings. All functions return a matrix, which we can, for example, insert back into the SingleCellExperiment object:
assay(sce_red, "acosh") <- transformGamPoi(sce_red, transformation = "acosh")
assay(sce_red, "shifted_log") <- shifted_log_transform(sce_red, overdispersion = 0.1)
# For large datasets, we can also do the processing without
# loading the full dataset into memory (on_disk = TRUE)
assay(sce_red, "rand_quant") <- residual_transform(sce_red, "randomized_quantile", on_disk = FALSE)
assay(sce_red, "pearson") <- residual_transform(sce_red, "pearson", clipping = TRUE, on_disk = FALSE)
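Pearson residuals compare each count with its expected value under a gamma-Poisson model, scaled by the model's standard deviation. Below is a minimal base-R sketch of that idea under simplifying assumptions:

- The mean is estimated in closed form as μ_gc = (gene total × cell total) / grand total, not with a fitted gamma-Poisson GLM.
- α is fixed at a hypothetical 0.1.
- Residuals are clipped at ±√(number of cells), the convention popularized by sctransform. Whether `residual_transform(clipping = TRUE)` uses exactly this threshold is an assumption.

`pearson_residuals_sketch()` is a hypothetical helper, not part of transformGamPoi.

```r
# Hypothetical helper: simplified Pearson residuals for a genes x cells count matrix.
# Assumes every gene and every cell has at least one count (otherwise mu = 0 -> NaN).
pearson_residuals_sketch <- function(counts, alpha = 0.1, clip = TRUE) {
  # Closed-form expected counts: mu_gc = rowSum_g * colSum_c / total
  mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Standardize by the gamma-Poisson standard deviation sqrt(mu + alpha * mu^2)
  res <- (counts - mu) / sqrt(mu + alpha * mu^2)
  if (clip) {
    # Clip extreme values to [-sqrt(n_cells), sqrt(n_cells)]
    limit <- sqrt(ncol(counts))
    res <- pmin(pmax(res, -limit), limit)
  }
  res
}

# Usage on a small simulated matrix (hypothetical data)
set.seed(2)
toy <- matrix(rnbinom(200 * 100, mu = 2, size = 10), nrow = 200, ncol = 100)
toy <- toy[rowSums(toy) > 0, , drop = FALSE]
res <- pearson_residuals_sketch(toy)
range(res)
```

`pmax()` and `pmin()` copy the `dim` attribute from their first argument, so the clipped result is still a genes × cells matrix.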
Finally, we compare the per-gene variance after each transformation using a scatter plot:
par(pch = 20, cex = 1.15)
mus <- rowMeans2(counts(sce_red))
plot(mus, rowVars(assay(sce_red, "acosh")), log = "x", col = "#1b9e77aa", cex = 0.6,
     xlab = "Log Gene Means", ylab = "Variance after transformation")
points(mus, rowVars(assay(sce_red, "shifted_log")), col = "#d95f02aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "pearson")), col = "#7570b3aa", cex = 0.6)
points(mus, rowVars(assay(sce_red, "rand_quant")), col = "#e7298aaa", cex = 0.6)
legend("topleft", legend = c("acosh", "shifted log", "Pearson Resid.", "Rand. Quantile Resid."),
       col = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a"), pch = 16)
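The plot shows the per-gene variance after each transformation as a function of the gene mean. A small simulation reproduces the effect for the two delta-method transformations. It is a minimal sketch under these assumptions:

- Counts are gamma-Poisson with a fixed overdispersion α = 0.1, and there are no size factors.
- `acosh_vst()` and `shifted_log()` are hypothetical helpers, not calls into transformGamPoi.
- `acosh_vst()` implements the textbook formula acosh(2αy + 1) / √α.
- `shifted_log()` implements log(y + y0) with an assumed pseudo-count y0 = 1/(4α).

```r
# Hypothetical helpers (not the package's implementation), size factors ignored
acosh_vst <- function(y, alpha) {
  # Variance-stabilizing transformation for Var(Y) = mu + alpha * mu^2
  1 / sqrt(alpha) * acosh(2 * alpha * y + 1)
}
shifted_log <- function(y, alpha) {
  # Logarithm with an assumed pseudo-count y0 = 1 / (4 * alpha)
  log(y + 1 / (4 * alpha))
}

set.seed(1)
alpha <- 0.1
means <- c(50, 500)
# 1e5 gamma-Poisson (negative binomial) draws per mean; size = 1 / alpha
sims <- lapply(means, function(m) rnbinom(1e5, mu = m, size = 1 / alpha))

raw_var   <- sapply(sims, var)
acosh_var <- sapply(sims, function(y) var(acosh_vst(y, alpha)))
log_var   <- sapply(sims, function(y) var(shifted_log(y, alpha)))
print(rbind(raw_var, acosh_var, log_var))
```

Without a transformation, the variance follows μ + αμ². That is 300 at μ = 50 and 25,500 at μ = 500. After the acosh transformation it stays close to 1, and after the shifted logarithm it stays close to α. This matches the lower, flat band of the shifted log points relative to the acosh points in the plot.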
There are a number of preprocessing methods and packages out there. Of particular interest are
shifted_log_transform()
and plays nicely together with the Bioconductor universe. For more information, I highly recommend taking a look at the normalization section of the OSCA book.