waddR is an R package that provides statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. It includes functions for calculating the 2-Wasserstein distance and testing for differential distributions, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.
The package provides tools to address the following tasks:
1. Computation of the 2-Wasserstein distance
2. Two-sample tests to check for differences between two distributions
3. Detection of differential gene expression distributions in single-cell RNA sequencing data
Available on Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("waddR")
The latest package version can be installed from GitHub using BiocManager:
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("goncalves-lab/waddR")
Tests can be run by calling test() from the devtools package.
All tests are implemented using the testthat package and reside in tests/testthat.
waddR
The 2-Wasserstein distance is a metric that describes the distance between two distributions, here representing two different conditions A and B. This package specifically considers the squared 2-Wasserstein distance d := W^2, which can be decomposed into location, size, and shape terms.
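As a rough illustration of this decomposition, the sketch below approximates d from equidistant empirical quantiles in base R and splits it into location, size, and shape terms. The helper name sq_wass_decomp_sketch and the default of 1000 quantile levels are assumptions made for this example; it is not the package's implementation, and its results need not match the package's functions exactly.

```r
# Hypothetical helper (not part of waddR): quantile-based approximation of the
# squared 2-Wasserstein distance with a location/size/shape split.
sq_wass_decomp_sketch <- function(a, b, n_quantiles = 1000) {
  # Evaluate both empirical quantile functions on the same grid of levels
  levels <- (seq_len(n_quantiles) - 0.5) / n_quantiles
  qa <- quantile(a, probs = levels, names = FALSE)
  qb <- quantile(b, probs = levels, names = FALSE)

  # Population-style standard deviation (divide by n) so the terms sum exactly
  pop_sd <- function(q) sqrt(mean((q - mean(q))^2))
  sd_a <- pop_sd(qa)
  sd_b <- pop_sd(qb)
  rho <- cor(qa, qb)  # correlation of the quantile-quantile pairs

  location <- (mean(qa) - mean(qb))^2      # difference in means
  size     <- (sd_a - sd_b)^2              # difference in spread
  shape    <- 2 * sd_a * sd_b * (1 - rho)  # remaining difference in shape

  list(distance = mean((qa - qb)^2),
       location = location, size = size, shape = shape)
}

set.seed(1)
res <- sq_wass_decomp_sketch(rnorm(500, mean = 0, sd = 1),
                             rnorm(500, mean = 1, sd = 2))
print(res)
```

On the chosen quantile grid, the three terms add up to the approximated distance, so each one can be read as the share of d attributable to that kind of difference.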
The package waddR offers three functions to calculate the 2-Wasserstein distance, all of which are implemented in C++ and exported to R with Rcpp for better performance.
The function wasserstein_metric is a C++ reimplementation of the function wasserstein1d from the package transport and offers the most exact results.
The functions squared_wass_approx and squared_wass_decomp compute approximations of the squared 2-Wasserstein distance, with squared_wass_decomp also returning the decomposition terms for location, size, and shape.
See ?wasserstein_metric, ?squared_wass_approx, and ?squared_wass_decomp.
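A short usage sketch of the three functions follows. The argument names (x, y, and p for the order of the distance) reflect our reading of the package documentation and should be checked against the help pages listed above; the simulated input data are an assumption made for this example.

```r
# Assumes waddR is installed; the example is skipped otherwise.
if (requireNamespace("waddR", quietly = TRUE)) {
  set.seed(24)
  x <- rnorm(100, mean = 0, sd = 1)
  y <- rnorm(100, mean = 2, sd = 1)

  # 2-Wasserstein distance (p = 2 selects the order of the distance)
  w2 <- waddR::wasserstein_metric(x, y, p = 2)

  # Approximations of the squared 2-Wasserstein distance
  d_approx <- waddR::squared_wass_approx(x, y)
  d_decomp <- waddR::squared_wass_decomp(x, y)  # also carries the decomposition terms

  print(w2^2)
  print(d_approx)
  print(d_decomp)
}
```

Squaring the result of wasserstein_metric puts it on the same scale as the two approximations, which makes the three values directly comparable.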
This package provides two testing procedures using the 2-Wasserstein distance to test whether two distributions F_A and F_B, given in the form of samples, are different, by specifically testing the null hypothesis H0: F_A = F_B against the alternative hypothesis H1: F_A != F_B.
The first, semi-parametric (SP) procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.
The second procedure (ASY) uses a test based on asymptotic theory, which is valid only if the samples can be assumed to come from continuous distributions.
See the documentation of the function ?wasserstein.test for more details.
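To make the permutation idea concrete, here is a minimal base R sketch of a plain permutation test that uses a quantile-based approximation of the squared 2-Wasserstein distance as its test statistic. The helper perm_test_sketch and its defaults are hypothetical, and the sketch deliberately leaves out the generalized Pareto distribution step of the SP procedure. With n_perm permutations, a plain permutation test cannot report a p-value below 1 / (n_perm + 1), which is the limitation that a tail approximation is meant to address.

```r
# Hypothetical helper (not the waddR implementation): plain permutation test
# with the squared 2-Wasserstein distance as the test statistic.
perm_test_sketch <- function(a, b, n_perm = 999, n_quantiles = 200) {
  levels <- (seq_len(n_quantiles) - 0.5) / n_quantiles
  stat <- function(u, v) {
    mean((quantile(u, levels, names = FALSE) -
          quantile(v, levels, names = FALSE))^2)
  }
  observed <- stat(a, b)
  pooled <- c(a, b)
  n_a <- length(a)
  # Recompute the statistic after randomly reassigning the condition labels
  permuted <- replicate(n_perm, {
    idx <- sample(length(pooled), n_a)
    stat(pooled[idx], pooled[-idx])
  })
  # Add-one correction keeps the estimated p-value strictly positive
  (1 + sum(permuted >= observed)) / (1 + n_perm)
}

set.seed(42)
p_same <- perm_test_sketch(rnorm(60), rnorm(60))              # same distribution
p_diff <- perm_test_sketch(rnorm(60), rnorm(60, mean = 1.5))  # shifted mean
print(c(p_same = p_same, p_diff = p_diff))
```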
The waddR package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance which is specifically tailored to identify differential distributions in single-cell RNA-sequencing (scRNA-seq) data. In particular, a two-stage (TS) approach has been implemented that accounts for the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.
See the documentation of the single-cell testing function ?wasserstein.sc and the test for zero expression levels ?testZeroes for more details.
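The first stage of the two-stage idea can be sketched in base R as a likelihood-ratio test on a logistic regression for "expression is zero". The helper zero_prop_test_sketch, the simulated single-gene data, and the use of glm with anova are assumptions made for this illustration; the package's own testZeroes may differ in model details and interface.

```r
# Hypothetical helper (not the package's testZeroes): likelihood-ratio test of
# whether the proportion of zero counts differs between two conditions.
zero_prop_test_sketch <- function(expr, condition) {
  is_zero <- as.integer(expr == 0)
  condition <- factor(condition)
  full <- glm(is_zero ~ condition, family = binomial())
  null <- glm(is_zero ~ 1, family = binomial())
  # Compare the nested models; the p-value is in the second row of the table
  lrt <- anova(null, full, test = "Chisq")
  lrt[["Pr(>Chi)"]][2]
}

set.seed(7)
cond <- rep(c("A", "B"), each = 200)
# Simulated expression for one gene: roughly 20% zeros in A and 60% in B
expr <- c(rbinom(200, 1, 0.8), rbinom(200, 1, 0.4)) * rpois(400, lambda = 5)
p_zero <- zero_prop_test_sketch(expr, cond)
print(p_zero)

# The second stage would then compare only the non-zero values per condition
nonzero_a <- expr[cond == "A" & expr > 0]
nonzero_b <- expr[cond == "B" & expr > 0]
```

Separating the two stages keeps a large excess of zeros in one condition from dominating the comparison of the non-zero expression distributions.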
We have included detailed examples of how to use all functions provided with waddR in our vignettes.
They are available online here (update this link once it is final) or from an R session with the following command:
browseVignettes("waddR")
Schefzik, R., Flesch, J., and Goncalves, A. (2019). waddR: Using the 2-Wasserstein distance to identify differences between distributions in two-sample testing, with application to single-cell RNA-sequencing data.