Home

/

GitHub

/

In waldronlab/lefser: R implementation of the LEfSE method for microbiome biomarker discovery

knitr::opts_chunk$set(comment = "#>", 
                      fig.align = "center",
                      collapse = FALSE, 
                      message = FALSE, 
                      warning = FALSE, 
                      eval = TRUE)

Overview

Background

lefser is the R implementation of the Linear discriminant analysis (LDA) Effect Size (LEfSe), a Python package for metagenomic biomarker discovery and explanation. (Huttenhower et al. 2011).

The original software utilizes standard statistical significance tests along with supplementary tests that incorporate biological consistency and the relevance of effects to identity the features (e.g., organisms, clades, OTU, genes, or functions) that are most likely to account for differences between the two sample classes of interest. While LEfSe is widely used and available in different platform such as Galaxy UI and Conda, there is no convenient way to incorporate it in R-based workflows. Thus, we re-implement LEfSe as an R/Bioconductor package, lefser. Following the LEfSe‘s algorithm including Kruskal-Wallis test, Wilcoxon-Rank Sum test, and Linear Discriminant Analysis, with some modifications, lefser successfully reproduces and improves the original statistical method and the associated plotting functionality.

Install and load pacakge

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("lefser")

library(lefser)

Analysis example

Prepare input

lefser package include the demo dataset, zeller14, which is the microbiome data from colorectal cancer (CRC) patients and controls (Zeller et al. 2014).

In this vignette, we excluded the 'adenoma' condition and used control/CRC as the main classes and age category as sub-classes (adult vs. senior) with different numbers of samples: control-adult (n = 46), control-senior (n = 20), CRC-adult (n = 45), and CRC-senior (n = 46).

data(zeller14)
zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]

The class and subclass information is stored in the colData slot under the study_condition and age_category columns, respectively.

## Contingency table
table(zeller14$age_category, zeller14$study_condition)

If you try to run lefser directly on the 'zeller14' data, you will get the following warning messages

lefser(zeller14, classCol = "study_condition", subclassCol = "age_category")

Warning messages:
1: In lefser(zeller14, classCol = "study_condition", subclassCol = "age_category") :
  Convert counts to relative abundances with 'relativeAb()'
2: In lda.default(x, classing, ...) : variables are collinear

Terminal node

When working with taxonomic data, including both terminal and non-terminal nodes in the analysis can lead to collinearity problems. Non-terminal nodes (e.g., genus) are often linearly dependent on their corresponding terminal nodes (e.g., species) since the species-level information is essentially a subset or more specific representation of the genus-level information. This collinearity can violate the assumptions of certain statistical methods, such as linear discriminant analysis (LDA), and can lead to unstable or unreliable results. By using only terminal nodes, you can effectively eliminate this collinearity issue, ensuring that your analysis is not affected by linearly dependent or highly correlated variables. Additionally, you can benefit of avoiding redundancy, increasing specificity, simplifying data, and reducing ambiguity, using only terminal nodes.

You can select only the terminal node using get_terminal_nodes function.

tn <- get_terminal_nodes(rownames(zeller14))
zeller14tn <- zeller14[tn,]

Relative abundance

First warning message informs you that lefser requires relative abundance of features. You can use relativeAb function to reformat your input.

zeller14tn_ra <- relativeAb(zeller14tn)

Run `lefser`

The lefser function returns a data.frame with two columns - the names of selected features (the features column) and their effect size (the scores column).

There is a random number generation step in the lefser algorithm to ensure that more than half of the values for each features are unique. In most cases, inputs are sparse, so in practice, this step is handling 0s. So to reproduce the identical result, you should set the seed before running lefser.

set.seed(1234)
res <- lefser(zeller14tn_ra, # relative abundance only with terminal nodes
              classCol = "study_condition",
              subclassCol = "age_category")
head(res)

Visualization using `lefserPlot`

lefserPlot(res)

Benchmarking againt other tools

The codes for benchmarking lefser against LEfSe and the other R implementation of LEfSe is available here.

Interoperating with phyloseq

When using phyloseq objects, we recommend to extract the data and create a SummarizedExperiment object as follows:

library(phyloseq)
library(SummarizedExperiment)

## Load phyloseq object
fp <- system.file("extdata", 
                  "study_1457_split_library_seqs_and_mapping.zip",
                  package = "phyloseq")
kostic <- microbio_me_qiime(fp)

## Split data tables
counts <- unclass(otu_table(kostic))
coldata <- as(sample_data(kostic), "data.frame")

## Create a SummarizedExperiment object
SummarizedExperiment(assays = list(counts = counts), colData = coldata)

You may also consider using makeTreeSummarizedExperimentFromPhyloseq from the mia package.

mia::makeTreeSummarizedExperimentFromPhyloseq(kostic)

Session Info

sessionInfo()

waldronlab/lefser documentation built on Nov. 1, 2024, 9:02 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

waldronlab/lefser
R implementation of the LEfSE method for microbiome biomarker discovery

In waldronlab/lefser: R implementation of the LEfSE method for microbiome biomarker discovery

Overview

Background

Install and load pacakge

Analysis example

Prepare input

Terminal node

Relative abundance

Run `lefser`

Visualization using `lefserPlot`

Benchmarking againt other tools

Interoperating with phyloseq

Session Info

R Package Documentation

Browse R Packages

We want your feedback!

waldronlab/lefser R implementation of the LEfSE method for microbiome biomarker discovery

In waldronlab/lefser: R implementation of the LEfSE method for microbiome biomarker discovery

Overview

Background

Install and load pacakge

Analysis example

Prepare input

Terminal node

Relative abundance

Run lefser

Visualization using lefserPlot

Benchmarking againt other tools

Interoperating with phyloseq

Session Info

R Package Documentation

Browse R Packages

We want your feedback!

waldronlab/lefser
R implementation of the LEfSE method for microbiome biomarker discovery

Run `lefser`

Visualization using `lefserPlot`