animalcules
In animalcules: Interactive microbiome analysis toolkit

knitr::opts_chunk$set(comment="#", message=FALSE)
devtools::load_all(".")
library(SummarizedExperiment)

Introduction

animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.

Installation

Install the development version of the package from Bioconductor.

if (!requireNamespace("BiocManager", quietly=TRUE))
  install.packages("BiocManager")
BiocManager::install("compbiomed/animalcules")

Or install the development version of the package from Github.

if (!requireNamespace("devtools", quietly=TRUE))
  install.packages("devtools")
devtools::install_github("compbiomed/animalcules")

Load Packages

Load the packages needed into R session.

library(animalcules)
library(SummarizedExperiment)

Run Shiny App

This command is to start the animalcules shiny app. For shiny app tutorials, please go to our website, and click the tab "Interactive Shiny Analysis".

The rest of the vignette is for the command line version of animalcules, which could also be found in our website, and click the tab "Command Line Analysis".

run_animalcules()

Load Toy Dataset

This toy dataset contains 50 simulated samples. Throughout the vignette, we will use the data structure MultiAssayExperiment (MAE) as the input to all functions.

data_dir = system.file("extdata/MAE.rds", package = "animalcules")
MAE = readRDS(data_dir)

For users who would like to use their own dataset, please follow the following steps:

Use the shiny app to upload your dataset. check shiny tutorial
Go to tab 2 "Data Summary and Filtering", and click the button "Download Animalcules File". check shiny tutorial
Load the downloaded .rds animalcules file and rename it as MAE in R.

data_dir = "PATH_TO_THE_ANIMALCULES_FILE"
MAE = readRDS(data_dir)

Summary and Categorize

One of the first steps in the data analysis process involves summarizing the data and looking for outliers or other obvious data issues that may cause issue with downstream analysis.

Summary Plot (Pie Chart or Box plot)

One type of summarization plot returns either a box plot or pie chart for continous or categorical data respectively.

p <- filter_summary_pie_box(MAE,
                            samples_discard = c("subject_2", "subject_4"),
                            filter_type = "By Metadata",
                            sample_condition = "AGE")
p

Summary Plot (Density Plot or Bar Plot)

One type of summarization plot returns either a density plot or bar plot for continous or categorical data respectively.

p <- filter_summary_bar_density(MAE,
                                samples_discard = c("subject_2", "subject_4"),
                                filter_type = "By Metadata",
                                sample_condition = "SEX")
p

Categorize

It is often necessary to bin continuous data into categories when performing analyses that require categorical input. To help ease this process, users can automatically categorize categorical data and provide custom bin breaks and labels in doing so.

microbe <- MAE[['MicrobeGenetics']]
samples <- as.data.frame(colData(microbe))
result <- filter_categorize(samples,
                            sample_condition="AGE",
                            new_label="AGE_GROUP",
                            bin_breaks=c(0,55,75,100),
                            bin_labels=c('Young','Adult',"Elderly"))
head(result$sam_table)
result$plot.unbinned
result$plot.binned

Visualization

A typical analysis involves visualization of microbe abundances across samples or groups of samples. Animalcules implements three common types of visualization plots including stacked bar plots, heat map, and box plots generated with Plotly.

Relative Abundance Stacked Bar Plot

The stacked bar plots are used to visualize the relative abundance of microbes at a given taxonomical level in each sample represented as a single bar.

p <- relabu_barplot(MAE,
                    tax_level="family",
                    order_organisms=c('Retroviridae'),
                    sort_by="organisms",
                    sample_conditions=c('SEX', 'AGE'),
                    show_legend=TRUE)
p

Relative Abundance Heatmap

The heatmap represents a sample by organisms matrix that can be visualized at different taxonomic levels.

p <- relabu_heatmap(MAE,
                   tax_level="genus",
                   sort_by="conditions",
                   sample_conditions=c("SEX", "AGE"))
p

Relative Abundance Boxplot

The boxplot visualization allows users to compare the abundance of one or more organisms between categorical attributes.

p <- relabu_boxplot(MAE,
                    tax_level="genus",
                    organisms=c("Escherichia", "Actinomyces"),
                    condition="SEX",
                    datatype="logcpm")
p

Diversity

Alpha Diversity Boxplot

Alpha diversity, which describes the richness and evenness of sample microbial community, is a vital indicator in the microbiome analysis. animalcules provides the interactive boxplot comparison of alpha diversity between selected groups of samples. Both taxonomy levels and alpha diversity metrics (Shannon, Gini Simpson, Inverse Simpson) can be changed. Users can also conduct alpha diversity statistical tests including Wilcoxon rank sum test, T test and Kruskal-Wallis test.

Plot the alpha diversity boxplot between the levels in selected condition.

alpha_div_boxplot(MAE = MAE,
                  tax_level = "genus",
                  condition = "DISEASE",
                  alpha_metric = "shannon")

Alpha Diversity Statistical Test

Conduct statistical test on the alpha diversity between the levels in selected condition.

do_alpha_div_test(MAE = MAE,
                  tax_level = "genus",
                  condition = "DISEASE",
                  alpha_metric = "shannon",
                  alpha_stat = "T-test")

Beta Diversity Heatmap

On the other hand, by defining distances between each sample, beta diversity is another key metric to look at. Users can plot the beta diversity heatmap by selecting different beta diversity dissimilarity metrics including Bray-Curtis and Jaccard. Users can also conduct beta diversity statistical testing between groups including PERMANOVA, Wilcoxon rank sum test and Kruskal-Wallis.

Plot the beta diversity heatmap with selected condition.

diversity_beta_heatmap(MAE = MAE, 
                       tax_level = 'genus', 
                       input_beta_method = "bray",
                       input_bdhm_select_conditions = 'DISEASE',
                       input_bdhm_sort_by = 'condition')

Beta Diversity Boxplot

Plot the beta diversity boxplot within and between conditions.

diversity_beta_boxplot(MAE = MAE, 
                       tax_level = 'genus', 
                       input_beta_method = "bray",
                       input_select_beta_condition = 'DISEASE')

Beta Diversity Test

Conduct statistical test on the beta diversity between the levels in selected condition.

diversity_beta_test(MAE = MAE, 
                    tax_level = 'genus',
                    input_beta_method = "bray",
                    input_select_beta_condition =  'DISEASE',
                    input_select_beta_stat_method = 'PERMANOVA',
                    input_num_permutation_permanova = 999)

Dimensionality Reduction

PCA

A wrapper for conduction 2D and 3D Principal Component Analysis.

result <- dimred_pca(MAE,
                     tax_level="genus",
                     color="AGE",
                     shape="DISEASE",
                     pcx=1,
                     pcy=2,
                     datatype="logcpm")
result$plot
head(result$table)

PCoA

A wrapper for conduction 2D and 3D Principal Coordinate Analysis.

result <- dimred_pcoa(MAE,
                      tax_level="genus",
                      color="AGE",
                      shape="DISEASE",
                      axx=1,
                      axy=2,
                      method="bray")
result$plot
head(result$table)

UMAP

A wrapper for conduction 2D and 3D Uniform Manifold Approximation and Projection.

result <- dimred_umap(MAE,
                      tax_level="genus",
                      color="AGE",
                      shape="DISEASE",
                      cx=1,
                      cy=2,
                      n_neighbors=15,
                      metric="euclidean",
                      datatype="logcpm")
result$plot

t-SNE

A wrapper for conduction 2D and 3D t-distributed stochastic neighbor embedding.

result <- dimred_tsne(MAE,
                      tax_level="phylum",
                      color="AGE",
                      shape="GROUP",
                      k="3D",
                      initial_dims=30,
                      perplexity=10,
                      datatype="logcpm")
result$plot

Differential Analysis

Here in animalcules, we provide a DESeq2-based differential abundance analysis. Users can choose the target variable, covariate variable, taxonomy level, minimum count cut-off, and an adjusted p-value threshold. The analysis report will output not only the adjusted p-value and log2-fold-change of the microbes, but also the percentage, prevalence, and the group size-adjusted fold change.

p <- differential_abundance(MAE,
                            tax_level="phylum",
                            input_da_condition=c("DISEASE"),
                            min_num_filter = 2,
                            input_da_padj_cutoff = 0.5)
p

Biomarker

One unique feature of animalcules is the biomarker identification module built on machine learning models. Users can choose one classification model from logistic regression, gradient boosting machine, or random forest to identify a microbes biomarker list. The feature importance score for each microbe will be provided. To evaluate the biomarker performance, the ROC plot and the AUC value using cross-validation outputs are shown to users. Note: Results may vary each run.

Train biomarker

p <- find_biomarker(MAE,
                    tax_level = "genus",
                    input_select_target_biomarker = c("SEX"),
                    nfolds = 3,
                    nrepeats = 3,
                    seed = 99,
                    percent_top_biomarker = 0.2,
                    model_name = "logistic regression")
# biomarker
p$biomarker

# importance plot
p$importance_plot

# ROC plot
p$roc_plot

Session Info

sessionInfo()

Any scripts or data that you put into this service are public.

animalcules documentation built on Nov. 8, 2020, 6:47 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

animalcules
Interactive microbiome analysis toolkit

animalcules
In animalcules: Interactive microbiome analysis toolkit

Introduction

Installation

Load Packages

Run Shiny App

Load Toy Dataset

Summary and Categorize

Summary Plot (Pie Chart or Box plot)

Summary Plot (Density Plot or Bar Plot)

Categorize

Visualization

Relative Abundance Stacked Bar Plot

Relative Abundance Heatmap

Relative Abundance Boxplot

Diversity

Alpha Diversity Boxplot

Alpha Diversity Statistical Test

Beta Diversity Heatmap

Beta Diversity Boxplot

Beta Diversity Test

Dimensionality Reduction

PCA

PCoA

UMAP

t-SNE

Differential Analysis

Biomarker

Train biomarker

Session Info

Try the animalcules package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

animalcules Interactive microbiome analysis toolkit

animalcules In animalcules: Interactive microbiome analysis toolkit

Introduction

Installation

Load Packages

Run Shiny App

Load Toy Dataset

Summary and Categorize

Summary Plot (Pie Chart or Box plot)

Summary Plot (Density Plot or Bar Plot)

Categorize

Visualization

Relative Abundance Stacked Bar Plot

Relative Abundance Heatmap

Relative Abundance Boxplot

Diversity

Alpha Diversity Boxplot

Alpha Diversity Statistical Test

Beta Diversity Heatmap

Beta Diversity Boxplot

Beta Diversity Test

Dimensionality Reduction

PCA

PCoA

UMAP

t-SNE

Differential Analysis

Biomarker

Train biomarker

Session Info

Try the animalcules package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

animalcules
Interactive microbiome analysis toolkit

animalcules
In animalcules: Interactive microbiome analysis toolkit