knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This document introduces the TaxaNorm R package, a package for normalizing microbiome taxa data. Here, we will go through how to install the package and how to analyze and visualize microbiome data with it. TaxaNorm implements a Zero-Inflated Negative Binomial (ZINB) method to normalize microbiome data.
There are three main steps in using this package:
1. Load and QC Input Data: the package includes an example data set from the phyloseq package that shows the format needed for analysis.
2. Run the ZINB Normalization Function: the TaxaNorm_Normalization function is run on the input data above. This function implements the ZINB method for normalization.
3. Visualize and Quality Control: last, visualization and quality-control measures are built into the package.
TaxaNorm requires the packages phyloseq and microbiome, which can be found on Bioconductor.
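If these dependencies are not yet installed, they can be obtained from Bioconductor with BiocManager, for example:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c("phyloseq", "microbiome"))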
For the newest, but potentially unstable, version of the package, direct GitHub installation is also supported.
remotes::install_github("wangziyue57/TaxaNorm")
library(TaxaNorm)
# library(phyloseq)
# library(microbiome)
# library(ggplot2)
# library(vegan)
# library(MASS)
Basic Usage
data("TaxaNorm_Example_Input", package = "TaxaNorm") # run normalization TaxaNorm_Example_Output <- TaxaNorm_Normalization(data= TaxaNorm_Example_Input, depth = NULL, group = sample_data(TaxaNorm_Example_Input)$body_site, meta.data = NULL, filter.cell.num = 10, filter.taxa.count = 0, random = FALSE, ncores = 1) # run diagnosis test Diagnose_Data <- TaxaNorm_Run_Diagnose(Normalized_Results = TaxaNorm_Example_Output, prev = TRUE, equiv = TRUE, group = sample_data(TaxaNorm_Example_Input)$body_site)
The built-in example data, stored as a phyloseq object, can be loaded with the command below.
data("TaxaNorm_Example_Input", package = "TaxaNorm")
We have prepared several QC figures describing the input data characteristics, which provide preliminary criteria for pre-filtering rare, low-information taxa before any analysis. This improves the power and computational efficiency of the algorithm. Users who have already cleaned or pre-processed their data can skip this step.
qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)
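The exact contents of the returned QC object depend on the package version; its structure can be inspected with base R before choosing a filtering rule:

# Inspect what the QC helper returns (contents may vary by package version)
str(qc_data, max.level = 1)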
Here we provide a popular option: keep only taxa observed with a count of filter.taxa.count or more in at least filter.sample.num samples, where filter.sample.num can be any arbitrary value or the size of the smallest group of samples. By default, we used filter.taxa.count=1 and filter.sample.num=10. This criterion is also incorporated in the main function TaxaNorm_Normalization().
filter.sample.num <- 1
filter.taxa.count <- 10
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input) > filter.taxa.count) > filter.sample.num
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)
Users can also apply their own customized filtering criteria. Alternatively, a basic pre-filter is to keep only taxa (rows) with more than 10 reads in total:
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input)) > 10
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)
qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)
The normalization function returns a TaxaNorm_Results object, which contains the input data, the raw data, the normalized data (normdata), the ECDF values (ecdf), the fitted model parameters, and convergence information.
# Pick group from phyloseq object
group <- sample_data(TaxaNorm_Example_Input)$body_site

# Run normalization function
Normalized_Data <- TaxaNorm_Normalization(data = TaxaNorm_Example_Input,
                                          depth = NULL,
                                          group = group,
                                          filter.taxa.count = 0,
                                          random = TRUE,
                                          ncores = 1)
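The components of the returned object can be explored with generic R tools; the snippet below assumes TaxaNorm_Results is an S4 class, and the slot names may differ from the list above in your installed version:

# Inspect the returned object (assumed S4; slot names may vary by version)
class(Normalized_Data)
slotNames(Normalized_Data)
str(Normalized_Data, max.level = 2)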
data("TaxaNorm_Example_Output", package = "TaxaNorm") TaxaNorm_Model_QC(TaxaNormResults = TaxaNorm_Example_Output)
TaxaNorm_NMDS(TaxaNormResults = TaxaNorm_Example_Output, group_column = "body_site")
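If TaxaNorm_NMDS() returns a ggplot object (an assumption not confirmed here), the figure can be saved for later use with ggplot2::ggsave():

# Save the NMDS plot, assuming the helper returns a ggplot object
nmds_plot <- TaxaNorm_NMDS(TaxaNormResults = TaxaNorm_Example_Output, group_column = "body_site")
ggplot2::ggsave("taxanorm_nmds.png", plot = nmds_plot, width = 6, height = 4)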