MaAsLin2 is the next generation of MaAsLin.
MaAsLin2 is a comprehensive R package for finding multivariable associations between clinical metadata and microbial meta-omics features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs. It offers multiple analysis methods (with support for multiple covariates and repeated measures) along with filtering, normalization, and transform options to customize the analysis for your specific study.
If you use the MaAsLin2 software, please cite our manuscript: Mallick et al. (2020+). "Multivariable Association in Population-scale Meta-omics Studies" (In Preparation).
If you have questions, please email the MaAsLin Users Google group.
MaAsLin2 finds associations between microbiome multi-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (with support for multiple covariates and repeated measures), filtering, normalization, and transform options to customize analysis for your specific study.
MaAsLin2 is an R package that can be run from the command line or as an R function. If you only run it from the command line, you do not need to install the MaAsLin2 package, but you will need to install the MaAsLin2 dependencies.
$ tar xzvf maaslin2.tar.gz
$ R -q -e "install.packages(c('lmerTest','pbapply','car','dplyr','vegan','chemometrics','ggplot2','pheatmap','hash','logging','data.table','MuMIn','glmmTMB','MASS','cplm','pscl'), repos='http://cran.r-project.org')"
$ R CMD INSTALL maaslin2
Install Bioconductor and then install Maaslin2:
if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Maaslin2")
MaAsLin2 can be run from the command line or as an R function. Both methods require the same arguments, have the same options, and use the same default settings.
MaAsLin2 requires two input files: a data (or features) file and a metadata file.
The data file can contain samples not included in the metadata file, and vice versa. In both cases, samples that are not present in both files are removed from the analysis. The samples also do not need to be in the same order in the two files.
NOTE: If running MaAsLin2 as a function, the data and metadata inputs can be of type data.frame instead of a path to a file.
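The sample-matching behavior described above can be pictured with a small base-R sketch. It is only an illustration of the idea, not MaAsLin2's internal code: the toy tables, the sample names, and the choice to match on row names are all assumptions made for this example.

```r
# Toy feature table: samples as rows, features as columns (hypothetical data).
features <- data.frame(
  speciesA = c(0.10, 0.30, 0.00, 0.25),
  speciesB = c(0.90, 0.70, 1.00, 0.75),
  row.names = c("S1", "S2", "S3", "S4")
)

# Toy metadata in a different sample order. S5 has no feature data,
# and S4 (present above) has no metadata.
metadata <- data.frame(
  age = c(41, 29, 35, 52),
  diagnosis = c("CD", "nonIBD", "UC", "CD"),
  row.names = c("S3", "S1", "S5", "S2")
)

# Keep only the samples present in both tables, in one shared order.
shared <- sort(intersect(rownames(features), rownames(metadata)))
features_matched <- features[shared, , drop = FALSE]
metadata_matched <- metadata[shared, , drop = FALSE]

print(shared)
print(metadata_matched)
```

After matching, both tables contain the same samples in the same order, so row i of one table describes the same sample as row i of the other.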
MaAsLin2 generates two types of output files: data and visualization.
Data output files
    all_results.tsv
        The N column is the total number of data points.
        The N.not.zero column is the total of non-zero data points.
        The q-value is computed with p.adjust using the correction method.
    significant_results.tsv
    residuals.rds
    maaslin2.log
Visualization output files
    heatmap.pdf
    [a-z/0-9]+.pdf
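As a rough illustration of how q-values relate to p-values, the sketch below applies base R's p.adjust with the Benjamini-Hochberg method (the default correction, BH) and then keeps associations at or below the default significance threshold of 0.25. The p-values and the column names feature, pval, and qval are made up for this example and are not taken from a real MaAsLin2 run.

```r
# Hypothetical raw p-values for five feature-metadata associations.
results <- data.frame(
  feature = c("f1", "f2", "f3", "f4", "f5"),
  pval = c(0.001, 0.02, 0.04, 0.30, 0.80),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment, the same method name as the default "BH".
results$qval <- p.adjust(results$pval, method = "BH")

# Keep associations at or below the significance threshold (default 0.25).
max_significance <- 0.25
significant <- results[results$qval <= max_significance, ]

# Show all results ordered by increasing q-value, then the significant subset.
print(results[order(results$qval), ])
print(significant$feature)
```

With these toy values, three of the five associations pass the threshold.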
Example input files can be found in the inst/extdata folder of the MaAsLin2 source. The files provided were generated from the HMP2 data, which can be downloaded from https://ibdmdb.org/ .
HMP2_taxonomy.tsv: a tab-delimited file with species as columns and samples as rows. It is a subset of the taxonomy file that includes only the species abundances for all samples.
HMP2_metadata.tsv: a tab-delimited file with samples as rows and metadata as columns. It is a subset of the metadata file that includes only some of the fields.
$ Maaslin2.R --transform=AST --fixed_effects="diagnosis,dysbiosisnonIBD,dysbiosisUC,dysbiosisCD,antibiotics,age" --random_effects="site,subject" --normalization=NONE --standardize=FALSE inst/extdata/HMP2_taxonomy.tsv inst/extdata/HMP2_metadata.tsv demo_output
HMP2_taxonomy.tsv is the path to your data (or features) file
HMP2_metadata.tsv is the path to your metadata file
demo_output is the path to the folder to write the output

library(Maaslin2)
input_data <- system.file(
    'extdata', 'HMP2_taxonomy.tsv', package = "Maaslin2")
input_metadata <- system.file(
    'extdata', 'HMP2_metadata.tsv', package = "Maaslin2")
fit_data <- Maaslin2(
    input_data, input_metadata, 'demo_output', transform = "AST",
    fixed_effects = c('diagnosis', 'dysbiosisnonIBD', 'dysbiosisUC', 'dysbiosisCD', 'antibiotics', 'age'),
    random_effects = c('site', 'subject'),
    reference = "diagnosis,nonIBD",
    normalization = 'NONE',
    standardize = FALSE)
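The reference argument is given as a string of 'variable,reference' pairs, semi-colon delimited for multiple variables. The base-R sketch below shows one way such a string could be interpreted, by making the named level the first (baseline) level of a factor. It illustrates the general idea only and is an assumption about the effect, not MaAsLin2's own parsing code; the toy metadata is hypothetical.

```r
# A reference string in the 'variable,reference' format.
reference <- "diagnosis,nonIBD"

# Hypothetical metadata with a three-level categorical variable.
metadata <- data.frame(
  diagnosis = c("CD", "UC", "nonIBD", "CD"),
  stringsAsFactors = FALSE
)

# Split on ';' to get one entry per variable, then on ',' to get
# the variable name and its reference level.
pairs <- strsplit(strsplit(reference, ";", fixed = TRUE)[[1]], ",", fixed = TRUE)

for (pair in pairs) {
  variable <- trimws(pair[1])
  level <- trimws(pair[2])
  # relevel() moves the chosen level to the front, so model coefficients
  # for the other levels are expressed relative to it.
  metadata[[variable]] <- relevel(factor(metadata[[variable]]), ref = level)
}

print(levels(metadata$diagnosis))
```

In a linear model fitted on this data, the CD and UC coefficients would then be contrasts against nonIBD.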
Session info from running the demo in R can be displayed with the following command.
sessionInfo()
Run MaAsLin2 help to print a list of the options and the default settings.
$ Maaslin2.R --help
Usage: ./R/Maaslin2.R [options]
Options:
    -h, --help
        Show this help message and exit

    -a MIN_ABUNDANCE, --min_abundance=MIN_ABUNDANCE
        The minimum abundance for each feature [ Default: 0 ]

    -p MIN_PREVALENCE, --min_prevalence=MIN_PREVALENCE
        The minimum percent of samples for which a feature is detected at minimum abundance [ Default: 0.1 ]

    -b MIN_VARIANCE, --min_variance=MIN_VARIANCE
        Keep features with variance greater than [ Default: 0.0 ]

    -s MAX_SIGNIFICANCE, --max_significance=MAX_SIGNIFICANCE
        The q-value threshold for significance [ Default: 0.25 ]

    -n NORMALIZATION, --normalization=NORMALIZATION
        The normalization method to apply [ Default: TSS ] [ Choices: TSS, CLR, CSS, NONE, TMM ]

    -t TRANSFORM, --transform=TRANSFORM
        The transform to apply [ Default: LOG ] [ Choices: LOG, LOGIT, AST, NONE ]

    -m ANALYSIS_METHOD, --analysis_method=ANALYSIS_METHOD
        The analysis method to apply [ Default: LM ] [ Choices: LM, CPLM, ZICP, NEGBIN, ZINB ]

    -r RANDOM_EFFECTS, --random_effects=RANDOM_EFFECTS
        The random effects for the model, comma-delimited for multiple effects [ Default: none ]

    -f FIXED_EFFECTS, --fixed_effects=FIXED_EFFECTS
        The fixed effects for the model, comma-delimited for multiple effects [ Default: all ]

    -c CORRECTION, --correction=CORRECTION
        The correction method for computing the q-value [ Default: BH ]

    -z STANDARDIZE, --standardize=STANDARDIZE
        Apply z-score so continuous metadata are on the same scale [ Default: TRUE ]

    -l PLOT_HEATMAP, --plot_heatmap=PLOT_HEATMAP
        Generate a heatmap for the significant associations [ Default: TRUE ]

    -i HEATMAP_FIRST_N, --heatmap_first_n=HEATMAP_FIRST_N
        In heatmap, plot top N features with significant associations [ Default: TRUE ]

    -o PLOT_SCATTER, --plot_scatter=PLOT_SCATTER
        Generate scatter plots for the significant associations [ Default: TRUE ]

    -e CORES, --cores=CORES
        The number of R processes to run in parallel [ Default: 1 ]

    -d REFERENCE, --reference=REFERENCE
        The factor to use as a reference for a variable with more than two levels provided as a string of 'variable,reference' semi-colon delimited for multiple variables [ Default: NA ]
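To make a few of these options more concrete, here is a minimal base-R sketch of TSS normalization, a prevalence filter, and the AST transform, using their textbook definitions. It is not MaAsLin2's implementation: the toy counts, the order of the steps, and the stricter min_prevalence of 0.5 (chosen so the filter has a visible effect on four samples) are assumptions for illustration.

```r
# Toy count table: samples as rows, features as columns (hypothetical data).
counts <- matrix(
  c(10, 0, 90,
    20, 0, 80,
    50, 5, 45,
    40, 0, 60),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), c("f1", "f2", "f3"))
)

# TSS (total sum scaling): divide each sample (row) by its total,
# so every row becomes a vector of proportions summing to 1.
tss <- counts / rowSums(counts)

# Prevalence filter: keep features detected above min_abundance in at
# least min_prevalence of the samples. The default in the help text is
# 0.1; 0.5 is used here only so the toy data shows a feature being dropped.
min_abundance <- 0
min_prevalence <- 0.5
prevalence <- colMeans(tss > min_abundance)
filtered <- tss[, prevalence >= min_prevalence, drop = FALSE]

# AST (arcsine square-root) transform for proportions in [0, 1].
ast <- asin(sqrt(filtered))

print(prevalence)
print(round(ast, 3))
```

Feature f2 is detected in only one of the four samples, so it is removed before the transform is applied.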
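Question: When I run from the command line I see the error "Maaslin2.R: command not found". How do I fix this?
Answer: The shell cannot find the script on your PATH. Provide the full path to Maaslin2.R when running it.
Question: When I run as a function I see the error "Error in library(Maaslin2): there is no package called 'Maaslin2'". How do I fix this?
Answer: The package is not installed in the R library being used. Install the Maaslin2 package and then try loading the library again.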