Macarron is a workflow to systematically annotate and prioritize potentially bioactive (and often unannotated) small molecules in microbial community metabolomic datasets. Macarron prioritizes metabolic features as potentially bioactive in a phenotype/condition of interest using a combination of (a) covariance with annotated metabolites, (b) ecological properties such as abundance with respect to covarying annotated compounds, and (c) differential abundance in the phenotype/condition of interest.
If you have questions, please direct it to: Macarron Forum
Macarron requires R version 4.2.0 or higher. Install Bioconductor and then install Macarron:
if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Macarron")
Macarron can be run from the command line or as an R function. Both methods require the same
arguments, have the same options, and use the same default settings. The package includes the
wrapper Macarron()
as well as functions which perform different steps in the Macarron
framework.
Macarron requires 4 comma-separated, appropriately formatted input files. The files and their formatting constraints are described below.
If you do not have the chemical taxonomy file, you can generate this file using the annotation dataframe and Macarron utility decorate_ID
(see Advanced Topics).
By default, all files will be stored in a folder named Macarron_output inside the current working directory. The main prioritization results are stored in prioritized_metabolites_all.csv
. Another file, prioritized_metabolites_characterizable.csv
is a subset of prioritized_metabolites_all.csv
and only contains metabolic features which covary with at least one annotated metabolite.
The columns in these output files are:
Maaslin2
.Example (demo) input files can be found under inst/extdata
folder of the Macarron
source. These files were generated from the PRISM study of stool metabolomes of individuals with inflammatory bowel disease (IBD) and healthy "Control" individuals. Control and IBD are the two phenotypes in this example. Macarron will be applied to prioritize metabolic features with respect to their bioactivity in IBD. Therefore, in this example, the phenotype of interest is "IBD" and the reference phenotype is "Control". The four input files are demo_abundances.csv
, demo_annotations.csv
, demo_metadata.csv
, and demo_taxonomy.csv
.
library(Macarron) prism_abundances <- system.file( 'extdata','demo_abundances.csv', package="Macarron") prism_annotations <-system.file( 'extdata','demo_annotations.csv', package="Macarron") prism_metadata <-system.file( 'extdata','demo_metadata.csv', package="Macarron") mets_taxonomy <-system.file( 'extdata','demo_taxonomy.csv', package="Macarron") prism_prioritized <- Macarron::Macarron(input_abundances = prism_abundances, input_annotations = prism_annotations, input_metadata = prism_metadata, input_taxonomy = mets_taxonomy)
abundances_df = read.csv(file = prism_abundances, row.names = 1) # setting features as rownames annotations_df = read.csv(file = prism_annotations, row.names = 1) # setting features as rownames metadata_df = read.csv(file = prism_metadata, row.names = 1) # setting samples as rownames taxonomy_df = read.csv(file = mets_taxonomy) # Running Macarron prism_prioritized <- Macarron::Macarron(input_abundances = abundances_df, input_annotations = annotations_df, input_metadata = metadata_df, input_taxonomy = taxonomy_df)
The Macarron::Macarron()
function is a wrapper for the Macarron framework. Users can also apply individual functions on the input dataframes to achieve same results as the wrapper with the added benefit of storing output from each function for other analyses. There are seven steps:
# Step 1: Storing input data in a summarized experiment object prism_mbx <- prepInput(input_abundances = abundances_df, input_annotations = annotations_df, input_metadata = metadata_df) # Step 2: Creating a distance matrix from pairwise correlations in abundances of metabolic features prism_w <- makeDisMat(se = prism_mbx) # Step 3: Finding covariance modules prism_modules <- findMacMod(se = prism_mbx, w = prism_w, input_taxonomy = taxonomy_df) # The output is a list containing two dataframes- module assignments and measures of success # if evaluateMOS=TRUE. To write modules to a separate dataframe, do: prism_module_assignments <- prism_modules[[1]] prism_modules_mos <- prism_modules[[2]] # Step 4: Calculating AVA prism_ava <- calAVA(se = prism_mbx, mod.assn = prism_modules) # Step 5: Calculating q-value prism_qval <- calQval(se = prism_mbx, mod.assn = prism_modules) # Step 6: Calculating effect size prism_es <- calES(se = prism_mbx, mac.qval = prism_qval) # Step 7: Prioritizing metabolic features prism_prioritized <- prioritize(se = prism_mbx, mod.assn = prism_modules, mac.ava = prism_ava, mac.qval = prism_qval, mac.es = prism_es) # The output is a list containing two dataframes- all prioritized metabolic features and # only characterizable metabolic features. all_prioritized <- prism_prioritized[[1]] char_prioritized <- prism_prioritized[[2]] # Step 8 (optional): View only the highly prioritized metabolic features in each module prism_highly_prioritized <- showBest(prism_prioritized)
Session info from running the demo in R can be displayed with the following command.
sessionInfo()
The input taxonomy dataframe can be generated using the input metabolic features annotation dataframe using Macarron::decorateID()
. This function annotates an HMDB ID or a PubChem CID with the chemical class and subclass of the metabolite.
taxonomy_df <- decorateID(input_annotations = annotations_df) write.csv(taxonomy_df, file="demo_taxonomy.csv", row.names = FALSE)
A record of all chosen parameters and steps that were followed during execution.
This file provides information about the properties of covariance modules used in the analysis. By default, modules are generated using a minimum module size (MMS) (argument: min_module_size
) equal to cube root of the total number of prevalent metabolic features. Macarron evaluates 9 measures of success (MOS) that collectively capture the "correctness" and chemical homogeneity of the modules. The MOS are as follows:
This folder contains the Maaslin2 log file (maaslin2.log), significant associations found by Maaslin2 (significant_results.tsv) and the linear model residuals file (residuals.rds). For more information, see Maaslin2.
Ideally, at least 50% metabolic features must be retained after prevalence filtering. By default, Macarron uses the union of metabolic features observed (non-zero abundance) in at least 70% samples of any phenotype for further analysis. This prevalence threshold may be high for some metabolomics datasets and can be changed using the min_prevalence
argument.
prism_prioritized <- Macarron::Macarron(input_abundances = abundances_df, input_annotations = annotations_df, input_metadata = metadata_df, input_taxonomy = taxonomy_df, min_prevalence = 0.5) # or prism_w <- makeDisMat(se = prism_mbx, min_prevalence = 0.5)
By default, cube root of the total number of prevalent features is used as the minimum module size (MMS) (argument: min_module_size
) for module detection and generation. We expect this to work for most real world datasets. To determine if the modules are optimal for further analysis, Macarron evaluates several measures of success (MOS) as described above. In addition to evaluating MOS for modules generated using the default MMS, Macarron also evaluates MOS for MMS values that are larger (MMS+5, MMS+10) and smaller (MMS-5, MMS-10) than the default MMS. If you find that the MOS improve with larger or smaller MMS, you may change the default accordingly. For more details about module detection, please see WGCNA
and dynamicTreeCut
.
# See MOS of modules generated using default prism_modules <- findMacMod(se = prism_mbx, w = prism_w, input_taxonomy = taxonomy_df) prism_modules_mos <- prism_modules[[2]] View(prism_modules_mos) # Change MMS prism_modules <- findMacMod(se = prism_mbx, w = prism_w, input_taxonomy = taxonomy_df, min_module_size = 10)
Macarron uses Maaslin2 for determining the q-value of differential abundance in a phenotype of interest. For default execution, the phenotype of interest must be a category in column 1 of the metadata dataframe e.g. IBD in diagnosis in the demo. This is also the column that is picked by the metadata_variable
argument for identifying the main phenotypes/conditions in any dataset (see Macarron.log file). Further, in the default execution, all columns in the metadata table are considered as fixed effects and the alphabetically first categorical variable in each covariate with two categories is considered as the reference. Maaslin2 requires reference categories to be explicitly defined for all categorical metadata with more than two categories.
Defaults can be changed with the arguments fixed_effects
, random_effects
and reference
. In the demo example, fixed effects
and reference
can be defined as follows:
prism_qval <- calQval(se = prism_mbx, mod.assn = prism_modules, metadata_variable = "diagnosis", fixed_effects = c("diagnosis","age","antibiotics"), reference = c("diagnosis,Control";"antibiotics,No"))
The package source contains a script MacarronCMD.R
in inst/scripts
to invoke Macarron in the command line using Rscript.
The inst/scripts
folder also contains a README file that comprehensively documents the usage of the script.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.