SIAMCAT is a pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes.

A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).

SIAMCAT is available in three different flavors:

+ Galaxy web server
+ command line tool
+ R package
Please see the Support Section if you run into problems when using SIAMCAT.
The input data should be organized in the same way for every version of SIAMCAT. All files are in tab-separated column format.
Label data:

+ First row is expected to be #BINARY:1=[label for cases];-1=[label for controls]
+ Second row should contain the sample identifiers as a tab-separated list (consistent with feature and metadata).
+ Third row is expected to contain the actual class labels (tab-separated), e.g. 1 for each case and -1 for each control.

Note: Labels can take other numeric values (but not characters or strings); importantly, the label for cases has to be greater than the one for controls.
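As an illustration of the label layout described above, the following shell sketch writes a three-sample label file and checks its basic structure with awk. The file name label_example.tsv, the sample identifiers S1 to S3, and the class names cancer and healthy are invented for this example. Keeping the square brackets in the header is an assumption based on the pattern shown above, and the checks are not part of SIAMCAT itself.

```shell
# Write a minimal label file (hypothetical samples S1..S3; all names are illustrative).
# Assumption: the square brackets from the header pattern are kept literally.
printf '#BINARY:1=[cancer];-1=[healthy]\n' >  label_example.tsv
printf 'S1\tS2\tS3\n'                      >> label_example.tsv
printf '1\t-1\t1\n'                        >> label_example.tsv

# Check the layout: header prefix, one label per sample ID, numeric labels only.
label_check=$(awk -F'\t' '
  NR == 1 { if ($0 !~ /^#BINARY:/) bad = 1 }
  NR == 2 { n_ids = NF }
  NR == 3 {
    if (NF != n_ids) bad = 1
    for (i = 1; i <= NF; i++) { if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) bad = 1 }
  }
  END { if (bad || NR != 3) print "invalid"; else print "ok" }
' label_example.tsv)
echo "label file: $label_check"
```

A check like this only catches structural problems; it does not verify that the sample identifiers match those in the feature matrix or metadata.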
Feature matrix: features (in rows) x samples (in columns). The first row should contain sample labels (consistent with label data), while the first column should contain feature labels (e.g. taxonomic identifiers). The remaining entries are expected to be real values >= 0 that quantify the abundance of each feature in each sample.
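A small sketch of a feature matrix in this layout, together with an awk check for non-negative numeric entries, might look as follows. The file name feat_example.tsv, the feature names, and the abundances are made up. The text above does not say whether the header row starts with an empty corner cell, so the example uses one as an assumption and the check tolerates both variants.

```shell
# Write a tiny feature matrix: 2 features (rows) x 3 samples (columns).
# Assumption: the header row starts with an empty corner cell.
printf '\tS1\tS2\tS3\n'               >  feat_example.tsv
printf 'taxon_A\t0.10\t0.00\t0.25\n' >> feat_example.tsv
printf 'taxon_B\t0.90\t1.00\t0.75\n' >> feat_example.tsv

# Check that each data row has one value per sample and that all values are >= 0.
feat_check=$(awk -F'\t' '
  NR == 1 { n_head = NF; next }
  {
    # Accept headers with or without the empty corner cell.
    if (NF != n_head && NF != n_head + 1) bad = 1
    # An unsigned number pattern rejects negative values and non-numeric text.
    for (i = 2; i <= NF; i++) {
      if ($i !~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) bad = 1
    }
  }
  END { if (bad || NR < 2) print "invalid"; else print "ok" }
' feat_example.tsv)
echo "feature matrix: $feat_check"
```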
Metadata (optional): samples (in rows) x metadata (in columns). Metadata needs to be converted to numerical values by the user (this is necessary for heatmap displays).
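One way to do the numeric conversion mentioned above is to recode categorical columns before upload. The sketch below does this with awk for a hypothetical gender column; the file names, column names, sample values, and the coding (F as 1, M as 2) are all assumptions chosen for illustration, and any consistent numeric coding would serve.

```shell
# Raw metadata with one categorical column (hypothetical samples and values).
printf '\tage\tgender\n' >  meta_raw.tsv
printf 'S1\t63\tF\n'     >> meta_raw.tsv
printf 'S2\t57\tM\n'     >> meta_raw.tsv
printf 'S3\t70\tF\n'     >> meta_raw.tsv

# Recode the third column as numbers (F -> 1, M -> 2); the header is copied unchanged.
awk -F'\t' 'BEGIN { OFS = "\t" }
  NR == 1 { print; next }
  { if ($3 == "F") $3 = 1; else if ($3 == "M") $3 = 2; print }
' meta_raw.tsv > meta_numeric.tsv
cat meta_numeric.tsv
```

Values that match neither category are passed through unchanged here, so it is worth inspecting the output for leftover text before using the file.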
The Galaxy interface can be found here: http://siamcat.embl.de/
TOOLS lists the available analysis modules. Click to choose which ones you'd like to run.
HISTORY keeps track of every analysis step you have performed. You can delete analysis steps from your history using the "x" icon.
Central panel: ANALYZE DATA allows you to specify input data sets and parameters for each analysis module.
Additional info: https://usegalaxy.org/ (in particular the Help menu) and https://wiki.galaxyproject.org/Learn
Start by uploading your data (see above for input data formats) using the DATA IMPORT / Import Data module / Upload File.
Then proceed by executing all SIAMCAT modules in order (from A to I). See example history / Workflow as well as each module's description for specific information on input and output data.
The command line version is a collection of modules implemented in R, which are called via a bash script.
Stable version: https://github.com/gezel/siamcat/
Developmental version (only available inside the EMBL intranet): beta:/g/bork4/zeller/dev/siamcat
# Run this in the folder in which you'd like to clone the siamcat repository:
git clone beta:/g/bork4/zeller/dev/siamcat
R packages required to run SIAMCAT:
install.packages('optparse')
install.packages('LiblineaR')
install.packages('pROC')
install.packages('colorRamps')
install.packages('RColorBrewer')
install.packages('beanplot')
...COMING SOON...
The SIAMCAT R package ...COMING SOON...
...COMING SOON...
Google user group for support:
https://groups.google.com/d/forum/siamcat-users
Examples are weighted differently between classes (a remnant of our colorectal cancer microbiome study). Fixed in Galaxy, will be pushed to GitHub soon.
Class labels are somehow swapped in the LASSO module, so that prediction scores are 1 - p instead of p (the posterior probability). Consequently, precision-recall curves are incorrect, but ROC curves are unaffected. This appears to occur only with a recent version of R and/or the LiblineaR package; it will be fixed with high priority.
Please let me know if you run into any issues (mailto: zeller@embl.de)