View source: R/check_associations.r
check.associations | R Documentation |
This function computes different measures of association
between features and the label and stores the results in the
associations slot of the SIAMCAT object
check.associations(siamcat, formula="feat~label", test='wilcoxon',
alpha=0.05, mult.corr="fdr", log.n0=1e-06, pr.cutoff=1e-06,
probs.fc=seq(.1, .9, .05), paired=NULL, feature.type='filtered',
verbose = 1)
siamcat |
object of class siamcat-class |
formula |
string, formula used for testing, see Details for more
information, defaults to "feat~label" |
test |
string, statistical test used for the association testing, can
be either 'wilcoxon' or 'lm', defaults to 'wilcoxon' |
alpha |
float, significance level, defaults to 0.05 |
mult.corr |
string, multiple hypothesis correction method, see
p.adjust, defaults to "fdr" |
log.n0 |
float, pseudo-count to be added before log-transformation of
the data, defaults to 1e-06 |
pr.cutoff |
float, cutoff for the prevalence computation, defaults to
1e-06 |
probs.fc |
numeric vector, quantiles used to calculate the generalized
fold change between groups, see Details for more information,
defaults to seq(.1, .9, .05) |
paired |
character, column name of the meta-variable containing
information for a paired test, defaults to NULL |
feature.type |
string, on which type of features should the function
work? Defaults to 'filtered' |
verbose |
integer, control output, defaults to 1 |
object of class siamcat-class with the slot associations filled
The function uses the Wilcoxon test as default statistical test for binary classification problems. Alternatively, a simple linear model (as implemented in lm) can be used as well. For regression problems, the function defaults to the linear model.
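As a rough illustration of the two per-feature tests named above, the following base R sketch applies wilcox.test and lm to a small simulated feature matrix and then corrects the p-values with p.adjust. The simulated data and the object names (feat, label, p_wilcox, p_lm, p_adj) are assumptions made for this example; it is not SIAMCAT's internal code.

```r
# Illustrative sketch with simulated data; not SIAMCAT's internal implementation.
set.seed(42)
n <- 20
label <- rep(c(-1, 1), each = n / 2)            # hypothetical binary label
feat <- matrix(rnorm(5 * n), nrow = 5,
               dimnames = list(paste0("feat", 1:5), NULL))
feat[1, label == 1] <- feat[1, label == 1] + 4  # make feat1 differ between groups

# Wilcoxon rank-sum test for each feature (row)
p_wilcox <- apply(feat, 1, function(x) {
  wilcox.test(x[label == 1], x[label == -1])$p.value
})

# Simple linear model for each feature, extracting the p-value of the label term
p_lm <- apply(feat, 1, function(x) {
  summary(lm(x ~ label))$coefficients["label", "Pr(>|t|)"]
})

# Multiple hypothesis correction, here with the "fdr" method
p_adj <- p.adjust(p_wilcox, method = "fdr")
```

Both tests return one p-value per feature; the corrected values in p_adj are what would be compared against a significance level such as alpha.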
The function calculates several measures for the effect size of the associations between microbial features and the label. For binary classification problems, these measures are:
AUROC (area under the Receiver Operating Characteristics curve) as a non-parametric measure of enrichment,
the generalized fold change (gFC), a pseudo-fold change which is calculated as the geometric mean of the differences between quantiles across both groups,
prevalence shift (difference in prevalence between the two groups).
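The three effect sizes listed above can be sketched in base R for a single feature. This is a minimal illustration under stated assumptions, not SIAMCAT's implementation: the abundance values are made up, the gFC is computed here as the mean difference between quantiles of log10-transformed abundances (after adding the pseudo-count), and prevalence is taken as the fraction of samples above the cutoff. The variable names mirror the function's parameters but are otherwise hypothetical.

```r
# Illustrative sketch; values and the exact formulas are assumptions.
x_case <- c(0, 1e-4, 5e-4, 2e-3, 1e-2)  # hypothetical relative abundances, group 1
x_ctrl <- c(0, 0, 1e-5, 1e-4, 3e-4)     # hypothetical relative abundances, group 2
log_n0 <- 1e-06                         # pseudo-count (cf. log.n0)
pr_cutoff <- 1e-06                      # prevalence cutoff (cf. pr.cutoff)
probs_fc <- seq(0.1, 0.9, 0.05)         # quantiles (cf. probs.fc)

# AUROC: probability that a case value exceeds a control value, ties count half
auroc <- mean(outer(x_case, x_ctrl, ">") + 0.5 * outer(x_case, x_ctrl, "=="))

# Generalized fold change: mean difference between log-scale quantiles
q_case <- quantile(log10(x_case + log_n0), probs = probs_fc)
q_ctrl <- quantile(log10(x_ctrl + log_n0), probs = probs_fc)
gfc <- mean(q_case - q_ctrl)

# Prevalence shift: difference in the fraction of samples above the cutoff
pr_shift <- mean(x_case > pr_cutoff) - mean(x_ctrl > pr_cutoff)
```

An AUROC above 0.5, a positive gFC, and a positive prevalence shift all point in the same direction here, namely enrichment in the first group.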
For regression problems, the effect sizes are:
Spearman correlation between the feature and the label.
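For a continuous label, the Spearman correlation can be computed with base R's cor. The sketch below uses simulated data; label_num and feat_x are hypothetical names introduced only for this example.

```r
# Illustrative sketch with simulated data.
set.seed(7)
label_num <- seq(1, 30)                # hypothetical continuous label
feat_x <- label_num^2 + rnorm(30)      # feature that increases with the label

# Rank-based correlation between feature and label
rho <- cor(feat_x, label_num, method = "spearman")
```

Because the measure is rank-based, the non-linear (quadratic) relationship still yields a correlation close to 1.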
To correct for possible confounders while testing for association, the
function uses linear mixed effect models as implemented in the
lmerTest package. To do so, the test formula needs to be adjusted
to include the confounder. For example, when correcting for the metadata
information Sex, the formula would be 'feat~label+(1|Sex)'
(see also the example below).
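To make the shape of such a model concrete, the following sketch fits a linear mixed effect model with the same formula to one simulated feature, using lmerTest::lmer directly. It assumes the lmerTest package is installed (the code is skipped otherwise); the data frame meta and its simulated values are hypothetical, and this is not necessarily how the function calls lmerTest internally.

```r
# Illustrative sketch with simulated data; assumes lmerTest is installed.
set.seed(1)
n <- 40
meta <- data.frame(
  label = rep(c(-1, 1), times = n / 2),           # hypothetical binary label
  Sex = factor(rep(c("F", "M"), each = n / 2))    # hypothetical confounder
)
# Hypothetical log-abundance influenced by both the label and Sex
meta$feat <- 0.8 * meta$label + 1.5 * (meta$Sex == "M") + rnorm(n)

p_val <- NA_real_
if (requireNamespace("lmerTest", quietly = TRUE)) {
  # Sex enters as a random intercept; label is the fixed effect of interest
  fit <- lmerTest::lmer(feat ~ label + (1 | Sex), data = meta)
  p_val <- summary(fit)$coefficients["label", "Pr(>|t|)"]
}
```

With only two levels in the grouping variable, lmer may report a singular fit; the p-value for the label term is still returned.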
Please note that modifying the formula parameter in this function might lead to unexpected results!
For paired testing, e.g. when the same patient has been sampled before and after an intervention, the 'paired' parameter can be supplied to the function. This indicates a column in the metadata table that holds the information about pairing.
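The idea behind paired testing can be sketched in base R with a paired Wilcoxon (signed-rank) test on one simulated feature. The column name Individual_ID follows the example below; the timepoint column, the simulated values, and the alignment step are assumptions for illustration rather than a description of the function's internals.

```r
# Illustrative sketch with simulated data.
set.seed(3)
meta <- data.frame(
  Individual_ID = rep(paste0("P", 1:8), times = 2),  # pairing information
  timepoint = rep(c("before", "after"), each = 8)    # hypothetical grouping
)
meta$feat <- rnorm(16) + ifelse(meta$timepoint == "after", 1, 0)

# Sort by the pairing column so that both groups list individuals in the same order
meta <- meta[order(meta$Individual_ID), ]
before <- meta$feat[meta$timepoint == "before"]
after <- meta$feat[meta$timepoint == "after"]

# Paired test: compares each individual's "after" value with their own "before" value
p_paired <- wilcox.test(after, before, paired = TRUE)$p.value
```

The key requirement is that the two vectors are aligned by individual, which is what the pairing column provides.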
# Example data
data(siamcat_example)
# Simple example
siamcat_example <- check.associations(siamcat_example)
# Confounder-corrected testing (corrected for Sex)
#
# this is not run during checks
# siamcat_example <- check.associations(siamcat_example,
# formula='feat~label+(1|Sex)', test='lm')
# Paired testing
#
# this is not run during checks
# siamcat_paired <- check.associations(siamcat_paired,
# paired='Individual_ID')