knitr::opts_chunk$set(fig.width=16, fig.height=8, cache = TRUE) library(phylosmith)
Examples used in this vignette will use the GlobalPatterns
dataset from
phyloseq
.
library(phyloseq) data(GlobalPatterns)
This function is an implementation of the pair-wise Spearman rank correlation that you can call in R
. The results should be identical. FastCoOccur was originally written by Jin Choi in C++ to be called from the command-line. I rewrote the routine, implementing the Rcpp API, so that it could be called within R, and with a phyloseq-object as the input.
Because of how I implemented the multithreading, there is a large diminishing return to the number of cores used. I found the optimal to be around 4 CPUs, depending on the size of your dataset.
Usage
co_occurrence(phyloseq_obj, treatment = NULL, subset = NULL, rho = 0, p = 0.05, cores = 0)
Arguments
Call | Description
-------------------- | ------------------------------------------------------------
phyloseq_obj
| A phyloseq-class object.
treatment
| Column name as a string
, or vector
of, in the sample_data
.
p
| The p-value cutoff. All returned co-occurrences will have a p-value less than or equal to p
.
cores
| Number of CPU cores to use for the pair-wise permutations. Default (0) uses max cores available. Parallelization not available for systems running MacOS without openMP
Examples
filtered_obj <- conglomerate_taxa(GlobalPatterns, "Phylum") co_occurrence(filtered_obj, "SampleType", rho = 0.8, p = 0.05)
Used to determine a cutoff for which $\rho$ vlaues are not likely to have occured by random chance. This will create permutations of all combinations of abundances and calculate the pair-wise Spearman rank co-occurence for each. The return will be a distribution of $\rho$ values that can be used to determing the significance.
The permutation returns a large table of rho-values, so RAM becomes a bottle-neck resource. To deal with this, the results are merged into a single frequency table, which saves an enormous amount of memory. Unfortunately, this doesn't allow for simple and intuitive multithreading. Because of that, there is a large diminishing return to the number of cores used. I found the optimal to be around 4 CPUs, depending on the size of your dataset.
Usage
permute_rho(phyloseq_obj, treatment, replicate_samples = 'independent', permutations = 100, cores = 0)
Arguments
Call | Description
-------------------- | ------------------------------------------------------------
phyloseq_obj
| A phyloseq-class object.
treatment
| Column name as a string
, or vector
of, in the sample_data
.
replicate_samples
| Column name as a string or number in the sample_data that indicates which samples are non-independent of each other.
permutations
| Number of iterations to compute.
cores
| Number of CPU cores to use for the pair-wise permutations. Default (0) uses max cores available. Parallelization not available for systems running MacOS without openMP configuration.
Examples
filtered_obj <- conglomerate_taxa(GlobalPatterns, "Phylum") permute_rho(filtered_obj, "SampleType", replicate_samples = "SampleType", permutations = 5)
Creates a ggplot object of the histogram of the rho values by treatment. This is a visualization tool to explain how to find cutoffs, but will also calculate the cutoffs.
Usage
histogram_permuted_rhos(permuted_rhos, p = 0.05, x_breaks = 0.25, colors = 'default')
Arguments
Call | Description
-------------------- | ------------------------------------------------------------
permuted_rhos
| A data.table
output from permute_rho
.
p
| The significance threshold for setting cutoffs.
x_breaks
| What intervals to set the ticks on the x-axis.
colors
| Name of a color set from the RColorBrewer
package or a vector palete of R-accepted colors.
Examples
filtered_obj <- conglomerate_taxa(GlobalPatterns, "Phylum") permuted_rhos <- permute_rho(filtered_obj, "SampleType", replicate_samples = "SampleType", permutations = 5) histogram_permuted_rhos(permuted_rhos, p = 0.05)
Calculate quantiles for the permuted rho values from the Spearman-rank co-occurrence. Can be done by treatments or for the overall data. This looks at both tails of the distribution, so it will show the p/2 cutoffs on both ends.
Usage
quantile_permuted_rhos(permuted_rhos, p = 0.05, by_treatment = TRUE)
Arguments
Call | Description
-------------------- | ------------------------------------------------------------
permuted_rhos
| A data.table
output from permute_rho
.
p
| The significance threshold for setting cutoffs.
by_treatment
| Whether to find the rho cutoffs for each treatment individually or for the entire experiment. Suggested to do by treatment first, to see if there is any treatments that are outliers.
Examples
filtered_obj <- conglomerate_taxa(GlobalPatterns, "Phylum") permuted_rhos <- permute_rho(filtered_obj, "SampleType", replicate_samples = "SampleType", permutations = 5) quantile_permuted_rhos(permuted_rhos, p = 0.05) quantile_permuted_rhos(permuted_rhos, p = 0.05, by_treatment = FALSE)
Computes the correlation of numerical variables with taxa abundances. Function from the phylosmith-package.
Usage
variable_correlation(phyloseq_obj, variables, treatment = NULL,subset = NULL, classification = NULL, method = 'spearman', cores = 1)
Arguments
Call | Description
-------------------- | ------------------------------------------------------------
phyloseq_obj
| A phyloseq-class object. It must contain sample_data
with information about each sample, and it must contain tax_table
with information about each taxa/gene.
variables
| Numerical factors within the in the sample_data to correlate with the abundance data.
treatment
| Column name as a string
or numeric
in the sample_data
. This can be a vector
of multiple columns and they will be combined into a new column.
subset
| A factor within the treatment
. This will remove any samples that to not contain this factor. This can be a vector of multiple factors to subset on.
classification
| Column name as a string or numeric in the tax_table for the factor to conglomerate by.
method
| Which correlation method to calculate, "pearson", "spearman".
cores
numeric value for number of CPU cores to use for the pair-wise permutations. Default (1), 0
uses max cores available. Parallelization not available for systems running MacOS without openMP configuration.
Examples
data(enterotype) filtered_obj <- phyloseq::subset_samples(enterotype, !is.na(enterotype@sam_data$ClinicalStatus)) filtered_obj <- taxa_filter(filtered_obj, frequency = 0.65) variable_correlation(filtered_obj, variables = "Age", treatment = "ClinicalStatus", subset = c("healthy", "elderly", "obese"), classification = "Genus", method = "spearman")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.