knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE ) suppressPackageStartupMessages(library(metagenomeFeatures)) suppressPackageStartupMessages(library(phyloseq))
r Biocpkg("metagenomeFeatures")
and associated annotation packages[^1] can be used to obtain phylogentic trees and representative sequences for 16S rRNA marker gene sequence experimental datasets when the reads are clustered using closed reference clustering and a database with an MgDb
annotation package.
In this vignette we demostrate how to use metagenomeFeatures
and Greengenes 16S rRNA database version 13.8 85\% OTUs to obtain a phylogenetic tree and representative sequences for Rousk et al. [-@rousk2010soil],
a soil microbiome study.
We then use the tree and sequence data obtained from the Greengenes MgDb
to generate a phyloseq-class
object for community analysis[^2].
The gg 13.8 85% OTU (gg85
) is provided as part of the r Biocpkg("metagenomeFeatures")
package.
gg85
is a MgDb-class
object with the taxonomic heirarchy, sequence data, and phylogeny for the Greengenes database clustered at the 0.85 similarity threshold.
After loading the r Biocpkg("metagenomeFeatures")
package gg85
is loaded using get_gg13.8_85MgDb()
.[^3]
library(metagenomeFeatures) gg85 <- get_gg13.8_85MgDb()
QIITA is a public repository for sharing OTU tables and raw sequence data from 16S rRNA marker gene studies, however some of the datasets do not include representative sequences or phylogenetic trees. For this vignette we are using data from the previously mentioned soil microbiome study, https://qiita.ucsd.edu/study/description/94 [@rousk2010soil]. A BIOM and qiime mapping file for the study can be obtained from QIITA. A vector of Greengenes ids for the study cluster centers is included in this package for use in this vignette.
data(qiita_study_94_gg_ids)
A mgFeatures
object is generated from gg85
(or any other MgDb-class
object) using the annotateFeatures()
function along with a set of database ids or keys.
The resulting mgFeatures
class object has the taxonomic heirarchy, phylogeny, and sequence data for the study OTUs.[^3]
soil_mgF <- annotateFeatures(gg85, qiita_study_94_gg_ids) ## Taxonomic heirarchy soil_mgF # Sequence data mgF_seq(soil_mgF) # Tree data mgF_tree(soil_mgF)
mgFeatures-class
object with r Biocpkg("phyloseq")
Finally we use the phyogenetic tree from the mgFeatures-class
object along with the OTU table for a beta diversity analysis.
After loading the r Biocpkg("phyloseq")
package we will load the abundance data from the QITTA biom file, using the r Biocpkg("phyloseq")
command import_biom()
.
Next we will define the sample_data
slot using a modified version of the mapping file 2301_mapping_file.txt
obtained from QITTA, which only includes sample names and experimental variables pH, carbon/nitrogen ratio, total nitrogen, and total organic carbon.
Future development of a common data structure for working with microbiome data that extends the SummarizeExperiment-class
is underway.
For this new class the mgFeatures-class
will define the rowData
slot.
data_dir <- system.file("extdata", package = "metagenomeFeatures") ## Load Biom biom_file <- file.path(data_dir, "229_otu_table.biom") soil_ps <- phyloseq::import_biom(BIOMfilename = biom_file) ## Define sample data sample_file <- file.path(data_dir, "229_sample_data.tsv") sample_dat <- read.delim(sample_file) ## Rownames matching sample_names(), required for phyloseq sample_data slot rownames(sample_dat) <- sample_dat$SampleID sample_data(soil_ps) <- sample_dat ## Resulting phyloseq object soil_ps
Rousk et al. [-@rousk2010soil] clustered the sequences at 97\% similarity therefore
gg85
only contains a subset of the OTUs in the dataset.
To define the tree and sequence slot we need to subset soil_ps
to only include OTUs in gg85
.
Samples with no OTUs in gg85
were also removed for our beta diversity analysis.
# Removing OTUs not in `gg85` soil_tree <- mgF_tree(soil_mgF) soil_ps_gg85 <- prune_taxa(taxa = soil_tree$tip.label, x = soil_ps) # Removing samples with no OTUs in `gg85` samples_to_keep <- sample_sums(soil_ps_gg85) != 0 soil_ps_gg85 <- prune_samples(samples = samples_to_keep, x = soil_ps_gg85)
Now soil_mgF
can be used to to define our soil_ps
object refseq
and tree
slots.
## Defining tree slot phy_tree(physeq = soil_ps_gg85) <- soil_tree ## Defining seq slot soil_ps_gg85@refseq <- mgF_seq(soil_mgF)
Now that the tree
slot is defined we can perform microbial community analysis requiring a phylogenetic tree such as phylogenetic beta diversity.
For the subset of features used here, pH is a primary driver for differences in beta diversity between the samples (Figure \@ref(fig:betaFig)).
Nearly 30\% of the total variation in the data is explained by the first axis and sample pH is correlated with the first axis.
This result is consistent with Rousk et al. [-@rousk2010soil] even though we only used a small subset of the study OTUs. The outlier sample (94.PH18) is an artifact of using a small subset of the original dataset, only one gg85
OTU is observed in the sample and that OTU was not observed in the other samples.
soil_ord <- ordinate(physeq = soil_ps_gg85, distance = "wunifrac", method = "PCoA") plot_ordination(soil_ps_gg85, soil_ord, color = "ph", type="sample", label = "SampleID")
sessionInfo()
[^1]: Other databases are avilable as Bioconductor AnnotationData including; Greengenes Release 13.5: r Biocpkg("greengenesMgDb13.5")
, Greengenes Release 13.8 99% OTUs,r Biocpkg("greengenes13.8_99MgDb")
Ribosomal Database Project Release 11.5, r Biocpkg("ribosomaldatabaseproject11.5MgDb")
, and Silva 128.1: r Biocpkg("silva128.1MgDb")
.
[^2]: The r Biocpkg("phyloseq")
package defines the phyloseq-calss
for working with 16S rRNA experimental data.
[^3]: See the "metagenomeFeatures classes and methods." vignette, Vignettes("metagenomeFeatures classes and methods.")
for more information on methods for working with MgDb-class
and mgFeatures-class
objects.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.