knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE )
HMP2Data
can be installed using BiocManager
as follows.
# Check if BiocManager is installed if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") # Install HMP2Data package using BiocManager BiocManager::install("HMP2Data")
Or the development version of the package can be installed from GitHub as shown below.
BiocManager::install("jstansfield0/HMP2Data")
The following packages will be used in this vignette to provide demonstrative
examples of what a user might do with r BiocStyle::Biocpkg("HMP2Data")
.
library(HMP2Data) library(phyloseq) library(SummarizedExperiment) library(MultiAssayExperiment) library(dplyr) library(ggplot2) library(UpSetR)
Once installed, HMP2Data
provides functions to access the HMP2 data.
The MOMS-PI data can be loaded as follows.
Load 16S data as a matrix, rows are Greengene IDs, columns are sample names:
data("momspi16S_mtx") momspi16S_mtx[1:5, 1:3]
Load the Greengenes taxonomy table as a matrix, rows are Greengene IDs, columns are taxonomic ranks:
data("momspi16S_tax") colnames(momspi16S_tax) momspi16S_tax[1:5, 1:3]
# Check if Greengene IDs match between the 16S and taxonomy data # all.equal(rownames(momspi16S_mtx), rownames(momspi16S_tax)) # Should be TRUE
Load the 16S sample annotation data as a matrix, rows are samples, columns are annotations:
data("momspi16S_samp") colnames(momspi16S_samp) momspi16S_samp[1:5, 1:3] # Check if sample names match between the 16S and sample data # all.equal(colnames(momspi16S_mtx), rownames(momspi16S_samp)) # Should be TRUE
The momspi16S
function assembles those matrices into a phyloseq
object.
momspi16S_phyloseq <- momspi16S() momspi16S_phyloseq
Here we have a phyloseq
object containing the 16S rRNA seq data for r nrow(otu_table(momspi16S_phyloseq))
taxa and r ncol(otu_table(momspi16S_phyloseq))
samples.
The MOMS-PI cytokine data can be loaded as a matrix, rownames are cytokine names, colnames are sample names:
data("momspiCyto_mtx") momspiCyto_mtx[1:5, 1:5]
The cytokine data has r nrow(momspiCyto_mtx)
rows and r ncol(momspiCyto_mtx)
columns.
Load the cytokine sample annotation data as a matrix, rows are samples, columns are annotations:
data("momspiCyto_samp") colnames(momspiCyto_samp) momspiCyto_samp[1:5, 1:5]
# Check if sample names match between the 16S and sample data # all.equal(colnames(momspiCyto_mtx), rownames(momspiCyto_samp)) # Should be TRUE
The function momspiCytokines
will make a SummarizedExperiment
containing cytokine data
momspiCyto <- momspiCytokines() momspiCyto
The cytokine data contains data for r nrow(momspiCyto)
cytokines over r ncol(momspiCyto)
samples.
We can construct a MultiAssayExperiment
that will contain 16S and cytokine data for the common samples. Use the momspiMultiAssay
function.
momspiMA <- momspiMultiAssay() momspiMA
The 16S rRNA data can be extracted from the MultiAssayExperiment object as follows.
rRNA <- momspiMA[[1L]]
Or the cytokines data like so.
cyto <- momspiMA[[2L]]
The sample data can be accessed with the colData
command.
colData(momspiMA) %>% head()
To extract only the samples with both 16S and cytokine data we can use the intersectColumns
function.
completeMA <- intersectColumns(momspiMA) completeMA
Load 16S data as a matrix, rows are SILVA IDs, columns are sample names:
data("IBD16S_mtx") IBD16S_mtx[1:5, 1:5]
Load the SILVA taxonomy table as a matrix, rows are SILVA IDs, columns are taxonomic ranks:
data("IBD16S_tax") colnames(IBD16S_tax) IBD16S_tax[1:5, 1:5]
Load the 16S sample annotation data as a matrix, rows are samples, columns are annotations:
data("IBD16S_samp") colnames(IBD16S_samp) %>% head() IBD16S_samp[1:5, 1:5]
The IBD phyloseq
object can be loaded as follows.
IBD <- IBD16S() IBD
Load 16S data as a matrix, rows are Greengene IDs, columns are sample names:
data("T2D16S_mtx") T2D16S_mtx[1:5, 1:5]
Load the Greengenes taxonomy table as a matrix, rows are Greengene IDs, columns are taxonomic ranks:
data("T2D16S_tax") colnames(T2D16S_tax) T2D16S_tax[1:5, 1:5]
Load the 16S sample annotation data as a matrix, rows are samples, columns are annotations:
data("T2D16S_samp") colnames(T2D16S_samp) T2D16S_samp[1:5, 1:5]
The T2D phyloseq
object can be loaded like so.
T2D <- T2D16S() T2D
The sample-level annotation data contianed in HMP2Data
can be summarized using the table_two
function. First, we need to make a list of the phyloseq
and SummarizedExperiment
objects which can then be entered into the table_two()
table generating function.
list("MOMS-PI 16S" = momspi16S_phyloseq, "MOMS-PI Cytokines" = momspiCyto, "IBD 16S" = IBD, "T2D 16S" = T2D) %>% table_two()
We can also create a table of the breakdown of samples by visit number quantile.
list("MOMS-PI 16S" = momspi16S_phyloseq, "MOMS-PI Cytokines" = momspiCyto, "IBD 16S" = IBD, "T2D 16S" = T2D) %>% visit_table()
The patient-level characteristics can be summarized using the patient_table()
function.
list("MOMS-PI 16S" = momspi16S_phyloseq, "MOMS-PI Cytokines" = momspiCyto, "IBD 16S" = IBD, "T2D 16S" = T2D) %>% patient_table()
Here we plot histograms of the number of samples at each visit for the data MOMS-PI 16S data. Note that there are r sum(momspi16S_samp$visit_number > 20)
samples at visit numbers over 20.
# set up ggplots p1 <- ggplot(momspi16S_samp, aes(x = visit_number)) + geom_histogram(bins = 20, color = "#00BFC4", fill = "lightblue") + xlim(c(0,20)) + xlab("Visit number") + ylab("Count") # theme(panel.background = element_blank(), panel.grid = element_blank()) p1
We can plot the histogram of the number of samples at each visit for all studies. Note that the X-axis is capped at count of 40 for better visualization.
# make data.frame for plotting plot_visits <- data.frame(study = c(rep("MOMS-PI Cytokines", nrow(momspiCyto_samp)), rep("IBD 16S", nrow(IBD16S_samp)), rep("T2D 16S", nrow(T2D16S_samp))), visits = c(momspiCyto_samp$visit_number, IBD16S_samp$visit_number, T2D16S_samp$visit_number)) p2 <- ggplot(plot_visits, aes(x = visits, fill = study)) + geom_histogram(position = "dodge", alpha = 0.7, bins = 30, color = "#00BFC4") + xlim(c(0, 40)) + theme(legend.position = c(0.8, 0.8)) + xlab("Visit number") + ylab("Count") p2
Note that there are r sum(plot_visits$visits > 40)
samples at visit numbers over 40.
Here we plot the body sites which samples were taken from for each patient in the MOMS-PI 16S data.
# set up data.frame for UpSetR tmp_data <- split(momspi16S_samp, momspi16S_samp$subject_id) momspi_upset <- lapply(tmp_data, function(x) { table(x$sample_body_site) }) momspi_upset <- bind_rows(momspi_upset) tmp <- as.matrix(momspi_upset[, -1]) tmp <- (tmp > 0) *1 tmp[is.na(tmp)] <- 0 momspi_upset <- data.frame(patient = names(tmp_data), tmp) # plot upset(momspi_upset, order.by = 'freq', matrix.color = "blue", text.scale = 2)
Here we plot the body sites which samples were taken from for each patient in the MOMS-PI cytokines data.
# set up data.frame for UpSetR tmp_data <- split(momspiCyto_samp, momspiCyto_samp$subject_id) momspiCyto_upset <- lapply(tmp_data, function(x) { table(x$sample_body_site) }) momspiCyto_upset <- bind_rows(momspiCyto_upset) tmp <- as.matrix(momspiCyto_upset[, -1]) tmp <- (tmp > 0) *1 tmp[is.na(tmp)] <- 0 momspiCyto_upset <- data.frame(patient = names(tmp_data), tmp) # plot upset(momspiCyto_upset, order.by = 'freq', matrix.color = "blue", text.scale = 2)
The IBD patients only had samples taken from the feces.
# set up data.frame for UpSetR tmp_data <- split(IBD16S_samp, IBD16S_samp$subject_id) IBD_upset <- lapply(tmp_data, function(x) { table(x$biopsy_location) }) IBD_upset <- bind_rows(IBD_upset) tmp <- as.matrix(IBD_upset[, -1]) tmp <- (tmp > 0) *1 tmp[is.na(tmp)] <- 0 IBD_upset <- data.frame(patient = names(tmp_data), tmp) # plot upset(IBD_upset, order.by = 'freq', matrix.color = "blue", text.scale = 2)
Here we plot the body sites which samples were taken from for each patient in the T2D 16S rRNA data.
# set up data.frame for UpSetR tmp_data <- split(T2D16S_samp,T2D16S_samp$subject_id) T2D_upset <- lapply(tmp_data, function(x) { table(x$sample_body_site) }) T2D_upset <- bind_rows(T2D_upset) tmp <- as.matrix(T2D_upset) tmp <- (tmp > 0) *1 tmp[is.na(tmp)] <- 0 T2D_upset <- data.frame(patient = names(tmp_data), tmp) # plot upset(T2D_upset, order.by = 'freq', matrix.color = "blue", text.scale = 2)
To our knowledge, R and Bioconductor provide the most and best methods for the
analysis of microbiome data. However, we realize they are not the only analysis
environments and wish to provide methods to export the data from
r BiocStyle::Biocpkg("HMP16SData")
to CSV format.
For SummarizedExperiment
objects, we will need to separate the data and metadata first before exportation. First, make a data.frame
for participant data.
momspi_cytokines_participants <- colData(momspiCyto) momspi_cytokines_participants[1:5, ]
Then we need to pull out the data matrix.
momspi_cytokines <- assay(momspiCyto) momspi_cytokines[1:5, 1:5]
For phyloseq
objects the data, metadata, and taxonomy can be separated as follows. First, pull out metadata.
momspi_16S_participants <- sample_data(momspi16S_phyloseq)
Next, we can save the counts data as a matrix.
momspi16s_data <- as.matrix(otu_table(momspi16S_phyloseq))
Finally, the taxonomy table can be extracted.
momspi16s_taxa <- tax_table(momspi16S_phyloseq) %>% as.data.frame()
The data can be saved as .csv
files as follows.
library(readr) write_csv(data.frame(file_id = rownames(momspi_cytokines_participants), momspi_cytokines_participants), "momspi_cytokines_participants.csv") write_csv(data.frame(momspi_cytokines), "momspi_cytokines.csv")
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.