knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

The MicrobiomeBenchmarkData package provides access to a collection of datasets with biological ground truth for benchmarking differential abundance methods. The datasets are deposited on Zenodo: https://doi.org/10.5281/zenodo.6911026

Installation

## Install BiocManager if not installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Release version (not yet in Bioc, so it doesn't work yet)
BiocManager::install("MicrobiomeBenchmarkData")

## Development version
BiocManager::install("waldronlab/MicrobiomeBenchmarkData") 
library(MicrobiomeBenchmarkData)
library(purrr)

Sample metadata

All sample metadata is merged into a single data frame and provided as a data object:

data('sampleMetadata', package = 'MicrobiomeBenchmarkData')
## Get columns present in all samples
sample_metadata <- sampleMetadata |> 
    discard(~any(is.na(.x))) |> 
    head()
knitr::kable(sample_metadata)
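
The column filter above can be illustrated without the package. The following is a minimal sketch in base R on a made-up data frame (toy_metadata and its columns are hypothetical and are not taken from sampleMetadata). It keeps only the columns that have no missing values, which is the effect of the discard() call:

```r
## Toy stand-in for sampleMetadata; all names and values are made up
toy_metadata <- data.frame(
    sample_id = c("s1", "s2", "s3"),
    body_site = c("oral", "oral", "gut"),
    age = c(34, NA, 51),
    stringsAsFactors = FALSE
)

## Keep only the columns without any missing value
complete_cols <- Filter(function(col) !anyNA(col), toy_metadata)
colnames(complete_cols)
```

Here the age column is dropped because one sample lacks a value, while the two fully populated columns are kept.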

Accessing datasets

The number of datasets currently available through the MicrobiomeBenchmarkData package is given by nrow(MicrobiomeBenchmarkData::getBenchmarkData()). These datasets are accessed through the getBenchmarkData function.

Print available datasets

If no arguments are provided, the list of available datasets is printed on screen and a data.frame is returned with the description of the datasets:

dats <- getBenchmarkData()
dats

Access a single dataset

To import a dataset, call the getBenchmarkData function with the name of the dataset as the first argument (x) and the dryrun argument set to FALSE. The output is a list containing the imported dataset as a TreeSummarizedExperiment object.

tse <- getBenchmarkData('HMP_2012_16S_gingival_V35_subset', dryrun = FALSE)[[1]]
tse

Access a few datasets

Several datasets can be imported simultaneously by giving the names of the different datasets in a character vector:

list_tse <- getBenchmarkData(dats$Dataset[2:4], dryrun = FALSE)
str(list_tse, max.level = 1)
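
Instead of selecting by position, the names can also be chosen by pattern. The following is a minimal sketch on a toy data frame (toy_dats and the dataset names in it are hypothetical placeholders, not real entries); it assumes only that the names live in a Dataset column, as in the call above:

```r
## Toy stand-in for the data frame returned by getBenchmarkData();
## the dataset names below are hypothetical placeholders
toy_dats <- data.frame(
    Dataset = c("study_A_16S", "study_A_WMS", "study_B_16S", "study_C_WMS"),
    stringsAsFactors = FALSE
)

## Select the dataset names that end in "_16S"
selected <- grep("_16S$", toy_dats$Dataset, value = TRUE)
selected
```

With the real table, a character vector built this way would be passed as the first argument of getBenchmarkData together with dryrun = FALSE.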

Access all of the datasets

If all of the datasets must be imported, this can be done by providing the dryrun = FALSE argument alone.

mbd <- getBenchmarkData(dryrun = FALSE)
str(mbd, max.level = 1)

Annotations for each taxon are included in rowData

The biological annotations of each taxon are provided as a column in the rowData slot of the TreeSummarizedExperiment.

## In this case, the column is named taxon_annotation
tse <- mbd$HMP_2012_16S_gingival_V35_subset
rowData(tse)
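
A quick way to inspect such an annotation column is to count the taxa in each category. The following is a minimal sketch on a toy data frame standing in for the rowData (toy_row_data, the taxon names, and the labels are made up; only the column name taxon_annotation comes from the example above):

```r
## Toy stand-in for the rowData of a dataset; taxa and labels are made up
toy_row_data <- data.frame(
    taxon_annotation = c("label_a", "label_b", "label_a", NA),
    row.names = c("taxon1", "taxon2", "taxon3", "taxon4"),
    stringsAsFactors = FALSE
)

## Count taxa per annotation, keeping unannotated taxa visible
annotation_counts <- table(toy_row_data$taxon_annotation, useNA = "ifany")
annotation_counts
```

Setting useNA = "ifany" keeps taxa without an annotation in the tally, so they are not silently dropped from the summary.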

Cache

The datasets are cached so they're only downloaded once. The cache and all of the files contained in it can be removed with the removeCache function.

removeCache()

Session information

sessionInfo()


waldronlab/MicrobiomeBenchmarkData documentation built on Oct. 31, 2024, 3:43 a.m.