r library("knitr")
r opts_chunk$set(cache=FALSE, fig.width=9, message=FALSE, warning=FALSE)
If you find biomformat and/or its tutorials useful, please acknowledge and cite biomformat in your publications:
Paul J. McMurdie and Joseph N Paulson (2015). biomformat: An interface package for the BIOM file format. R/Bioconductor package version 1.0.0.
phyloseq has many more utilities for interacting with this kind of data than are provided in this I/O-oriented package.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages. Further details on R markdown here.
The BIOM file format (canonically pronounced "biome") is designed to be a general-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium candidate project. Please see the biom-format home page for more details.
This demo is designed to provide an overview of the biom package to get you started using it quickly. The biom package itself is intended to be a utility package that will be depended-upon by other packages in the future. It provides I/O functionality, and functions to make it easier to with data from biom-format files. It does not (and probably should not) provide statistical analysis functions. However, it does provide tools to access data from BIOM format files in ways that are extremely common in R (such as "data.frame"
, "matrix"
, and "Matrix"
classes).
Package versions at the time (r date()
) of this build:
library("biomformat"); packageVersion("biomformat")
Here is an example importing BIOM formats of different types into R using the read_biom
function. The resulting data objects in R are given names beginning with x
.
min_dense_file = system.file("extdata", "min_dense_otu_table.biom", package = "biomformat") min_sparse_file = system.file("extdata", "min_sparse_otu_table.biom", package = "biomformat") rich_dense_file = system.file("extdata", "rich_dense_otu_table.biom", package = "biomformat") rich_sparse_file = system.file("extdata", "rich_sparse_otu_table.biom", package = "biomformat") min_dense_file = system.file("extdata", "min_dense_otu_table.biom", package = "biomformat") rich_dense_char = system.file("extdata", "rich_dense_char.biom", package = "biomformat") rich_sparse_char = system.file("extdata", "rich_sparse_char.biom", package = "biomformat") x1 = read_biom(min_dense_file) x2 = read_biom(min_sparse_file) x3 = read_biom(rich_dense_file) x4 = read_biom(rich_sparse_file) x5 = read_biom(rich_dense_char) x6 = read_biom(rich_sparse_char) x1
It would be hard to interpret and wasteful of RAM to stream all the data from a BIOM format file to the standard out when printed with print
or show
methods. Instead, a brief summary of the contents BIOM data is shown.
To get access to the data in a familiar form appropriate for many standard R functions, we will need to use accessor functions, also included in the biom package.
The core "observation" data is stored in either sparse or dense matrices in the BIOM format file, and sparse matrix support is carried through on the R side via the Matrix package. The variables x1
and x2
, assigned above from BIOM files, have similar (but not identical) data. They are small enough to observe directly as tables in standard output in an R session:
biom_data(x1) biom_data(x2)
As you can see above, x1
and x2
are represented in R by slightly different matrix classes, assigned automatically based on the data. However, most operations in R will understand this automatically and you should not have to worry about the precise matrix class. However, if the R function you are attempting to use is having a problem with these fancier classes, you can easily coerce the data to the simple, standard "matrix"
class using the as
function:
as(biom_data(x2), "matrix")
Observation metadata is metadata associated with the individual units being counted/recorded in a sample, as opposed to measurements of properties of the samples themselves. For microbiome census data, for instance, observation metadata is often a taxonomic classification and anything else that might be known about a particular OTU/species. For other types of data, it might be metadata known about a particular genome, gene family, pathway, etc. In this case, the observations are counts of OTUs and the metadata is taxonomic classification, if present. The absence of observation metadata is also supported, as we see for the minimal cases of x1
and x2
, where we see r NULL
instead.
observation_metadata(x1) observation_metadata(x2) observation_metadata(x3) observation_metadata(x4)[1:2, 1:3] class(observation_metadata(x4))
Similarly, we can access metadata -- if available -- that describe properties of the samples. We access this sample metadata using the sample_metadata
function.
sample_metadata(x1) sample_metadata(x2) sample_metadata(x3) sample_metadata(x4)[1:2, 1:3] class(sample_metadata(x4))
The data really is accessible to other R functions.
plot(biom_data(x4)) boxplot(as(biom_data(x4), "vector")) heatmap(as(biom_data(x4), "matrix"))
The biom objects in R can be written to a file/connection using the write_biom
function. If you modified the biom object, this may still work as well, but no guarantees about this as we are still working on internal checks. The following example writes x4
to a temporary file, then reads it back using read_biom
and stores it as variable y
. The exact comparison of these two objects using the identical
function shows that they are exactly the same in R.
outfile = tempfile() write_biom(x4, outfile) y = read_biom(outfile) identical(x4, y)
Furthermore, it is possible to invoke standard operating system commands through the R system
function -- in this case to invoke the diff
command available on Unix-like systems or the FC
command on Windows -- in order to compare the original and temporary files directly. Note that this is shown here for convenience, but not automatically run with the rest of the script because of the OS-dependence. During development, though, this same command is tested privately and no differences are reported between the files.
# On Unix OSes system(paste0("diff ", rich_sparse_file, outfile)) # On windows system(paste0("FC ", rich_sparse_file, outfile))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.