BiocProject is a (pending) Bioconductor package that provides a way to use Portable Encapsulated Projects (PEPs) within the Bioconductor framework.
This vignette assumes you are already familiar with PEPs. If not, see pep.databio.org to learn more about PEP, and the pepr documentation to learn more about reading PEPs in R.
BiocProject uses objects of the Project class (from pepr) to handle your project metadata, and allows you to provide a data loading/processing function so that you can load both project metadata and data for an entire project with a single line of R code.
The output of the BiocProject function is the object that your function returns, enriched with the PEP in its metadata slot. This way of storing metadata is uniform across all Bioconductor objects whose classes extend Annotated (see ?Annotated-class for details).
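To make the metadata slot concrete, here is a small sketch using S4Vectors::SimpleList, one of the classes that extend Annotated. It assumes the Bioconductor package S4Vectors is installed. The list stored under the name PEP is a hypothetical placeholder chosen for illustration, not a real pepr::Project and not necessarily the key BiocProject uses.

```r
# Sketch: assumes the Bioconductor package S4Vectors is installed.
library(S4Vectors)

# SimpleList extends Annotated, so every instance has a metadata slot
x <- SimpleList(a = 1:3, b = letters[1:2])
is(x, "Annotated")

# Store a placeholder entry; the "PEP" name and its contents are
# hypothetical stand-ins for the project metadata BiocProject attaches
metadata(x) <- list(PEP = list(name = "example_project"))
metadata(x)$PEP$name
```

Any object whose class extends Annotated accepts metadata in the same way, which is what allows BiocProject to attach the PEP regardless of the class your function returns.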
You must first install pepr:
devtools::install_github(repo='pepkit/pepr')
Then, install BiocProject:
devtools::install_github(repo='pepkit/BiocProject')
To use the BiocProject package, you first need a PEP. For this vignette, we have included a basic example PEP within the package, but if you like, you can create your own, or download an example PEP.
The central component of a PEP is the project configuration file. Let's load up BiocProject and grab the path to our example configuration file:
library(BiocProject)
configFile = system.file("extdata", "example_peps-master", "example_BiocProject",
                         "project_config.yaml", package = "BiocProject")
configFile
# Run some stuff we need for the vignette
processFunction = system.file("extdata", "example_peps-master", "example_BiocProject",
                              "readBedFiles.R", package = "BiocProject")
source(processFunction)
bp = BiocProject(file=configFile)
This path points to a YAML project configuration file that looks like this:
library(pepr)
.printNestedList(yaml::read_yaml(configFile))
This configuration file points to the second major part of a PEP: the sample annotation CSV file (sample_table.csv).
Here are the contents of that file:
library(knitr)
sampleAnnotation = system.file("extdata", "example_peps-master", "example_BiocProject",
                               "sample_table.csv", package = "BiocProject")
sampleAnnotationDF = read.table(sampleAnnotation, sep=",", header=TRUE)
knitr::kable(sampleAnnotationDF, format = "html")
In this example, our PEP has two samples, each with two attributes: sample_name and file_path, which points to the location of the data.
The configuration file also points to a third file (readBedFiles.R). This file holds a single R function called readBedFiles, which has these contents:
get(config(bp)$bioconductor$readFunName)
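And that's all there is to it! This PEP really consists of 3 components:

1. the project configuration file (project_config.yaml),
2. the sample annotation file (sample_table.csv),
3. the file holding the data processing function (readBedFiles.R).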
With that, we're ready to see how BiocProject works.
The BiocProject function

With a PEP in hand, it takes only a single line of code to do all the magic with BiocProject:
bp = BiocProject(file=configFile)
This loads the project metadata from the PEP, then loads and calls the actual data processing function, and returns the R object that the data processing function produces, enriched with the PEP metadata. Consequently, the object contains all your project metadata and data! Let's inspect it:
bp
Since the data processing function returned a GenomicRanges::GRangesList object, the final result of the BiocProject function is an object of the same class.
The created object supports all the pepr::Project methods, which you can find in the pepr reference documentation:
sampleTable(bp)
config(bp)
Finally, there are a few methods specific to BiocProject objects:
getProject(bp)
In the basic case, the function name (and the path to its source file, if necessary) is specified in the YAML config file itself, like this:
bioconductor:
  readFunName: function_name
or
bioconductor:
  readFunName: function_name
  readFunPath: /path/to/the/file.R
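To illustrate what the two keys mean, the sketch below resolves a function from a named list that mirrors the bioconductor section of the config: if readFunPath is given, that file is evaluated first, and then the function is looked up by readFunName. The helper resolveReadFun is hypothetical and is not BiocProject's implementation; it only shows the idea in base R.

```r
# Hypothetical helper, NOT part of BiocProject. `bioc` mirrors the parsed
# `bioconductor` section of the YAML config as a named list.
resolveReadFun <- function(bioc) {
  env <- new.env(parent = globalenv())
  # readFunPath is optional: when present, evaluate that file into `env`
  if (!is.null(bioc[["readFunPath"]])) {
    sys.source(bioc[["readFunPath"]], envir = env)
  }
  # Look the function up by name: first in the sourced file's definitions,
  # then in the global environment and attached packages
  get(bioc[["readFunName"]], envir = env, mode = "function")
}

# Usage: write a throwaway R file that defines the function
tmp <- tempfile(fileext = ".R")
writeLines("function_name <- function(x) toupper(x)", tmp)

bioc <- list(readFunName = "function_name", readFunPath = tmp)
fun <- resolveReadFun(bioc)
fun("pep")
```

Without readFunPath, the name must already refer to a function that is visible, for example one you have sourced yourself or one from an attached package.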
The function specified can be a data processing function of any complexity, but it has to follow the 3 rules listed below.
1. It must take a pepr::Project object as input (and should use that input to load all the relevant data into R).
2. It must return an object of a class that extends the class Annotated.

Listed below are some of the classes that extend the class Annotated:
showClass("Annotated")
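A data processing function that follows the rules above might be shaped like the template below. It is a minimal sketch, not code from BiocProject: the name readTextFiles is hypothetical, and it assumes (as in the example PEP) that every sample has a file_path attribute pointing to a plain-text file, and that pepr and S4Vectors are installed.

```r
# Hypothetical template: readTextFiles is a made-up name for illustration.
# Assumes each sample has a `file_path` attribute pointing to a text file.
readTextFiles <- function(project) {
  # Input: a pepr::Project; use it to find the data to load
  samples <- pepr::sampleTable(project)
  paths <- as.character(unlist(samples$file_path))
  # Load the relevant data into R, one list element per sample
  data <- lapply(paths, readLines)
  names(data) <- as.character(unlist(samples$sample_name))
  # Output: SimpleList extends Annotated, so it has a metadata slot
  # in which the PEP can be stored
  S4Vectors::SimpleList(data)
}
```

Placed in the file given by readFunPath and named in readFunName, a function like this would play the same role as readBedFiles in the example PEP.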
Consider the readBedFiles function as an example of a function that can be used with the BiocProject package:
processFunction = system.file("extdata", "example_peps-master", "example_BiocProject",
                              "readBedFiles.R", package = "BiocProject")
source(processFunction)
readBedFiles
The BiocProject function provides a way to rigorously monitor exceptions related to your data reading function. All the produced warnings and errors are caught, processed and displayed in an organized way:
configFile = system.file("extdata", "example_peps-master", "example_BiocProject_exceptions",
                         "project_config.yaml", package = "BiocProject")
bpExceptions = BiocProject(configFile)
As the warning messages above indicate, no data is returned. Instead, an S4Vectors::List with the PEP in its metadata slot is produced:
bpExceptions
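The general technique of catching warnings and errors so that they can be reported together, rather than stopping at the first one, can be sketched in base R with withCallingHandlers() and tryCatch(). This is an illustration of the technique only, not BiocProject's actual implementation; runCollectingConditions and flaky are hypothetical names.

```r
# Hypothetical helper: run `fun`, collecting warning messages and any error
# message instead of letting them interrupt the caller.
runCollectingConditions <- function(fun, ...) {
  warningMsgs <- character(0)
  errorMsg <- NULL
  result <- tryCatch(
    withCallingHandlers(
      fun(...),
      warning = function(w) {
        # Record the warning, then resume execution where it was signaled
        warningMsgs <<- c(warningMsgs, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      # Record the error; no data is returned in this case
      errorMsg <<- conditionMessage(e)
      NULL
    }
  )
  list(result = result, warnings = warningMsgs, error = errorMsg)
}

# Usage: a function that warns and then fails
flaky <- function() {
  warning("file missing")
  stop("nothing to return")
}
out <- runCollectingConditions(flaky)
out$warnings
out$error
is.null(out$result)
```

Calling handlers let execution continue after each warning, while tryCatch() ends evaluation on the first error, which mirrors the situation above where no data could be returned.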
See the "More arguments than just a PEP in your function?" vignette if you want to use a data processing function that takes more arguments than just a PEP.
See the "Working with remote data" vignette to learn how to download the data from the Internet, process it and store it conveniently with related metadata in any object from the Bioconductor project.
See the "Working with large datasets - simpleCache" vignette to learn how the simpleCache R package can be used to avoid repeated, lengthy recalculation of results when working with large datasets.