knitr::opts_chunk$set(collapse=FALSE, message=FALSE)
This vignette demonstrates how to integrate BiocProject with the tximeta Bioconductor package for a really slick start-to-finish analysis of RNA-seq data. We assume you're familiar with BiocProject; if not, please start with Getting started with BiocProject
vignette for basic instructions.
Tximeta is a package that imports transcript quantification files from the salmon transcript quantifier. When importing, tximeta automatically annotates the data with the transcriptome used. How it works is that salmon
records a unique identifier of the transcriptome it uses during quantification; then, tximeta reads this identifier and looks up metadata about those sequences using a local database of known transcriptome identifiers. For more details, refer to the tximeta GitHub repository or publication in PLoS Computational Biology.
The tximeta::tximeta
function takes as input a data.frame
(coldata
) object that, for Salmon results, points to a quantification results directory for each sample. The tximeta
function reads the *.sa
files and returns a single SummarizedExperiment
object with the Salmon-generated metadata in the object metadata
slot.
Since SummarizedExperiment
inherits from the Bioconductor Annotated
class, it fits perfectly into BiocProject
output object class requirements.
suppressPackageStartupMessages(library(BiocProject)) suppressPackageStartupMessages(library(SummarizedExperiment)) is(SummarizedExperiment(), "Annotated")
If we add BiocProject in to the tximeta workflow, then sample metadata from the PEP project specification can be easily plugged in! For example, if a researcher used a PEP to run Salmon to quantify reads across multiple samples with PEP-compatible workflow management engine/job scatterer like Snakemake, CWL, or looper, the same PEP would be ready to use with tximeta as long as the samples had files
attribute defined. This could be done either via a files
column in the sample table, or by using one of the sample modifiers provided by the PEP framework. The advantages of calling tximport
within BiocProject
include:
tximeta
function automatically. It will be accessible from within your R session using the pepr API, or with @PEP
in the metadata
slot of the SummarizedExperiment
object, just as any other metadata attached to the result by tximeta
function.Let's show you how this work with a simple demo.
First, let's download some RNA-seq counts from salmon, described in PEP format:
if (basename(getwd()) != "long_vignettes") setwd("long_vignettes") pth = BiocFileCache::bfcrpath( BiocFileCache::BiocFileCache(getwd()), "http://big.databio.org/example_data/tximeta_pep.tar.gz" ) utils::untar(tarfile=pth) abs_pep_path = file.path(getwd(), "tximeta_pep") abs_cfg_path = file.path(abs_pep_path, "project_config.yaml")
Let's take a look at what we have here...
The Biocproject
+ tximeta
workflow requires a PEP. The example we just downloaded looks like this:
library(pepr) .printNestedList(yaml::read_yaml(abs_cfg_path))
As you can see, this PEP configuration file uses a $TXIMPORTDATA
environment variable to specify a file path. This is just an optional way to make this PEP work in any computing environment without being changed, so you can share your sample metadata more easily. For this vignette, we need to set the variable to the output directory where our downloaded results are stored:
Sys.setenv("TXIMPORTDATA"=file.path(abs_pep_path, "/tximportData"))
# Run some stuff we need for the vignette p=Project(abs_cfg_path)
Now, look at the sample_table
key in the configuration file. It points to the second major part of a PEP: the
sample table CSV file (r { basename(config(p)$sample_table) }
). Check out the contents of that file:
library(knitr) coldataDF = read.table(p@config$sample_table, sep=",", header=TRUE) knitr::kable(coldataDF, format = "html")
This sample table lacks the files
column required by tximeta -- but this file is sufficient, since BiocProject, or more specifically pepr, will take care of constructing the portable files
sample attribute automatically via sample_modifiers.derive
, where the config file above specifies the files
attribute and its path.
Now we can load the file with BiocProject... but first, a short detour
Before we jump into using BiocProject
, let's take a minute to demonstrate how using the PEP helps us out here. Let's read in our PEP using the the generic Project
function from pepr
:
p=Project(abs_cfg_path)
We now have our PEP project read in, and we can see what is found in the sample table:
sampleTable(p)
See how our sample table has now been automatically updated with the files
attribute? That is the magic of the PEP sample modifiers. It's that simple. Now, let's move on to demonstrate what BiocProject
adds.
If you look again at our configuration file above, you'll notice the biconductor
section in the configuration file, which defines a function name and R script. These specify the BiocProject data processing function, which in this case, is simply a tximeta
call that uses the PEP-managed processed sample table its input. Here's what that function looks like:
source(file.path(abs_pep_path, "readTximeta.R")) get(config(p)$bioconductor$readFunName)
We have everything we need: a salmon output file, a PEP that specifies a sample table and provides the files
column, and a function that uses tximeta
to create the final SummarizedExperiment
object. Now, we can call the BiocProject
function:
require(tximeta) bp = BiocProject(abs_cfg_path)
The output of BiocProject
function, the bp
object in our case, is magical. In one object, it supports the functionality of SummarizedExperiment
, tximeta
, and pepr
. Observe:
First, it is a RangedSummarizedExperiment
, so it supports all methods defined in SummarizedExperiment
:
suppressPackageStartupMessages(library(SummarizedExperiment)) colData(bp) assayNames(bp) rowRanges(bp)
Naturally, we can use tximeta methods:
retrieveDb(bp)
But wait, there's more! The PEP
metadata information has been attached to the metadata as well. Let's extract the Project
object from the result with getProject
method:
getProject(bp)
You can use the pepr
API for any R-based PEP processing tools:
sampleTable(bp) config(bp)
If you format your project metadata according to the PEP specification, it will be ready to use with tximeta and the resulting object will include project-wide metadata and expose pepr API for any PEP-compatible R packages for downstream analysis.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.