```{css, echo=FALSE}
pre code {
  white-space: pre !important;
  overflow-x: scroll !important;
  word-break: keep-all !important;
  word-wrap: initial !important;
}
```

<!--
- Compile from command-line
Rscript -e "rmarkdown::render('systemPipeR_workflow.Rmd', c('BiocStyle::html_document'), clean=F); knitr::knit('systemPipeR_workflow.Rmd', tangle=TRUE)"; Rscript ../md2jekyll.R systemPipeR.knit.md 2; Rscript -e "rmarkdown::render('systemPipeR_workflow.Rmd', c('BiocStyle::pdf_document'))"
-->

```r
BiocStyle::markdown()
options(width=80, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
    tidy.opts=list(width.cutoff=80), tidy=TRUE)

suppressPackageStartupMessages({
    library(systemPipeR)
})
```
Note: the most recent version of this tutorial can be found here.
Note: if you use _`systemPipeR`_ in published research, please cite:
Backman, T.W.H. and Girke, T. (2016). _`systemPipeR`_: NGS Workflow and Report Generation Environment. BMC Bioinformatics, 17: 388. doi: 10.1186/s12859-016-1241-0.

The intended way of running _`systemPipeR`_ workflows is via _`*.Rmd`_ files, which can be executed either line-wise in interactive mode or with a single command from R or the command-line. This way, comprehensive and reproducible analysis reports can be generated in PDF or HTML format in a fully automated manner by making use of the highly functional reporting utilities available for R.

The following shows how to execute a workflow (e.g., _`systemPipeRNAseq.Rmd`_) from the command-line.

```{bash command-line, eval=FALSE}
Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd')"
```

Templates for setting up custom project reports are provided as _`*.Rmd`_ files by the helper package _`systemPipeRdata`_ and in the vignettes subdirectory of _`systemPipeR`_. The corresponding HTML versions of these report templates are available here: [_`systemPipeRNAseq`_](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.html), [_`systemPipeRIBOseq`_](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.html), [_`systemPipeChIPseq`_](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeChIPseq.html) and [_`systemPipeVARseq`_](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.html). To work with _`*.Rmd`_ files efficiently, basic knowledge of [_`knitr`_](http://yihui.name/knitr/) and [_`LaTeX`_](http://www.latex-project.org/) or [_`R Markdown v2`_](http://rmarkdown.rstudio.com/) is required.

## Directory Structure

The working environment of the sample data loaded in the previous step contains the following pre-configured directory structure. Directory names are indicated in <span style="color:green">***green***</span>. Users can change this structure as needed, but need to adjust the code in their workflows accordingly.

* <span style="color:green">_**workflow/**_</span> (*e.g.* *rnaseq/*)
    + This is the root directory of the R session running the workflow.
    + Run script (*\*.Rmd*) and sample annotation (*targets.txt*) files are located here.
    + Note, this directory can have any name (*e.g.* <span style="color:green">_**rnaseq**_</span>, <span style="color:green">_**varseq**_</span>). Changing its name does not require any modifications in the run script(s).
    + **Important subdirectories**:
        + <span style="color:green">_**param/**_</span>
            + Stores non-CWL parameter files such as: *\*.param*, *\*.tmpl* and *\*.run.sh*. These files are only required for backwards compatibility to run old workflows using the previous custom command-line interface.
            + <span style="color:green">_**param/cwl/**_</span>: This subdirectory stores all the CWL parameter files. To organize workflows, each can have its own subdirectory, where all `CWL param` and `input.yml` files need to be in the same subdirectory.
        + <span style="color:green">_**data/**_</span>
            + FASTQ files
            + FASTA file of reference (*e.g.* reference genome)
            + Annotation files
            + etc.
        + <span style="color:green">_**results/**_</span>
            + Analysis results are usually written to this directory, including: alignment, variant and peak files (BAM, VCF, BED); tabular result files; and image/plot files.
            + Note, the user has the option to organize results files for a given sample and analysis step in a separate subdirectory.

The following parameter files are included in each workflow template:

1. *`targets.txt`*: initial one provided by user; downstream *`targets_*.txt`* files are generated automatically
2. *`*.param/cwl`*: defines parameters for input/output file operations, *e.g.*:
    + *`hisat2-se/hisat2-mapping-se.cwl`*
    + *`hisat2-se/hisat2-mapping-se.yml`*
3. *`*_run.sh`*: optional bash scripts
4. Configuration files for computer cluster environments (skip on single machines):
    + *`.batchtools.conf.R`*: defines the type of scheduler for *`batchtools`*, points to the template file of the cluster, and is located in the user's home directory
    + *`*.tmpl`*: specifies parameters of the scheduler used by a system, *e.g.* Torque, SGE, Slurm, etc.

# RNA-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`RNA-Seq`_ data.

**The full workflow can be found here**: [HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.R).

## Loading package and workflow template

Load the _`RNA-Seq`_ sample workflow into your current working directory.

```r
library(systemPipeRdata)
genWorkenvir(workflow="rnaseq")
setwd("rnaseq")
```

Next, run the chosen sample workflow _`systemPipeRNAseq`_ (.Rmd) by executing from the command-line _`make -B`_ within the _`rnaseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively.

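For the interactive route, one option is to extract the R code from the template into a plain script and step through it in an editor. The sketch below assumes the _`knitr`_ package is installed and uses `knitr::purl()` for the extraction. The stand-in file `demo_workflow.Rmd` and its single chunk are hypothetical, so that the example runs without the real template; in practice the input would be `systemPipeRNAseq.Rmd`.

```r
# Hypothetical stand-in for a workflow template, written to a temp directory.
fence <- strrep("`", 3)  # builds the three-backtick chunk delimiter
rmd <- file.path(tempdir(), "demo_workflow.Rmd")
writeLines(c(
    "# Demo step",
    "",
    paste0(fence, "{r demo_step}"),
    "x <- 1 + 1",
    fence
), rmd)

# Extract only the R code (documentation = 0L drops prose and chunk headers).
r_file <- file.path(tempdir(), "demo_workflow.R")
code <- character(0)
if (requireNamespace("knitr", quietly = TRUE)) {
    knitr::purl(rmd, output = r_file, documentation = 0L, quiet = TRUE)
    code <- readLines(r_file)
    print(code)
} else {
    message("knitr is not installed; skipping code extraction.")
}
```

The resulting `.R` file can then be opened and executed line by line, which mirrors running the template chunks interactively.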
The workflow includes the following steps:

* _`HISAT2`_ (or any other RNA-Seq aligner)

# ChIP-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`ChIP-Seq`_ data.

**The full workflow can be found here**: HTML, .Rmd, and .R.

## Loading package and workflow template

Load the _`ChIP-Seq`_ sample workflow into your current working directory.

```r
library(systemPipeRdata)
genWorkenvir(workflow="chipseq")
setwd("chipseq")
```

Next, run the chosen sample workflow _`systemPipeChIPseq`_ (.Rmd) by executing from the command-line _`make -B`_ within the _`chipseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively.

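Before running a workflow, it can be useful to confirm that the layout described under Directory Structure is in place. The following is a minimal base R sketch: the helper `check_workflow_dirs` is hypothetical and not part of _`systemPipeR`_, and a mock directory tree in a temporary location stands in for a real workflow environment such as `chipseq/`.

```r
# Hypothetical helper: report which expected subdirectories exist under root.
check_workflow_dirs <- function(root, expected = c("param", "data", "results")) {
    present <- dir.exists(file.path(root, expected))
    stats::setNames(present, expected)
}

# Mock environment (assumption): param/cwl and data exist, results is missing.
mock_root <- file.path(tempdir(), "chipseq_mock")
dir.create(file.path(mock_root, "param", "cwl"), recursive = TRUE,
           showWarnings = FALSE)
dir.create(file.path(mock_root, "data"), showWarnings = FALSE)

status <- check_workflow_dirs(mock_root)
print(status)

# Names of the subdirectories that still need to be created.
missing_dirs <- names(status)[!status]
print(missing_dirs)
```

For a real environment, `mock_root` would be replaced by the workflow root directory, for example `"."` after `setwd("chipseq")`.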
The workflow includes the following steps:

* _`Bowtie2`_ or _`rsubread`_
* _`MACS2`_, _`BayesPeak`_

# VAR-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`VAR-Seq`_ data.

**The full workflow can be found here**: HTML, .Rmd, and .R.

## Loading package and workflow template

Load the _`VAR-Seq`_ sample workflow into your current working directory.

```r
library(systemPipeRdata)
genWorkenvir(workflow="varseq")
setwd("varseq")
```

Next, run the chosen sample workflow _`systemPipeVARseq`_ (.Rmd) by executing from the command-line _`make -B`_ within the _`varseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively.

The workflow includes the following steps:

* _`gsnap`_, _`bwa`_
* _`VariantTools`_, _`GATK`_, _`BCFtools`_
* _`VariantTools`_ and _`VariantAnnotation`_
* _`VariantAnnotation`_

# RIBO-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`RIBO-Seq`_ data.

**The full workflow can be found here**: HTML, .Rmd, and .R.

## Loading package and workflow template

Load the _`RIBO-Seq`_ sample workflow into your current working directory.

```r
library(systemPipeRdata)
genWorkenvir(workflow="riboseq")
setwd("riboseq")
```

Next, run the chosen sample workflow _`systemPipeRIBOseq`_ (.Rmd) by executing from the command-line _`make -B`_ within the _`riboseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively.

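Because the workflow name passed to `genWorkenvir()` and the template file name differ slightly across the four workflows, a small lookup can help avoid typos. The mapping below only collects names that appear in this document; the helper `template_for` is hypothetical and not part of _`systemPipeR`_ or _`systemPipeRdata`_.

```r
# Workflow names and template files as they appear in this document.
workflow_templates <- c(
    rnaseq  = "systemPipeRNAseq.Rmd",
    chipseq = "systemPipeChIPseq.Rmd",
    varseq  = "systemPipeVARseq.Rmd",
    riboseq = "systemPipeRIBOseq.Rmd"
)

# Hypothetical helper: return the template file for a workflow name and
# stop with an error if the name is not one of the four listed above.
template_for <- function(workflow) {
    workflow <- match.arg(workflow, names(workflow_templates))
    unname(workflow_templates[workflow])
}

riboseq_template <- template_for("riboseq")
print(riboseq_template)
```

A misspelled name such as `"ribseq"` fails immediately in `match.arg()` instead of pointing at a directory or file that does not exist.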
The workflow includes the following steps:

* _`HISAT2`_ (or any other RNA-Seq aligner)

```r
sessionInfo()
```

This project is funded by NSF award ABI-1661152.