knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
{scdrake} is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and spot-based spatial transcriptomics data (SRT).
{scdrake} is an R package built on top of the {drake} package, a Make-like pipeline toolkit for the R language.
The main features of the {scdrake} pipeline are:
SingleCellExperiment object.
SingleCellExperiment object, and tissue positions file as in Space ranger.
{drake}, the pipeline is highly efficient, scalable and reproducible, and also extendable.
{drake} cache.

Who is {scdrake} intended for?
It is primarily intended for tech-savvy users (bioinformaticians) who pass on the results (reports, images) to non-technical persons (biologists).
At the same time, bioinformaticians can quickly react to biologists’ needs by changing the parameters of the pipeline, which then efficiently skips already finished parts. This dialogue between the biologist and the bioinformatician is indispensable during scRNA-seq data analysis. {scdrake} ensures that this communication is performed in an effective and reproducible manner.
The pipeline structure, along with diagrams and links to outputs, is described in vignette("pipeline_overview").
If you use {scdrake} in your research, please consider citing:
Kubovciak J, Kolar M, Novotny J (2023). “Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.” Bioinformatics Advances, 3(1). doi:10.1093/bioadv/vbad089.
Huge thanks go to the authors of the Orchestrating Single-Cell Analysis with Bioconductor book, on whose methods and recommendations {scdrake} is largely based.
A Docker image based on the official Bioconductor image (version r BIOC_VERSION) is available. This is the most convenient and reproducible way to use {scdrake}, as all the dependencies are already installed and their versions are fixed.
In addition, the parent Bioconductor image comes bundled with RStudio Server.
The complete guide to using {scdrake}'s Docker image can be found in the Docker vignette.
We strongly recommend reading it even if you are an experienced Docker user.
Below you can find just the basic commands to download the image and to run a detached container with RStudio in Docker, or to run {scdrake} in Singularity.
You can also run the image in SingularityCE (without RStudio); see the Singularity section in the Docker vignette above.
If the image is already downloaded in the local Docker storage, you can use singularity pull docker-daemon:<image>.
You can pull the Docker image with the latest stable {scdrake} version using
out <- scdrake::wrap_code(c( glue::glue("docker pull {DOCKER_IMAGE_STABLE}"), glue::glue("singularity pull docker:{DOCKER_IMAGE_STABLE}") ))
r paste(knitr::knit(text = out), collapse = "\n")
or list available versions in our Docker Hub repository.
For the latest development version use
out <- scdrake::wrap_code(c( glue::glue("docker pull {DOCKER_IMAGE_LATEST}"), glue::glue("singularity pull docker:{DOCKER_IMAGE_LATEST}") ))
r paste(knitr::knit(text = out), collapse = "\n")
Note for Mac users with M1/M2 chips: arm64 images are available up to and including version 1.5.0:
docker pull jirinovo/scdrake:1.5.0-bioc3.15-arm64
The following steps apply to the most common host machines: Linux running Docker Engine, and Windows or macOS running Docker Desktop.
First make a shared directory that will be mounted to the container:
mkdir ~/scdrake_projects
cd ~/scdrake_projects
Then run the image, which will expose RStudio Server on port 8787 on your host:
out_docker_run_rstudio <- scdrake::format_shell_command(c( "docker run -d", "-v $(pwd):/home/rstudio/scdrake_projects", "-p 8787:8787", "-e USERID=$(id -u)", "-e GROUPID=$(id -g)", "-e PASSWORD=1234", DOCKER_IMAGE_STABLE ))
r knitr::knit(text = out_docker_run_rstudio)
For Singularity, also make shared directories and execute the container ("run and forget"):
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects
singularity exec \
    -e \
    --no-home \
    --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects" \
    path/to/scdrake_image.sif \
    scdrake <args> <command>
{scdrake} manually (not recommended)
$ brew install libxml2 imagemagick@6 harfbuzz fribidi libgit2 geos pandoc
See https://cloud.r-project.org/
From now on, all commands are for R.
{renv}
{renv} is an R package for managing local R libraries. It is intended to be used on a per-project basis, i.e. each project should use its own library of R packages.
install.packages("renv")
{renv} library
Switch to the directory where you will analyze data and initialize a new {renv} library:
renv::consent(TRUE)
renv::init()
Now exit and restart R. You should see a message that the renv library has been activated.
renv::install("BiocManager")
BiocManager::install(version = "3.15")
{scdrake} dependencies from lockfile
{renv} also allows exporting the currently installed versions of R packages (and other things) into a lockfile.
Such a lockfile is available for {scdrake}, and you can use it to install all dependencies with
out <- c( "\n", "```r", "## -- This is a lockfile for the latest stable version of scdrake.", glue::glue('download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/{LATEST_STABLE_VERSION}/renv.lock", destfile = "renv.lock")'), "## -- You can increase the number of CPU cores to speed up the installation.", "options(Ncpus = 2)", 'renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())', "```", "\n" )
r paste(knitr::knit(text = out), collapse = "\n")
To get the lockfile for the latest development version, use
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/main/renv.lock", destfile = "renv.lock")
{scdrake} package
Now we can finally install the {scdrake} package, but using a non-standard approach: without its dependencies (which are already installed from the lockfile).
out <- c( "\n", "```r", "remotes::install_github(", glue::glue(' "bioinfocz/scdrake@{LATEST_STABLE_VERSION}",', .trim = FALSE), " dependencies = FALSE, upgrade = FALSE,", " keep_source = TRUE, build_vignettes = TRUE,", " repos = BiocManager::repositories()", ")", "```", "\n" )
r paste(knitr::knit(text = out), collapse = "\n")
For the latest development version use "bioinfocz/scdrake".
Optionally, you can install {scdrake}'s CLI scripts with
scdrake::install_cli()
The CLI should now be accessible as the scdrake command. By default, the CLI is installed into ~/.local/bin, which is usually present in the PATH environment variable. If it isn't, add the following line to your ~/.bashrc:
export PATH="${HOME}/.local/bin:${PATH}"
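To check whether the install directory is already on your PATH before editing ~/.bashrc, a minimal sketch could look like the following. The function name path_contains is hypothetical and used only for illustration; it is not part of {scdrake}.

```shell
# Default CLI install directory, as stated in the text above.
cli_dir="${HOME}/.local/bin"

# Hypothetical helper: succeeds if directory $1 is one of the
# colon-separated entries of the PATH-like string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "${cli_dir}" "${PATH}"; then
  echo "${cli_dir} is already in PATH"
else
  echo "${cli_dir} is missing from PATH; add the export line to ~/.bashrc"
fi
```

Wrapping the string in colons on both sides means only whole entries match, so a directory such as ~/.local/bin-old would not be mistaken for ~/.local/bin.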
Every time you use the CLI, make sure your current working directory is inside an {renv} project.
You can read the reasons below.
You might notice that a per-project {renv} library and an installed CLI are "disconnected" and if you install {scdrake} and its CLI within multiple projects ({renv} libraries), then the CLI scripts in ~/.local/bin will be overwritten each time. But when you run the scdrake command inside an {renv} project, the renv directory is automatically detected and the {renv} library is activated by renv::load(), so the proper, locally installed {scdrake} package is then used.
Also, there is a built-in guard: the version of the CLI must match the version of the bundled CLI scripts inside the installed {scdrake} package. Anyway, we think changes in the CLI won't be very frequent, so this shouldn't be a problem most of the time.
TIP: To save time and space, you can symlink the renv/library directory to multiple {scdrake} projects.
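As a rough sketch of the symlink tip, the commands below link one project's renv/library into a second project. The project names project_a and project_b are hypothetical, and everything is created inside a temporary directory so the example is safe to run; in practice you would substitute your own project paths.

```shell
# Hypothetical layout: project_a holds the real library, project_b reuses it.
base="$(mktemp -d)"
shared_project="${base}/project_a"
new_project="${base}/project_b"

# Stand-ins for an existing renv library and a second project's renv directory.
mkdir -p "${shared_project}/renv/library" "${new_project}/renv"

# Link the existing library into the second project.
ln -s "${shared_project}/renv/library" "${new_project}/renv/library"

# Show where the link points.
readlink "${new_project}/renv/library"
```

Using an absolute path as the link target keeps the link valid regardless of the directory you run commands from, at the cost of breaking if the first project is moved.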
First, run the scdrake image in Docker or Singularity (see the Docker vignette).
Then you can go through the Get Started vignette.
See https://bioinfocz.github.io/scdrake for the documentation website of the latest stable version (r LATEST_STABLE_VERSION), where links to vignettes below become real :-)
See https://bioinfocz.github.io/scdrake/dev for the documentation website of the current development version.
We encourage all users to read the basics of the {drake} package.
While it is not necessary to know all {drake} internals to successfully run the {scdrake} pipeline, knowing them is a plus. You can read the minimum basics in vignette("drake_basics").
Also, prior knowledge of Bioconductor and its classes (especially SingleCellExperiment) is a considerable advantage.
Below is the citation output from using citation("scdrake") in R. Please run this yourself to check for any updates on how to cite scdrake.
print(citation("scdrake"), bibtex = TRUE)
To cite package ‘scdrake’ in publications use:

  Jiri Novotny and Jan Kubovciak (2021). scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis. https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis},
    author = {Jiri Novotny and Jan Kubovciak},
    year = {2021},
    note = {https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake},
  }
Please note that {scdrake} was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or the paper(s) describing this package.
In case of any problems or suggestions, please open a new issue. We will be happy to answer your questions, integrate new ideas, or resolve any problems :blush:
You can also use GitHub Discussions, mainly for topics not related to development (bugs, feature requests, etc.), for example if you need general help.
If you want to contribute to {scdrake}, please read the contribution guide.
All pull requests are welcome! :slightly_smiling_face:
Please note that the {scdrake} project is released with a Contributor Code of Conduct.
By contributing to this project, you agree to abide by its terms.
This work was supported by the ELIXIR CZ research infrastructure project (MEYS Grant No: LM2018131 and LM2023055), including access to computing and storage facilities.
{scdrake}
Many things are used by {scdrake}, but these are really worth mentioning:
{usethis}, {remotes}, and {rcmdcheck}. Customized to use Bioconductor's docker containers.
{pkgdown}.
{styler}.
{devtools} and {roxygen2}.

This package was developed using {biocthis}.