out_docker_pull_stable <- scdrake::format_shell_command(glue::glue("docker pull {DOCKER_IMAGE_STABLE}"))
out_docker_pull_latest <- scdrake::format_shell_command(glue::glue("docker pull {DOCKER_IMAGE_LATEST}"))
out_singularity_pull_stable <- scdrake::format_shell_command(glue::glue("singularity pull docker:{DOCKER_IMAGE_STABLE}"))
Docker (wiki) is software for running isolated containers which are based on images. You can think of them as small, isolated operating systems.
Bioconductor's page for Docker itself provides a nice introduction to
Docker and its usage for R, and we recommend reading it through.
In this vignette, we will mainly give practical examples related to {scdrake}.
You can also run the image in SingularityCE (without RStudio) - see Singularity below.
On a Linux host, there can be two different Docker distributions installed: Docker Engine (DE) and Docker Desktop (DD).
An important difference for {scdrake}'s ease of use is file sharing (which will be discussed later). If you are using a Linux host to run the {scdrake}
image, we recommend sticking with Docker Engine
(see the official FAQ on DD and Linux).
See here how to quickly switch to DE if you have installed DD.
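As an illustration, switching is typically done with the docker context subcommand; a minimal sketch, assuming Docker Desktop registered its own context (often named desktop-linux) and that Docker Engine is reachable via the default context:

# list available contexts; the active one is marked with an asterisk
docker context ls
# switch the docker CLI back to Docker Engine
docker context use default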
Otherwise, in Docker Desktop you need to either omit the -u rstudio
parameter or use -u root. For more details see the note in
Problems with filesystem permissions below.

For Windows, we recommend using WSL2 and its Ubuntu distribution for Docker Desktop (guide), as this has been successfully tested by us.
A quick summary of the more detailed steps below. Note that this does not cover the special case of a Linux host using Docker Desktop.
Pull the {scdrake} image

Latest stable version:
r knitr::knit(text = out_docker_pull_stable)
Latest development version:
r knitr::knit(text = out_docker_pull_latest)
mkdir ~/scdrake_projects
cd ~/scdrake_projects
For Linux (using Docker Engine), Mac and Windows.
out_docker_run_rstudio <- scdrake::format_shell_command(c(
  "docker run -d",
  "-v $(pwd):/home/rstudio/scdrake_projects",
  "-p 8787:8787",
  "-e USERID=$(id -u)",
  "-e GROUPID=$(id -g)",
  "-e PASSWORD=1234",
  DOCKER_IMAGE_STABLE
))
r knitr::knit(text = out_docker_run_rstudio)
docker ps
CONTAINER ID   IMAGE                    COMMAND   CREATED        STATUS        PORTS                                       NAMES
d47b4d265052   scdrake:1.4.0-bioc3.15   "/init"   24 hours ago   Up 24 hours   0.0.0.0:8787->8787/tcp, :::8787->8787/tcp   condescending_payne
Run {scdrake} commands through its CLI:

docker exec -it -u rstudio -w /home/rstudio/scdrake_projects <CONTAINER ID or NAME> \
  scdrake -h
A Docker image based on the official Bioconductor image
(version r BIOC_VERSION) is available. This is the handiest and most reproducible way to use
{scdrake}, as all the dependencies are already installed and their versions are fixed.
In addition, the parent Bioconductor image comes bundled with RStudio Server.
You can pull the Docker image with the latest stable {scdrake}
version using
r knitr::knit(text = out_docker_pull_stable)
or list available versions in our Docker Hub repository.
For the latest development version, use
r knitr::knit(text = out_docker_pull_latest)
The {scdrake} image is based on the Bioconductor image,
which is in turn based on the Rocker Project's
image (rocker/rstudio). Thanks to this, the {scdrake}
image comes bundled with RStudio Server.
Docker allows you to mount local (host) directories into containers. We recommend creating a local directory in
which individual {scdrake} projects (and other files, if needed) will live. This way you won't lose data when
a container is destroyed. Let's create such a shared directory in your home directory and switch to it:
mkdir ~/scdrake_projects
cd ~/scdrake_projects
Important: this does not work on Linux hosts using Docker Desktop.
RStudio Server is a convenient IDE for R. You can run the image together with an RStudio Server instance using:
r knitr::knit(text = out_docker_run_rstudio)
Let's decompose the command above:
- docker run: run the image as a container.
- -d: run in detached ("background") mode.
- -v $(pwd):/home/rstudio/scdrake_projects: mount the current working directory on the host machine as /home/rstudio/scdrake_projects inside the container.
- -p 8787:8787: expose port 8787 in the container as port 8787 on localhost.
- -e USERID=$(id -u) -e GROUPID=$(id -g): change the ownership of /home/rstudio in the container to the user ID and group ID of the current host user. This is important as it sustains the correct file ownership, i.e. when you write to /home/rstudio/scdrake_projects from within the container, you actually write to the host filesystem and you do so as the proper user/group.
- -e PASSWORD=1234: password for RStudio Server (the username is rstudio).
- jirinovo/scdrake:<tag>: repository and tag of the image to be run.

The -e VAR=VALUE arguments are used to set environment variables inside the container. You can check those for the
parent rocker/rstudio image here.
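For illustration, a minimal sketch of passing an additional environment variable the same way; the DISABLE_AUTH variable is an assumption based on the rocker/rstudio documentation linked above (it disables the RStudio login page), and the image tag is a placeholder:

# hedged example: run with authentication disabled instead of setting a password
docker run -d \
  -v $(pwd):/home/rstudio/scdrake_projects \
  -p 8787:8787 \
  -e DISABLE_AUTH=true \
  jirinovo/scdrake:<tag>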
An image can have a default command which is executed when a container starts. For our image, it's the /init script,
which starts RStudio Server.
Now you can check that the container is running:
docker ps
You should see something like:

CONTAINER ID   IMAGE                    COMMAND   CREATED        STATUS        PORTS                                       NAMES
d47b4d265052   scdrake:1.4.0-bioc3.15   "/init"   24 hours ago   Up 24 hours   0.0.0.0:8787->8787/tcp, :::8787->8787/tcp   condescending_payne
Values in the CONTAINER ID and NAMES columns can then be used to reference the running container.
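For example, both values work with common docker subcommands (the ID and name below are taken from the example output above):

# follow the container's logs using its ID
docker logs -f d47b4d265052
# stop the container using its auto-generated name
docker stop condescending_payne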
You can see that we have exposed port 8787. What is it useful for? Well, it allows us to connect to the RStudio Server
instance, which is running on port 8787 inside the container. Now you can just open your browser, navigate to
localhost:8787 and log in as rstudio with password 1234.
TIP: if you are using a remote server which you can SSH into and it allows SSH tunneling, then you can forward the exposed port to your host:
ssh -NL 8787:localhost:8787 user@server
Now you can start using {scdrake} from RStudio, or read below for alternative ways.
If you just want to run the image detached without RStudio Server, use
out <- scdrake::format_shell_command(c(
  "docker run -d",
  "-u rstudio",
  "-v $(pwd):/home/rstudio/scdrake_projects",
  DOCKER_IMAGE_STABLE,
  "sleep infinity"
))
r knitr::knit(text = out)
Note for Docker Desktop on a Linux host: omit the -u rstudio parameter (or use -u root) to run the image as root.
This will mount /home/rstudio/scdrake_projects as UID 0 in the container, but the UID of the DD user will be used in the
host directory.
Running the image in detached mode gives us a chance to, for example, find out more details about what went wrong in a failed pipeline
run, or to work interactively in RStudio. However, we can also run the image in a "run and forget" mode where a
container is started, {scdrake} is run, and finally the container is stopped and removed.
out <- scdrake::format_shell_command(c(
  "docker run -it --rm",
  "-v $(pwd):/home/rstudio/scdrake_projects",
  DOCKER_IMAGE_STABLE,
  "scdrake ..."
))
r knitr::knit(text = out)
Note for Linux hosts using Docker Engine: the active user in the container will be root and created files will belong
to root. If your user ID (UID) on the host is 1000, you can use the -u rstudio parameter, and created files will then belong to you.
If your UID is different, then either run the image including RStudio or refer to
Problems with filesystem permissions below.
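For example, a minimal sketch of the "run and forget" command with -u rstudio added (the image tag is a placeholder and scdrake -h only prints the CLI help):

# files created in the mounted directory will be owned by UID 1000 (the rstudio user)
docker run -it --rm \
  -u rstudio \
  -v $(pwd):/home/rstudio/scdrake_projects \
  jirinovo/scdrake:<tag> \
  scdrake -h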
You can start a bash or R process inside the container and attach to it using

docker exec -it -u rstudio <CONTAINER ID or NAME> bash
docker exec -it -u rstudio <CONTAINER ID or NAME> R
- docker exec: executes a command in a running container.
- -i: interactive.
- -t: allocate a pseudo-TTY.

Note for Docker Desktop on a Linux host: omit the -u rstudio (or use -u root).
In the first example, when we ran the image including RStudio Server, we specified
-e USERID=$(id -u) -e GROUPID=$(id -g), and so the ownership of /home/rstudio was changed to match the host user*.
Then when you write to the shared directory /home/rstudio/scdrake_projects, you are basically doing that as the host user.

* This is actually not a feature of Docker, but of the rocker/rstudio image, which uses those environment
variables to run /rocker_scripts/init_userconf.sh, which manages the ownership.
By default, the rstudio user in the container has an ID of 1000, which is commonly used for the default user in most
Linux distributions. You can check it yourself on your host by executing id -u. If it is 1000, then you can safely
skip the instructions below.
Note for Linux hosts using Docker Desktop: in this case, Docker manages the file ownership as described
here. For you it means the root user (UID 0) inside the container will write to the shared directory using the UID of the Docker Desktop user, so you don't have
to care about the -u rstudio parameter in the docker exec command.
So when you don't want to run RStudio Server and your host user ID is not 1000, you have to take care of ownership yourself. There are several ways to accomplish that. You can also read about this problem here or here.
The rocker/rstudio parent image already comes with a script which can add or modify user information
(it is actually run when you start the container with RStudio Server).
docker exec \
  -e USERID=$(id -u) -e GROUPID=$(id -g) \
  <CONTAINER ID or NAME> \
  bash /rocker_scripts/init_userconf.sh
Note that this just changes the ID and group of the rstudio user, so later don't forget to run commands as this user,
i.e. docker exec -u rstudio ...
If you have root privileges on the host, you can just change ownership each time you write to the shared directory from the container:
sudo chown -R $(id -u):$(id -g) scdrake_projects
Running {scdrake} commands through its command line interface (CLI)

We now assume that the {scdrake} container is running and the shared directory is mounted at
/home/rstudio/scdrake_projects. The {scdrake} commands can be easily issued through its CLI
(see vignette("scdrake_cli")):
docker exec -it -u rstudio <CONTAINER ID or NAME> \
  scdrake -h
The usual workflow can be similar to the following:
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects <CONTAINER ID or NAME> \
  scdrake -d my_first_project init-project
Modify the configs on your host in my_first_project/config/
Run the pipeline:
docker exec -it -u rstudio -w /home/rstudio/scdrake_projects/my_first_project <CONTAINER ID or NAME> \
  scdrake --pipeline-type single_sample run
Note for Docker Desktop on a Linux host: omit the -u rstudio (or use -u root).
To reduce typing/copy-pasting a bit, you can create a command alias in your shell, here for bash:
alias scdrake_docker="docker exec -it -u rstudio <CONTAINER ID or NAME> scdrake"
Note for Docker Desktop on a Linux host: omit the -u rstudio (or use -u root).
And then use this alias, e.g.:
scdrake_docker -d /home/rstudio/scdrake_projects/my_first_project --pipeline-type single_sample run
Notice that we are using the -d parameter, as the alias doesn't contain the -w parameter, which instructs Docker
to execute the command in a specific working directory.
Finally, you can put the alias permanently inside your ~/.bashrc
file. Just don't forget to change
<CONTAINER ID or NAME>
when you start a new container :)
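A minimal sketch of making the alias permanent; it simply appends the alias definition from above to ~/.bashrc (remember to replace the placeholder):

# append the alias to your bash startup file
echo 'alias scdrake_docker="docker exec -it -u rstudio <CONTAINER ID or NAME> scdrake"' >> ~/.bashrc
# reload the startup file in the current shell
source ~/.bashrc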
Alternatively, you can utilize the "run and forget" way described above:
out <- scdrake::format_shell_command(c(
  "alias scdrake_docker=docker run -it --rm",
  "-v $(realpath ~/scdrake_projects):/home/rstudio/scdrake_projects",
  DOCKER_IMAGE_STABLE,
  "scdrake"
))
r knitr::knit(text = out)
Singularity is an alternative container runner that does not require root access, so it can be used on, e.g., HPCs.
You will probably use Singularity if you don't have root permissions on the host machine. It can be simply installed into user space, e.g. via the conda virtual environment/package manager as the conda-forge/singularity package.
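A minimal sketch of such an installation, assuming conda is already available; the environment name singularity_env is arbitrary:

# create a dedicated environment with Singularity from the conda-forge channel
conda create -n singularity_env -c conda-forge singularity
# activate it so the singularity binary is on PATH
conda activate singularity_env
singularity --version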
Docker images can be simply pulled from Docker Hub, similarly to docker pull:
r knitr::knit(text = out_singularity_pull_stable)
The only difference is the usage of the docker: prefix so Singularity knows where to look for the image.
If the image is already present in the local Docker storage, you can use the docker-daemon: prefix instead.
After pulling, Singularity needs to convert the image to the SIF format, and that can take quite some time. By default, the SIF file is saved in the current working directory.
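If you want the SIF file under a specific name (such as the path/to/scdrake_image.sif used in the commands below), singularity pull also accepts an output filename; a hedged sketch with a placeholder tag (depending on your Singularity version, the source may need to be written with the full docker:// form):

# pull and convert the image, saving it under an explicit file name
singularity pull scdrake_image.sif docker://jirinovo/scdrake:<tag>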
Here are some key differences compared to Docker:

- By default, Singularity mounts your HOME directory into the container. While this might be practical in some cases, we rather recommend mounting an empty/different directory to /home/<user> in the container.

First we will create a directory structure. These directories will be mounted to the container. Note that we also create a fake home directory - that is because some R packages need to use a cache directory, which is located there.
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects/pbmc1k
Now we can issue {scdrake} commands through its CLI:
singularity exec \
  -e \
  --no-home \
  --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
  --pwd "/home/${USER}/scdrake_projects" \
  path/to/scdrake_image.sif \
  scdrake <args> <command>
For example, to initialize a new project:
singularity exec \
  -e \
  --no-home \
  --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
  --pwd "/home/${USER}/scdrake_projects/pbmc1k" \
  path/to/scdrake_image.sif \
  scdrake --download-example-data init-project
And to run the pipeline using that project:
singularity exec \
  -e \
  --no-home \
  --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
  --pwd "/home/${USER}/scdrake_projects/pbmc1k" \
  path/to/scdrake_image.sif \
  scdrake --pipeline-type single_sample run
Note that you can also start a bash or R session and use {scdrake}
within that, e.g.
singularity exec \
  -e \
  --no-home \
  --bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
  --pwd "/home/${USER}/scdrake_projects/pbmc1k" \
  path/to/scdrake_image.sif \
  R
and then in R
scdrake::run_single_sample_r()