{scdrake}
configs are stored in YAML files (in the config/
directory by default), which are parsed in R to lists
and used to build {drake}
plans.
YAML (quick reference,
cheatsheets here and here)
is a human-readable format and uses indentation (spaces) for nesting of parameters.
{scdrake}
is using a concept of default and local configs. Default configs are bundled with the package and copied
during project initialization, update, and pipeline run (e.g. in run_single_sample_r()
).
Local configs are, as their name suggests, purposed to make modifications to default configs
(see the section below for how it is actually done).
Default and local configs are named *.default.yaml
and *.yaml
, respectively.
Config files are separated according to general parameters and each pipeline's main stage
(quality control, normalization, etc.), but for consistency are always read all at once
(by default if you run e.g. run_single_sample_r()
).
All paths in configs must be relative to project root, or absolute (not recommended), unless otherwise stated.
It may happen that new parameters are added to default configs. In that case, those new parameters need to be appended to local configs, while preserving the current local parameters. Calling this procedure "update" is a little bit misleading, as we actually overwrite default config by local one and append new parameters from default config to it. "Merging" should be a better word, as we merge default config with local one (and the order does matter).
Configs are merged recursively by parameter names. It is necessary to realize that YAML format are nested dictionaries (or named lists in the context of R). That is, this YAML
# Default config. FOO: 1 BAR: "baz" FOO_LIST: [1, "hello"] FOO_NAMED_LIST: FOO_2: 2 BAR_2: "zab"
is in R parsed to the list:
list(FOO = 1L, BAR = "baz", FOO_LIST = list(1L, "hello"), FOO_NAMED_LIST = list(FOO_2 = 2L, BAR_2 = "zab"))
So, for example, given that the YAML above is our default config, we want to merge it with the local config
# Local config. BAR: "zab" FOO_LIST: [4, 5, 6] FOO_NAMED_LIST: FOO_2: 3 BAR_3: 5
resulting in
# Updated local config. FOO: 1 BAR: "zab" FOO_LIST: [4, 5, 6] FOO_NAMED_LIST: FOO_2: 3 BAR_2: "zab" BAR_3: 5
FOO
is not updated.BAR
and FOO_LIST
are overwritten by the local config.FOO_NAMED_LIST
: FOO_2
is updated, BAR_3
is added, but BAR_2
is still present, although it is not defined
in the local config!To overcome the problem with FOO_NAMED_LIST
, parameters with such structure are specified as a named list wrapped
by an unnamed list:
FOO_NAMED_LIST: - FOO_2: 2 BAR_2: "zab"
Note the beginning -
creating the unnamed list and indentation of values of the named list.
This YAML is parsed in R to
list(FOO_NAMED_LIST = list(list(FOO_2 = 2L, BAR_2 = "zab")))
If we modify the example of local config above to
FOO_NAMED_LIST: - FOO_2: 3 BAR_3: 5
then the whole FOO_NAMED_LIST
parameter will be replaced during a config merge, and that is the desired behaviour.
Also, by default, a structure (parameter positions, comments) of local config is preserved during update.
However, it is possible to use the structure of a default config - see ?update_config
for more details.
A special type of parameter starting with !code
can be used to evaluate a value as R code:
EXAMPLE: !code 1:3
First, non-code parameters are loaded to a separate environment and then code parameters are evaluated in the context of this environment. This means you can use other config parameters as R variables inside code parameters:
FOO: 1 BAR: !code FOO + 1
In addition, in stage configs (e.g. 02_norm_clustering
) you can also use parameters (variables) from pipeline.yaml
and 00_main.yaml
configs:
EXAMPLE: !code glue("{PROJECT_NAME}_{INSTITUTE}")
See ?load_config
for more details.
yq
toolInternally, the yq tool (version 3) is used for merging of YAML files.
It is a command line utility whose binary needs to be downloaded. This is done automatically during initialization of
a new project, or you can do it manually through download_yq()
.
On {scdrake}
load or attach, SCDRAKE_YQ_BINARY
environment variable is read - if not set,
a value from Sys.which("yq")
is used (this function searches in PATH
environment variable).
Then scdrake_yq_binary
option is set, and is used as default value to config-updating functions (see ?update_config
).
You can also look at ?check_yq
for details on how PATH
variable is treated in terminal vs. RStudio.
scdrake_list
The base R's extracting operators ($
, [
, [[
) are very benevolent for non-existing elements in list()
for which return NULL
. This is not a desired behaviour for config variables that must have a value or explicit
NULL
. Returning NULL
when the value was actually not loaded from a config file can lead to unpredictable results.
Thus, for storing config variables, {scdrake}
is using a modified list()
called scdrake_list()
which is using strict extracting operators. It behaves like a normal list:
cfg <- scdrake::scdrake_list(list(var_1 = 1, var_2 = 2)) cfg$var_1 cfg[["var_2"]] cfg["var_1"] cfg[c("var_1", "var_2")]
But extracting non-existing elements throws error:
cfg$var_3 cfg[["var_3"]] cfg["var_3"]
For [
and [[
, this control can be turned off by check = FALSE
parameter:
cfg[["var_3", check = FALSE]] cfg["var_3", check = FALSE]
Also, in this case [
is more consistent: it returns a valid named list, unlike the normal list, which sets
NA_character_
as names of the non-existing elements.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.