clusterRun: Submit command-line tools to cluster

View source: R/utilities.R

clusterRun R Documentation

Submit command-line tools to cluster

Description

Submits non-R command-line software to the queueing/scheduling systems of compute clusters, using run specifications defined by functions similar to runCommandline. clusterRun can be used with most queueing systems, since it is based on utilities from the batchtools package, which supports template files (*.tmpl) for defining the run parameters of the different schedulers. The path to the *.tmpl file needs to be specified in a configuration file provided under the conffile argument.

Usage

clusterRun(args, 
            FUN = runCommandline, 
            more.args = list(args = args, make_bam = TRUE), 
            conffile = ".batchtools.conf.R", 
            template = "batchtools.slurm.tmpl", 
            Njobs, 
            runid = "01", 
            resourceList)

Arguments

args

Object of class SYSargs or SYSargs2.

FUN

Accepts functions such as runCommandline(args, ...), where the args argument is mandatory and needs to be of class SYSargs or SYSargs2.

more.args

Object of class list providing additional arguments passed on to the function specified under FUN.

conffile

Path to the conf file (default location: ./.batchtools.conf.R). In its simplest form, this file contains just one command, such as the following line for the Slurm scheduler: cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl"). For more detailed information, visit this page: https://mllg.github.io/batchtools/index.html
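For orientation, a minimal conf file for Slurm can consist of nothing more than the line quoted above (a sketch; makeClusterFunctionsSlurm is from batchtools):

## Minimal .batchtools.conf.R for Slurm (sketch)
cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")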

template

Template files for specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Templates for Slurm, PBS/Torque, and Sun Grid Engine (SGE) are provided.
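To illustrate how the templates work, the following is an abbreviated Slurm template in the style of those files (a sketch, not a drop-in template; the exact directives and unit conversions differ between templates). batchtools fills in the <%= ... %> brew expressions at submission time, and the resources$* values come from the resourceList argument described below:

#!/bin/bash
#SBATCH --job-name=<%= job.name %>
#SBATCH --output=<%= log.file %>
#SBATCH --time=<%= resources$walltime %>
#SBATCH --ntasks=<%= resources$ntasks %>
#SBATCH --cpus-per-task=<%= resources$ncpus %>
#SBATCH --mem-per-cpu=<%= resources$memory %>
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'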

Njobs

Integer defining the number of cluster jobs. For instance, if args contains 18 command-line jobs and Njobs=9, then the function will distribute them across 9 cluster jobs, each running 2 command-line jobs. To increase the number of CPU cores used by each process, one can set this under the corresponding argument of the command-line tool, e.g. the -p argument for TopHat. A conceptual sketch of the chunking follows below.
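The following plain-R sketch (not part of clusterRun's API) makes the distribution concrete for the 18-job example above:

## Conceptual sketch only: 18 command-line jobs split across Njobs=9 cluster jobs
chunks <- split(1:18, sort(rep_len(1:9, 18)))
lengths(chunks)  # each of the 9 cluster jobs runs 2 command-line jobs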

runid

Run identifier used in the log file names to track system call commands. Default is "01".

resourceList

List for reserving sufficient computing resources for each cluster job, including memory (in Mb), number of nodes, CPU cores, walltime (in minutes), etc. For more details, consult the template file of the corresponding queueing/scheduling system.
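For example, the list used in the Examples below reserves for each cluster job a walltime of 120 minutes, one task, four CPU cores, and 1024 Mb of memory; inside the template file these values are referenced as resources$walltime, resources$ncpus, etc.:

resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024)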

Value

Object of class Registry, as well as files and directories created by the executed command-line tools.

Author(s)

Daniela Cassol and Thomas Girke

References

For more details on batchtools, please consult the following page: https://github.com/mllg/batchtools/

See Also

clusterRun replaces the older functions getQsubargs and qsubRun.

Examples

##################################
## Examples with SYSargs object ##
##################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, 
                    more.args = list(args = args, make_bam = TRUE), 
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", 
                    Njobs=18, runid="01", 
                    resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
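## Optionally, block the R session until all jobs have completed
## (waitForJobs is from batchtools; shown here as a suggestion)
batchtools::waitForJobs(reg=reg)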
file.exists(outpaths(args))

## End(Not run)

###################################
## Examples with SYSargs2 object ##
###################################
## Construct SYSargs2 object from CWL param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
                    more.args = list(args = WF, make_bam = TRUE),
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl",
                    Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Update the paths stored in output(WF)
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats. Note that 'targets' above is a file path, so it is
## imported into a data.frame before being combined with the stats table.
read_statsDF <- alignStats(WF) 
targetsDF <- read.delim(targets, comment.char="#")
read_statsDF <- cbind(read_statsDF[targetsDF$FileName,], targetsDF)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
                quote=FALSE, sep="\t")

## End(Not run)
