clusterApply.gdsn: Apply functions over matrix margins in parallel
In zhengxwen/gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) Files

clusterApply.gdsn

R Documentation

Apply functions over matrix margins in parallel

Description

Return a vector or list of values obtained by applying a function to margins of a GDS matrix in parallel.

Usage

clusterApply.gdsn(cl, gds.fn, node.name, margin, FUN, selection=NULL,
    as.is=c("list", "none", "integer", "double", "character", "logical", "raw"),
    var.index=c("none", "relative", "absolute"), .useraw=FALSE,
    .value=NULL, .substitute=NULL, ...)

Arguments

`cl`	a cluster object, created by this package or by the package parallel
`gds.fn`	the file name of a GDS file
`node.name`	a character vector indicating GDS node path
`margin`	an integer giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns
`FUN`	the function to be applied
`selection`	a list or NULL; if a list, it is a list of logical vectors according to dimensions indicating selection; if NULL, uses all data
`as.is`	returned value: a list, an integer vector, etc
`var.index`	if `"none"`, call `FUN(x, ...)` without an index; if `"relative"` or `"absolute"`, add an argument to the user-defined function `FUN` like `FUN(index, x, ...)` where `index` in the function is an index starting from 1: `"relative"` for indexing in the selection defined by `selection`, `"absolute"` for indexing with respect to all data
`.useraw`	use R RAW storage mode if integers can be stored in a byte, to reduce memory usage
`.value`	a vector of values to be replaced in the original data array, or NULL for nothing
`.substitute`	a vector of values after replacing, or NULL for nothing; `length(.substitute)` should be one or `length(.value)`; if `length(.substitute)` = `length(.value)`, it is a mapping from `.value` to `.substitute`
`...`	optional arguments to `FUN`

Details

The algorithm of applying is optimized by blocking the computations to exploit the high-speed memory instead of disk.

Value

A vector or list of values.

Author(s)

Xiuwen Zheng

Examples

###########################################################
# prepare a GDS file

# cteate a GDS file
f <- createfn.gds("test1.gds")

(n <- add.gdsn(f, "matrix", val=matrix(1:(10*6), nrow=10)))
read.gdsn(index.gdsn(f, "matrix"))

closefn.gds(f)


# cteate the GDS file "test2.gds"
(f <- createfn.gds("test2.gds"))

X <- matrix(1:50, nrow=10)
Y <- matrix((1:50)/100, nrow=10)
Z1 <- factor(c(rep(c("ABC", "DEF", "ETD"), 3), "TTT"))
Z2 <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

node.X <- add.gdsn(f, "X", X)
node.Y <- add.gdsn(f, "Y", Y)
node.Z1 <- add.gdsn(f, "Z1", Z1)
node.Z2 <- add.gdsn(f, "Z2", Z2)
f

closefn.gds(f)



###########################################################
# apply in parallel

library(parallel)

# Use option cl.core to choose an appropriate cluster size.
cl <- makeCluster(getOption("cl.cores", 2L))


# Apply functions over rows or columns of matrix

clusterApply.gdsn(cl, "test1.gds", "matrix", margin=1, FUN=function(x) x)

clusterApply.gdsn(cl, "test1.gds", "matrix", margin=2, FUN=function(x) x)

clusterApply.gdsn(cl, "test1.gds", "matrix", margin=1,
    selection = list(rep(c(TRUE, FALSE), 5), rep(c(TRUE, FALSE), 3)),
    FUN=function(x) x)

clusterApply.gdsn(cl, "test1.gds", "matrix", margin=2,
    selection = list(rep(c(TRUE, FALSE), 5), rep(c(TRUE, FALSE), 3)),
    FUN=function(x) x)



# Apply functions over rows or columns of multiple data sets

clusterApply.gdsn(cl, "test2.gds", c("X", "Y", "Z1"), margin=c(1, 1, 1),
    FUN=function(x) x)

# with variable names
clusterApply.gdsn(cl, "test2.gds", c(X="X", Y="Y", Z="Z2"), margin=c(2, 2, 1),
    FUN=function(x) x)


# stop clusters
stopCluster(cl)


# delete the temporary file
unlink(c("test1.gds", "test2.gds"), force=TRUE)

zhengxwen/gdsfmt documentation built on April 14, 2025, 2:18 a.m.