Takes an index set, breaks it into batches and runs the given function on each batch
in parallel using the given cluster. See batchByIndex
for the non-parallel version.
When the condition of a select contains a large number of ids, it is not always possible to include them all in a single SQL statement. This function breaks the list of ids into chunks so that the indexProcessor only has to deal with a small number of ids at a time.
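The chunking idea described above can be sketched in base R. This is only an illustration, not the package's actual code: `splitIntoBatches` is a hypothetical helper, and the `id IN (...)` clause mirrors the query used in the example on this page.

```r
# Hypothetical helper (not part of the package): split a vector into
# consecutive batches of at most batchSize elements.
splitIntoBatches <- function(allIndices, batchSize) {
  # Batch number for each position: 1,1,...,2,2,...
  batchId <- ceiling(seq_along(allIndices) / batchSize)
  unname(split(allIndices, batchId))
}

batches <- splitIntoBatches(1:10, 4)

# Build one short IN (...) clause per batch instead of one huge clause.
clauses <- vapply(
  batches,
  function(b) paste0("id IN (", paste(b, collapse = ","), ")"),
  character(1)
)
print(clauses)
```

With ten ids and a batch size of 4, this gives three batches, the last of which holds only two ids.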
parBatchByIndex(allIndices, indexProcessor, reduce, cl, batchSize = 1e+05)
allIndices | A vector of values that will be broken into batches and passed as an argument to the indexProcessor function.
indexProcessor | A function that takes one batch of indices. It is called once for each batch, possibly in parallel. The return values of this function are collected into a list and passed to the reduce function.
reduce | A function that is run after all jobs have finished. It is called with a list of the return values from the indexProcessor function and should merge them into one result.
cl | A SNOW cluster to run jobs on.
batchSize | The size of each batch. The last batch may be smaller than this value.
Returns the value produced by the reduce function.
Kevin Horan
## Not run:
library(snow)            # provides makeCluster
cl = makeCluster(2)      # create a SNOW cluster

# function to run a query for each batch of indexes
# (dbConnection is assumed to be an open database connection usable on the workers)
job = function(indexBatch) {
  dbGetQuery(dbConnection,
    paste("SELECT weight FROM table WHERE id IN (", paste(indexBatch, collapse = ","), ")"))
}

# function to combine all the results, in this case by summing them up
reduce = function(results) sum(unlist(results))

indices = 1:10000

# run queries in parallel and then sum the results
totalWeight = parBatchByIndex(indices, job, reduce, cl, 1000)
## End(Not run)
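For readers without a database at hand, the batch, process, reduce flow can be tried with a self-contained sketch. It assumes the base `parallel` package rather than SNOW, and `sketchParBatch` is a hypothetical stand-in written for illustration. It is not the implementation of parBatchByIndex.

```r
library(parallel)

# Hypothetical stand-in, NOT the package implementation: split the indices
# into batches, process each batch on the cluster, then merge the results.
sketchParBatch <- function(allIndices, indexProcessor, reduce, cl,
                           batchSize = 1e5) {
  batchId <- ceiling(seq_along(allIndices) / batchSize)
  batches <- unname(split(allIndices, batchId))
  results <- parLapply(cl, batches, indexProcessor)  # one result per batch
  reduce(results)
}

cl <- makeCluster(2)
total <- sketchParBatch(
  1:10000,
  function(indexBatch) sum(indexBatch),      # stands in for the SQL query
  function(results) sum(unlist(results)),    # merge by summing
  cl,
  batchSize = 1000
)
stopCluster(cl)
print(total)  # 50005000, the sum of 1:10000
```

Here each batch is processed by a plain `sum` instead of a query, so the reduced result can be checked against `sum(1:10000)`.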