processingQueue: Efficiently processing 'Chromatograms' objects.

processingQueue        R Documentation

Description

The processingQueue of a Chromatograms object is a list of processing steps (i.e., functions) that are stored within the object and applied only when needed. This design allows all queued steps to be applied in a single pass over the data, which is particularly useful for larger datasets. The processing queue also enables functions to be applied in a chunk-wise manner, facilitating parallel processing and reducing memory demand.

Since the peaks data can be quite large, a processing queue is used to ensure efficiency. Generally, the processing queue is applied either temporarily when calling peaksData() or permanently when calling applyProcessing(). As explained below, processing efficiency can be further improved by enabling chunk-wise processing.
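The difference between temporary and permanent application can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy object and the helpers add_step(), run_queue(), get_data() and apply_queue() are hypothetical names that only mimic the roles of addProcessing(), peaksData() and applyProcessing().

```r
## Illustrative only: a toy object with a lazy processing queue.
## All names below are hypothetical, not part of the Chromatograms package.
toy <- list(data = c(5, 0, 12, 3, 0, 8), queue = list())

## Queue a processing step without touching the data.
add_step <- function(x, FUN, ...) {
    force(FUN)
    args <- list(...)
    x$queue <- c(x$queue, list(function(d) do.call(FUN, c(list(d), args))))
    x
}

## Run all queued steps, in order, on a copy of the data.
run_queue <- function(x) Reduce(function(d, step) step(d), x$queue, x$data)

## Temporary application: stored data and queue stay unchanged.
get_data <- function(x) run_queue(x)

## Permanent application: store the result and empty the queue.
apply_queue <- function(x) {
    x$data <- run_queue(x)
    x$queue <- list()
    x
}

toy <- add_step(toy, function(d) d[d > 0])            # drop zero values
toy <- add_step(toy, function(d, by) d / by, by = 2)  # rescale

processed <- get_data(toy)       # queue applied on the fly
toy_applied <- apply_queue(toy)  # queue applied and cleared
```

In this sketch, toy still holds the unprocessed data and both queued steps after get_data(), while toy_applied holds the processed data and an empty queue.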

Usage

## S4 method for signature 'Chromatograms'
applyProcessing(
  object,
  f = processingChunkFactor(object),
  BPPARAM = bpparam(),
  ...
)

## S4 method for signature 'Chromatograms'
addProcessing(object, FUN, ...)

## S4 method for signature 'Chromatograms'
processingChunkSize(object, ...)

## S4 replacement method for signature 'Chromatograms'
processingChunkSize(object) <- value

## S4 method for signature 'Chromatograms'
processingChunkFactor(object, chunkSize = processingChunkSize(object), ...)

Arguments

object

A Chromatograms object.

f

A factor defining the grouping used to split the Chromatograms object into chunks.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

...

Additional arguments passed to the methods.

FUN

For addProcessing(), a function to be added to the Chromatograms object's processing queue.

value

integer(1) defining the chunk size.

chunkSize

For processingChunkFactor(), integer(1) defining the chunk size. The default is the value stored in the Chromatograms object's processingChunkSize slot.

Value

processingChunkSize() returns the currently defined processing chunk size (or Inf if it is not defined). processingChunkFactor() returns a factor defining the chunks into which object will be split for (parallel) chunk-wise processing or a factor of length 0 if no splitting is defined.
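To make the shape of such a factor concrete, the following base R sketch assigns consecutive elements to chunks of a given size. The helper chunk_factor() is a hypothetical name, and the exact rule used by the package is an assumption here.

```r
## Hypothetical helper, not the package's implementation: assign `n`
## elements to consecutive chunks of at most `chunk_size` elements.
chunk_factor <- function(n, chunk_size = Inf) {
    ## No splitting if the chunk size is not finite or covers everything.
    if (!is.finite(chunk_size) || chunk_size >= n)
        return(factor())
    factor(ceiling(seq_len(n) / chunk_size))
}

f_chunks <- chunk_factor(10, chunk_size = 4)  # chunks of sizes 4, 4 and 2
f_none <- chunk_factor(10)                    # factor of length 0
```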

Apply Processing

The applyProcessing() function applies the processing queue to the backend and returns the updated Chromatograms object. The processing queue is a list of functions, each of which is a processing step applied to the chromatograms data. To apply processing to the peaks data, the object must use a writable (non-read-only) backend; the backend can be changed with the setBackend() function.

Parallel and Chunk-wise Processing of Chromatograms

Many operations on Chromatograms objects, especially those involving the actual peaks data (see peaksData), support chunk-wise processing. This involves splitting the Chromatograms into smaller parts (chunks) that are processed iteratively. This enables parallel processing by data chunk and reduces memory demand since only the peak data of the currently processed subset is loaded into memory. Chunk-wise processing, which is disabled by default, can be enabled by setting the processing chunk size of a Chromatograms object using the processingChunkSize() function to a value smaller than the length of the Chromatograms object. For example, setting processingChunkSize(chr) <- 1000 will cause any data manipulation operation on chr, such as filterPeaksData(), to be performed in parallel for sets of 1000 chromatograms in each iteration.

Chunk-wise processing is particularly useful for Chromatograms objects using an on-disk backend or for very large experiments. For small datasets or Chromatograms using an in-memory backend, direct processing might be more efficient. Setting the chunk size to Inf will disable chunk-wise processing.
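The general idea of chunk-wise processing can be sketched in base R as follows. This is a simplified illustration on a plain list of numeric vectors, not the package's code; scale_max() is a hypothetical processing step. The loop over chunks could be replaced by lapply() or BiocParallel::bplapply() to process chunks in parallel.

```r
## Toy data: one numeric intensity vector per "chromatogram".
set.seed(1)
intensities <- replicate(7, runif(5), simplify = FALSE)

## Hypothetical processing step: scale each vector to a maximum of 1.
scale_max <- function(x) lapply(x, function(z) z / max(z))

## Chunks of at most 3 elements: sizes 3, 3 and 1.
f <- factor(ceiling(seq_along(intensities) / 3))

## Process one chunk at a time; only the elements of the current chunk
## are handed to the processing step.
res <- vector("list", length(intensities))
for (chunk in split(seq_along(intensities), f))
    res[chunk] <- scale_max(intensities[chunk])

## Chunk-wise and direct processing give the same result.
same <- identical(res, scale_max(intensities))
```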

Some backends may prefer a specific type of splitting and chunk-wise processing. For example, the ChromBackendMzR backend needs to load MS data from the original (mzML) files, so chunk-wise processing on a per-file basis is ideal. The backendParallelFactor() function for ChromBackend allows backends to suggest a preferred data chunking by returning a factor defining the respective data chunks. The ChromBackendMzR returns a factor based on the dataOrigin chromatograms variable. A factor of length 0 is returned if no particular splitting is preferred. The suggested chunk definition is used only if no finite processingChunkSize() is defined; setting a finite processingChunkSize() overrides backendParallelFactor().
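The precedence between a user-defined chunk size and a backend's suggested splitting can be sketched as follows. The helper choose_chunk_factor() is hypothetical, and the character vector origin merely stands in for the dataOrigin chromatograms variable; how the package combines the two internally is an assumption.

```r
## Hypothetical helper sketching the precedence described above.
choose_chunk_factor <- function(data_origin, chunk_size = Inf) {
    n <- length(data_origin)
    if (is.finite(chunk_size) && chunk_size < n) {
        ## A finite chunk size overrides the backend's suggestion.
        factor(ceiling(seq_len(n) / chunk_size))
    } else {
        ## Per-file splitting, as a file-based backend might suggest.
        factor(data_origin, levels = unique(data_origin))
    }
}

origin <- c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "b.mzML")
by_file <- choose_chunk_factor(origin)                  # one chunk per file
by_size <- choose_chunk_factor(origin, chunk_size = 2)  # sizes 2, 2 and 1
```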

Functions to configure parallel or chunk-wise processing:

  • processingChunkSize(): Gets or sets the size of the chunks for parallel or chunk-wise processing of a Chromatograms object. With a value of Inf (the default), no chunk-wise processing will be performed.

  • processingChunkFactor(): Returns a factor defining the chunks into which a Chromatograms object will be split for chunk-wise (parallel) processing. A factor of length 0 indicates that no chunk-wise processing will be performed.

Note

Some backends might not support parallel processing. For these, the backendBpparam() function will always return a SerialParam() regardless of how parallel processing was defined.

Author(s)

Johannes Rainer, Philippine Louail


rformassspectrometry/Chromatograms documentation built on Feb. 22, 2025, 11:28 a.m.