R/Spectra.R

Defines functions coreSpectraVariables asDataFrame .create_spectra

Documented in asDataFrame coreSpectraVariables

#' @include hidden_aliases.R
NULL

################################################################################
##
## Spectra class, creation, data representation, export
##
################################################################################

#' @title The Spectra class to manage and access MS data
#'
#' @name Spectra
#'
#' @aliases Spectra-class
#' @aliases Spectra
#' @aliases setBackend
#' @aliases export
#'
#' @description
#'
#' The `Spectra` class encapsules spectral mass spectrometry (MS) data and
#' related metadata. The MS data is represented by a *backend* extending the
#' virual [MsBackend] class which provides the data to the `Spectra` object.
#' The `Spectra` class implements only data accessor, filtering and analysis
#' methods for the MS data and relies on its *backend* to provide the MS data.
#' This allows to change data representations of a `Spectra` object depending
#' on the user's needs and properties of the data. Different backends and
#' their properties are explained in the [MsBackend] documentation.
#'
#' Documentation on other topics and functionality of `Spectra`can be found in:
#'
#' - [spectraData()] for accessing and using MS data through `Spectra` objects.
#' - [filterMsLevel()] to subset and filter `Spectra` objects.
#' - [plotSpectra()] for visualization of `Spectra` orbjects.
#' - [processingChunkSize()] for information on parallel and chunk-wise data
#'   processing.
#' - [combineSpectra()] for merging, aggregating and splitting of `Spectra`
#'   objects.
#' - [combinePeaks()] for merging and aggregating `Spectra`'s mass peaks data.
#' - [addProcessing()] for data analysis functions.
#' - [compareSpectra()] for spectra similarity calculations.
#'
#' @param backend For `Spectra()`: [MsBackend-class] to be used as backend. See
#'     section on creation of `Spectra` objects for details. For `setBackend()`:
#'     instance of [MsBackend-class] that supports `setBackend()` (i.e. for
#'     which `supportsSetBackend()` returns `TRUE`). Such backends have a
#'     parameter `data` in their `backendInitialize()` function that support
#'     passing the full spectra data to the initialize method. See section on
#'     creation of `Spectra` objects for details.
#'     For `export()`: [MsBackend-class] to be used to export the data.
#'
#' @param BPPARAM Parallel setup configuration. See [BiocParallel::bpparam()]
#'     for more information. This is passed directly to the
#'     [backendInitialize()] method of the [MsBackend-class].
#'
#' @param f For `setBackend()`: factor defining how to split the data
#'     for parallelized copying of the spectra data to the new backend. For
#'     some backends changing this parameter can lead to errors. Defaults to
#'     [processingChunkFactor()].
#'
#' @param metadata For `Spectra()`: optional `list` with metadata information.
#'
#' @param object For `Spectra()`: an object to instantiate the `Spectra`
#'     object and initialize the with data.. See section on creation of
#'     `Spectra` objects for details. For all other methods a `Spectra` object.
#'
#' @param processingQueue For `Spectra()`: optional `list` of
#'     [ProtGenerics::ProcessingStep-class] objects.
#'
#' @param source For `Spectra()`: instance of [MsBackend-class] that can be
#'     used to import spectrum data from the provided files. See section
#'    *Creation of objects* for more details.
#'
#' @param value For `dataStorageBasePath()`: A `character` vector that defines
#'     the base directory where the data storage files can be found.
#'
#' @param ... Additional arguments.
#'
#' @section Data stored in a `Spectra` object:
#'
#' The `Spectra` object is a container for MS data that includes mass peak
#' data (*m/z* and related intensity values, also referred to as *peaks data*
#' in the context of `Spectra`) and metadata of individual spectra (so called
#' *spectra variables*). While a core set of spectra variables (the
#' `coreSpectraVariables()`) are guaranteed to be provided by a
#' `Spectra`, it is possible to add arbitrary additional spectra variables to
#' a `Spectra` object.
#'
#' The `Spectra` object is designed to contain MS data of a (large) set of mass
#' spectra. The data is organized *linearly* and can be thought of a list of
#' mass spectra, i.e. each element in the `Spectra` is one spectrum.
#'
#'
#' @section Creation of objects:
#'
#' `Spectra` classes can be created with the `Spectra()` constructor function
#' which supports the following formats:
#'
#' - parameter `object` is a `data.frame` or `DataFrame` containing the
#'   full spectrum data (spectra variables in columns as well as columns
#'   with the individual MS peak data, *m/z* and intensity). The provided
#'   `backend` (by default a [MsBackendMemory-class]) will be initialized
#'   with that data.
#'
#' - parameter `object` is a [MsBackend-class] (assumed to be already
#'   initialized).
#'
#' - parameter `object` is missing, in which case it is supposed that the data
#'   is provided by the [MsBackend-class] class passed along with the `backend`
#'   argument.
#'
#' - parameter `object` is of type `character` and is expected to be the file
#'   names(s) from which spectra should be imported. Parameter `source` allows
#'   to define a [MsBackend-class] that is able to import the data from the
#'   provided source files. The default value for `source` is [MsBackendMzR()]
#'   which allows to import spectra data from mzML, mzXML or CDF files.
#'
#' With `...` additional arguments can be passed to the backend's
#' [backendInitialize()] method. Parameter `backend` allows to specify which
#' [MsBackend-class] should be used for data representation and storage.
#'
#'
#' @section Data representation of a `Spectra`:
#'
#' The MS data which can be accessed through the `Spectra` object is
#' *represented* by its backend, which means that this backend defines how
#' and where the data is stored (e.g. in memory or on disk). The `Specrta`
#' object relies on the backend to provide the MS data whenever it needs it
#' for data processing.
#' Different backends with different properties, such as minimal memory
#' requirement or fast data access, are defined in the *Spectra* package or
#' one of the MsBackend* packages. More information on backends and their
#' properties is provided in the documentation of [MsBackend].
#'
#' On-disk backends keep only a limited amount of data in memory retrieving
#' most of the data (usually the MS peak data) upon request on-the-fly from
#' their on-disk data representations. Moving the on-disk data storage of such
#' a backend or a serialized object to a different location in the file
#' system will cause data corruption. The `dataStorageBasePath()` and
#' `dataStorageBasePath<-` functions allow in such cases (and if thebackend
#' classes support this operation), to get or change the *base*
#' path to the directory of the backend's data storage. In-memory backends
#' such as [MsBackendMemory] or [MsBackendDataFrame] keeping all MS data in
#' memory don't support, and need, this function, but for [MsBackendMzR] this
#' function can be used to update/adapt the path to the directory containing
#' the original data files. Thus, for `Spectra` objects (using this backend)
#' that were moved to another file system or computer, these functions allow to
#' adjust/adapt the base file path.
#'
#'
#' @section Changing data representation of a `Spectra`:
#'
#' The data representation, i.e. the backend of a `Spectra` object can be
#' changed with the `setBackend()` method that takes an instance of the new
#' backend as second parameter `backend`. A call to
#' `setBackend(sps, backend = MsBackendDataFrame())`
#' would for example change the backend of `sps` to the *in-memory*
#' `MsBackendDataFrame`. Changing to a backend is only supported if that
#' backend has a `data` parameter in its `backendInitialize()` method and if
#' `supportsSetBackend()` returns `TRUE` for that backend. `setBackend()` will
#' transfer the full spectra data from the originating backend as a `DataFrame`
#' to the new backend.
#'
#' Generally, it is not possible to change **to** a read-only backend such as
#' the [MsBackendMzR()] backend.
#'
#' The definition of the function is:
#' `setBackend(object, backend, ..., f = dataStorage(object),
#'     BPPARAM = bpparam())` and its parameters are:
#'
#' - `object`: the `Spectra` object.
#'
#' - `backend`: an instance of the new backend, e.g. `[MsBackendMemory()]`.
#'
#' - `f`: factor allowing to parallelize the change of the backends. By
#'   default the process of copying the spectra data from the original to the
#'   new backend is performed separately (and in parallel) for each file. Users
#'   are advised to use the default setting.
#'
#' - `...`: optional additional arguments passed to the [backendInitialize()]
#'   method of the new `backend`.
#'
#' - `BPPARAM`: setup for the parallel processing. See
#'   [BiocParallel::bpparam()] for details.
#'
#'
#' @section Exporting data from a `Spectra` object:
#'
#' Data from a `Spectra` object can be **exported** to a file with the
#' `export()` function. The actual export of the data is performed by
#' the `export` method of the [MsBackend] class defined with the mandatory
#' parameter `backend` which defines also the format in which the data
#' is exported. Note however that not all backend classes support
#' export of data. From the `MsBackend` classes in the `Spectra` package
#' currently only the `MsBackendMzR` backend supports data export (to
#' mzML/mzXML file(s)); see the help page of the [MsBackend-class] for
#' information on its arguments or the examples below or the vignette
#' for examples.
#'
#' The definition of the function is
#' `export(object, backend,  ...)` and its
#' parameters are:
#'
#' - `object`: the `Spectra` object to be exported.
#'
#' - `backend`: instance of a class extending [MsBackend] which supports export
#'   of the data (i.e. which has a defined `export` method).
#'
#' - `...`: additional parameters specific for the `MsBackend` passed with
#'   parameter `backend`.
#'
#'
#' @details
#'
#' The `Spectra` class uses by default a lazy data manipulation strategy,
#' i.e. data manipulations such as performed with `replaceIntensitiesBelow()`
#' are not applied immediately to the data, but applied on-the-fly to the
#' spectrum data once it is retrieved. This enables data manipulation
#' operations also for *read only* data representations. For some backends that
#' allow to write data back to the data storage (such as the
#' [MsBackendMemory()], [MsBackendDataFrame()] and [MsBackendHdf5Peaks()]) it
#' is possible to apply to queue with the [applyProcessing()] function (see
#' the [applyProcessing()] function for details).
#'
#' Clarifications regarding scan/acquisition numbers and indices:
#'
#' - A `spectrumId` (or `spectrumID`) is a vendor specific field in
#'   the mzML file that contains some information about the
#'   run/spectrum, e.g.: `controllerType=0 controllerNumber=1
#'   scan=5281 file=2`
#'
#' - `acquisitionNum` is a more a less sanitize spectrum id generated
#'   from the `spectrumId` field by `mzR` (see
#'   [here](https://github.com/sneumann/mzR/blob/master/src/pwiz/data/msdata/MSData.cpp#L552-L580)).
#'
#' - `scanIndex` is the `mzR` generated sequence number of the
#'   spectrum in the raw file (which doesn't have to be the same as
#'   the `acquisitionNum`)
#'
#' See also [this issue](https://github.com/lgatto/MSnbase/issues/525).
#'
#' @md
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail
#'
#' @exportClass Spectra
#'
#' @exportMethod Spectra
#'
#' @examples
#'
#' ##  --------  CREATION OF SPECTRA OBJECTS  --------
#'
#' ## Create a Spectra providing a `DataFrame` containing the spectrum data.
#'
#' spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
#' spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
#' spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
#'
#' data <- Spectra(spd)
#' data
#'
#' ## Create a Spectra from mzML files and use the `MsBackendMzR` on-disk
#' ## backend.
#' sciex_file <- dir(system.file("sciex", package = "msdata"),
#'     full.names = TRUE)
#' sciex <- Spectra(sciex_file, backend = MsBackendMzR())
#' sciex
#'
#'
#' ##  --------  CHANGING DATA REPRESENTATIONS  --------
#'
#' ## The MS data is on disk and will be read into memory on-demand. We can
#' ## however change the backend to a MsBackendMemory backend which will
#' ## keep all of the data in memory.
#' sciex_im <- setBackend(sciex, MsBackendMemory())
#' sciex_im
#'
#' ## The `MsBackendMemory()` supports the `setBackend()` method:
#' supportsSetBackend(MsBackendMemory())
#'
#' ## Thus, it is possible to change to that backend with `setBackend()`. Most
#' ## read-only backends however don't support that, such as the
#' ## `MsBackendMzR` and `setBackend()` would fail to change to that backend.
#' supportsSetBackend(MsBackendMzR())
#'
#' ## The on-disk object `sciex` is light-weight, because it does not keep the
#' ## MS peak data in memory. The `sciex_im` object in contrast keeps all the
#' ## data in memory and its size is thus much larger.
#' object.size(sciex)
#' object.size(sciex_im)
#'
#' ## The spectra variable `dataStorage` returns for each spectrum the location
#' ## where the data is stored. For in-memory objects:
#' head(dataStorage(sciex_im))
#'
#' ## While objects that use an on-disk backend will list the files where the
#' ## data is stored.
#' head(dataStorage(sciex))
#'
#' ## The spectra variable `dataOrigin` returns for each spectrum the *origin*
#' ## of the data. If the data is read from e.g. mzML files, this will be the
#' ## original mzML file name:
#' head(dataOrigin(sciex))
#' head(dataOrigin(sciex_im))
#'
#'
#' ##  -------- DATA EXPORT  --------
#'
#' ## Some `MsBackend` classes provide an `export()` method to export the data
#' ## to the file format supported by the backend.
#' ## The `MsBackendMzR` for example allows to export MS data to mzML or
#' ## mzXML file(s), the `MsBackendMgf` (defined in the MsBackendMgf R package)
#' ## would allow to export the data in mgf file format.
#' ## Below we export the MS data in `data`. We call the `export()` method on
#' ## this object, specify the backend that should be used to export the data
#' ## (and which also defines the output format) and provide a file name.
#' fl <- tempfile()
#' export(data, MsBackendMzR(), file = fl)
#'
#' ## This exported our data in mzML format. Below we read the first 6 lines
#' ## from that file.
#' readLines(fl, n = 6)
#'
#' ## If only a single file name is provided, all spectra are exported to that
#' ## file. To export data with the `MsBackendMzR` backend to different files, a
#' ## file name for each individual spectrum has to be provided.
#' ## Below we export each spectrum to its own file.
#' fls <- c(tempfile(), tempfile())
#' export(data, MsBackendMzR(), file = fls)
#'
#' ## Reading the data from the first file
#' res <- Spectra(backendInitialize(MsBackendMzR(), fls[1]))
#'
#' mz(res)
#' mz(data)
NULL

#' The Spectra class
#'
#' The [Spectra] class encapsulates data and meta-data for mass
#' spectrometry experiments.
#'
#' @slot backend A derivate of [MsBackend-class] holding/controlling the spectra
#'     data.
#'
#' @slot processingQueue `list` of `ProcessingStep` objects.
#'
#' @slot processingQueueVariables `character` of spectraVariables that should
#'     be passed to the processing step function.
#'
#' @slot processing A `character` storing logging information.
#'
#' @slot metadata A `list` storing experiment metadata.
#'
#' @slot version A `character(1)` containing the class version.
#'
#' @docType class
#'
#' @author Sebastian Gibb \email{mail@@sebastiangibb.de}
#'
#' @importClassesFrom S4Vectors DataFrame
#'
#' @importMethodsFrom S4Vectors lapply
#'
#' @importFrom S4Vectors DataFrame
#'
#' @noRd
setClass(
    "Spectra",
    slots = c(
        backend = "MsBackend",
        processingQueue = "list",
        processingQueueVariables = "character",
        ## logging
        processing = "character",
        ## metadata
        metadata = "list",
        processingChunkSize = "numeric",
        version = "character"
    ),
    prototype = prototype(version = "0.3",
                          processingChunkSize = Inf)
)

setValidity("Spectra", function(object) {
    msg <- .valid_processing_queue(object@processingQueue)
    if (length(msg)) msg
    else TRUE
})

#' @rdname hidden_aliases
#'
#' @importMethodsFrom methods show
#'
#' @importFrom utils capture.output
#'
#' @exportMethod show
setMethod("show", "Spectra",
    function(object) {
        cat("MSn data (", class(object)[1L], ") with ",
            length(object@backend), " spectra in a ", class(object@backend),
            " backend:\n", sep = "")
        if (length(object@backend)) {
            txt <- capture.output(show(object@backend))
            cat(txt[-1], sep = "\n")
        }
        if (length(object@processingQueue))
            cat("Lazy evaluation queue:", length(object@processingQueue),
                "processing step(s)\n")
        lp <- length(object@processing)
        if (lp) {
            lps <- object@processing
            if (lp > 3) {
                lps <- lps[1:3]
                lps <- c(lps, paste0("...", lp - 3, " more processings. ",
                                     "Use 'processingLog' to list all."))
            }
            cat("Processing:\n", paste(lps, collapse="\n "), "\n")
        }
    })

#' @rdname Spectra
setMethod("Spectra", "missing", function(object, processingQueue = list(),
                                         metadata = list(), ...,
                                         backend = MsBackendMemory(),
                                         BPPARAM = bpparam()) {
    if (length(backend))
        new("Spectra", metadata = metadata, processingQueue = processingQueue,
            backend = backend)
    else callNextMethod()
})

#' @rdname Spectra
setMethod("Spectra", "MsBackend", function(object, processingQueue = list(),
                                           metadata = list(), ...,
                                           BPPARAM = bpparam()) {
    new("Spectra", metadata = metadata, processingQueue = processingQueue,
        backend = object)
})

#' @rdname Spectra
#'
#' @importFrom methods callNextMethod
setMethod("Spectra", "character", function(object, processingQueue = list(),
                                           metadata = list(),
                                           source = MsBackendMzR(),
                                           backend = source,
                                           ..., BPPARAM = bpparam()) {
    sp <- .create_spectra(object, processingQueue = processingQueue,
                          metadata = metadata, backend = source,
                          ..., BPPARAM = BPPARAM)
    if (class(source)[1L] != class(backend)[1L])
        setBackend(sp, backend, ..., BPPARAM = backendBpparam(backend, BPPARAM))
    else sp
})

#' @rdname Spectra
setMethod("Spectra", "ANY", function(object, processingQueue = list(),
                                     metadata = list(),
                                     source = MsBackendMemory(),
                                     backend = source,
                                     ..., BPPARAM = bpparam()) {
    sp <- .create_spectra(object, processingQueue = processingQueue,
                          metadata = metadata, backend = source,
                          ..., BPPARAM = BPPARAM)
    if (class(source)[1L] != class(backend)[1L])
        setBackend(sp, backend, ..., BPPARAM = backendBpparam(backend, BPPARAM))
    else sp
})

.create_spectra <- function(object, processingQueue = list(), metadata = list(),
                            backend = MsBackendMemory(), ...,
                            BPPARAM = bpparam()) {
    if (missing(object))
        backend <- backendInitialize(
            backend, ..., BPPARAM = backendBpparam(backend, BPPARAM))
    else backend <- backendInitialize(
             backend, object, ..., BPPARAM = backendBpparam(backend, BPPARAM))
    new("Spectra", metadata = metadata, processingQueue = processingQueue,
        backend = backend)
}

#' @rdname Spectra
#'
#' @importMethodsFrom ProtGenerics setBackend
#'
#' @exportMethod setBackend
setMethod(
    "setBackend", c("Spectra", "MsBackend"),
    function(object, backend, f = processingChunkFactor(object), ...,
             BPPARAM = bpparam()) {
        backend_class <- class(object@backend)[1L]
        BPPARAM <- backendBpparam(object@backend, BPPARAM)
        BPPARAM <- backendBpparam(backend, BPPARAM)
        if (!supportsSetBackend(backend))
            stop(class(backend), " does not support 'setBackend'")
        if (!length(object)) {
            bknds <- backendInitialize(
                backend, data = spectraData(object@backend), ...)
        } else {
            if (!is.factor(f))
                f <- force(factor(f, levels = unique(f)))
            if (length(f) && (length(levels(f)) > 1)) {
                if (length(f) != length(object))
                    stop("length of 'f' has to match the length of 'object'")
                bknds <- bplapply(
                    split(object@backend, f = f),
                    function(z, ...) {
                        backendInitialize(backend,
                                          data = spectraData(z), ...,
                                          BPPARAM = SerialParam())
                    }, ..., BPPARAM = BPPARAM)
                bknds <- backendMerge(bknds)
                ## That below ensures the backend is returned in its original
                ## order - unsplit does unfortunately not work.
                if (is.unsorted(f))
                    bknds <- extractByIndex(
                        bknds, order(unlist(split(seq_along(bknds), f),
                                            use.names = FALSE)))
            } else {
                bknds <- backendInitialize(
                    backend, data = spectraData(object@backend), ...)
            }
        }
        object@backend <- bknds
        object@processing <- .logging(object@processing,
                                      "Switch backend from ",
                                      backend_class, " to ",
                                      class(object@backend))
        object
    })

#' @rdname Spectra
#'
#' @export
setMethod("export", "Spectra",
          function(object, backend, ...) {
              if (missing(backend))
                  stop("Parameter 'backend' is required.")
              export(backend, object, ...)
          })

#' @rdname Spectra
setMethod("dataStorageBasePath", "Spectra", function(object) {
    dataStorageBasePath(object@backend)
})

#' @rdname Spectra
setReplaceMethod("dataStorageBasePath", "Spectra", function(object, value) {
    dataStorageBasePath(object@backend) <- value
    object
})

################################################################################
##
## Accessing and adding/setting/changing MS data.
##
################################################################################

#' @title Accessing mass spectrometry data
#'
#' @name spectraData
#'
#' @aliases acquisitionNum
#' @aliases centroided
#' @aliases collisionEnergy
#' @aliases dataOrigin
#' @aliases dataStorage
#' @aliases intensity
#' @aliases ionCount
#' @aliases isCentroided
#' @aliases isEmpty
#' @aliases isolationWindowLowerMz
#' @aliases isolationWindowUpperMz
#' @aliases isolationWindowTargetMz
#' @aliases lengths
#' @aliases msLevel
#' @aliases mz
#' @aliases peaksData
#' @aliases peaksVariables
#' @aliases polarity
#' @aliases precursorCharge
#' @aliases precursorIntensity
#' @aliases precursorMz
#' @aliases rtime
#' @aliases scanIndex
#' @aliases smoothed
#' @aliases spectraData
#' @aliases spectraNames
#' @aliases spectraVariables
#' @aliases tic
#' @aliases uniqueMsLevels
#'
#' @description
#'
#' As detailed in the documentation of the [Spectra] class, a `Spectra` object
#' is a container for mass spectrometry (MS) data that includes both the mass
#' peaks data (or *peaks data*, generally *m/z* and intensity values) as well
#' as spectra metadata (so called *spectra variables*). Spectra variables
#' generally define one value per spectrum, while for peaks variables one value
#' per mass peak is defined and hence multiple values per spectrum (depending
#' on the number of mass peaks of a spectrum).
#'
#' Data can be extracted from a `Spectra` object using dedicated accessor
#' functions or also using the `$` operator. Depending on the backend class
#' used by the `Spectra` to represent the data, data can also be added or
#' replaced (again, using dedicated functions or using `$<-`).
#'
#'
#' @param BPPARAM Parallel setup configuration. See [BiocParallel::bpparam()]
#'     for more information. See also [processingChunkSize()] for more
#'     information on parallel processing.
#'
#' @param columns For `spectraData()` accessor: optional `character` with
#'     column names (spectra variables) that should be included in the
#'     returned `DataFrame`. By default, all columns are returned.
#'     For `peaksData()` accessor: optional `character` with requested columns
#'     in the individual `matrix` of the returned `list`. Defaults to
#'     `c("mz", "value")` but any values returned by `peaksVariables(object)`
#'     with `object` being the `Spectra` object are supported.
#'
#' @param f For `intensity()`, `mz()` and `peaksData()`: factor defining how
#'     data should be chunk-wise loaded an processed. Defaults to
#'     [processingChunkFactor()].
#'
#' @param i For `asDataFrame()`: A `numeric` indicating which scans to coerce
#'     to a `DataFrame` (default is `seq_along(object)`).
#'
#' @param initial For `tic()`: `logical(1)` whether the initially
#'     reported total ion current should be reported, or whether the
#'     total ion current should be (re)calculated on the actual data
#'     (`initial = FALSE`, same as `ionCount()`).
#'
#' @param j For `[`: not supported.
#'
#' @param name For `$` and `$<-`: the name of the spectra variable to return
#'     or set.
#'
#' @param object A `Spectra` object.
#'
#' @param spectraVars `character()` indicating what spectra variables to add to
#'     the `DataFrame`. Default is `spectraVariables(object)`, i.e. all
#'     available variables.
#'
#' @param use.names For `lengths()`: ignored.
#'
#' @param value A vector with values to replace the respective spectra
#'     variable. Needs to be of the correct data type for the spectra variable.
#'
#' @param x A `Spectra` object.
#'
#' @param ... Additional arguments.
#'
#'
#' @section Spectra variables:
#'
#' A common set of *core spectra variables* are defined for `Spectra`. These
#' have a pre-defined data type and each `Spectra` will return a value for
#' these if requested. If no value for a spectra variable is defined, a missing
#' value (of the correct data type) is returned. The list of core spectra
#' variables and their respective data type is:
#'
#' - *acquisitionNum* `integer(1)`: the index of acquisition of a spectrum
#'   during an MS run.
#' - *centroided* `logical(1)`: whether the spectrum is in profile or centroid
#'   mode.
#' - *collisionEnergy* `numeric(1)`: collision energy used to create an MSn
#'   spectrum.
#' - *dataOrigin* `character(1)`: the *origin* of the spectrum's data, e.g. the
#'   mzML file from which it was read.
#' - *dataStorage* `character(1)`: the (current) storage location of the
#'   spectrum data. This value depends on the backend used to handle and
#'   provide the data. For an *in-memory* backend like the `MsBackendDataFrame`
#'   this will be `"<memory>"`, for an on-disk backend such as the
#'   `MsBackendHdf5Peaks` it will be the name of the HDF5 file where the
#'   spectrum's peak data is stored.
#' - *isolationWindowLowerMz* `numeric(1)`: lower m/z for the isolation
#'   window in which the (MSn) spectrum was measured.
#' - *isolationWindowTargetMz* `numeric(1)`: the target m/z for the isolation
#'   window in which the (MSn) spectrum was measured.
#' - *isolationWindowUpperMz* `numeric(1)`: upper m/z for the isolation window
#'   in which the (MSn) spectrum was measured.
#' - *msLevel* `integer(1)`: the MS level of the spectrum.
#' - *polarity* `integer(1)`: the polarity of the spectrum (`0` and `1`
#'   representing negative and positive polarity, respectively).
#' - *precScanNum* `integer(1)`: the scan (acquisition) number of the precursor
#'   for an MSn spectrum.
#' - *precursorCharge* `integer(1)`: the charge of the precursor of an MSn
#'   spectrum.
#' - *precursorIntensity* `numeric(1)`: the intensity of the precursor of an
#'   MSn spectrum.
#' - *precursorMz* `numeric(1)`: the m/z of the precursor of an MSn spectrum.
#' - *rtime* `numeric(1)`: the retention time of a spectrum.
#' - *scanIndex* `integer(1)`: the index of a spectrum within a (raw) file.
#' - *smoothed* `logical(1)`: whether the spectrum was smoothed.
#'
#' For each of these spectra variable a dedicated accessor function is defined
#' (such as `msLevel()` or `rtime()`) that allows to extract the values of
#' that spectra variable for all spectra in a `Spectra` object. Also,
#' replacement functions are defined, but not all backends might support
#' replacing values for spectra variables. As described above, additional
#' spectra variables can be defined or added. The `spectraVariables()` function
#' can be used to
#'
#' Values for multiple spectra variables, or all spectra vartiables* can be
#' extracted with the `spectraData()` function.
#'
#'
#' @section Peaks variables:
#'
#' `Spectra` also provide mass peak data with the *m/z* and intensity values
#' being the *core* peaks variables:
#'
#' - *intensity* `numeric`: intensity values for the spectrum's peaks.
#' - *mz* `numeric`: the m/z values for the spectrum's peaks.
#'
#' Values for these can be extracted with the `mz()` and `intensity()`
#' functions, or the `peaksData()` function. The former functions return a
#' `NumericList` with the respective values, while the latter returns a `List`
#' with `numeric` two-column matrices. The list of peaks matrices can also
#' be extracted using `as(x, "list")` or `as(x, "SimpleList")` with `x` being
#' a `Spectra` object.
#'
#' Some `Spectra`/backends provide also values for additional peaks variables.
#' The set of available peaks variables can be extracted with the
#' `peaksVariables()` function.
#'
#'
#' @section Functions to access MS data:
#'
#' The set of available functions to extract data from, or set data in, a
#' `Spectra` object are (in alphabetical order) listed below. Note that there
#' are also other functions to extract information from a `Spectra` object
#' documented in [addProcessing()].
#'
#' - `$`, `$<-`: gets (or sets) a spectra variable for all spectra in `object`.
#'   See examples for details. Note that replacing values of a peaks variable
#'   is not supported with a non-empty processing queue, i.e. if any filtering
#'   or data manipulations on the peaks data was performed. In these cases
#'   [applyProcessing()] needs to be called first to apply all cached data
#'   operations.
#'
#' - `[[`, `[[<-`: access or set/add a single spectrum variable (column) in the
#'   backend.
#'
#' - `acquisitionNum()`: returns the acquisition number of each
#'   spectrum. Returns an `integer` of length equal to the number of
#'   spectra (with `NA_integer_` if not available).
#'
#' - `asDataFrame()`: converts the `Spectra` to a `DataFrame` (in long format)
#'   contining all data. Returns a `DataFrame`.
#'
#' - `centroided()`, `centroided<-`: gets or sets the centroiding
#'   information of the spectra. `centroided()` returns a `logical`
#'   vector of length equal to the number of spectra with `TRUE` if a
#'   spectrum is centroided, `FALSE` if it is in profile mode and `NA`
#'   if it is undefined. See also `isCentroided()` for estimating from
#'   the spectrum data whether the spectrum is centroided. `value`
#'   for `centroided<-` is either a single `logical` or a `logical` of
#'   length equal to the number of spectra in `object`.
#'
#' - `collisionEnergy()`, `collisionEnergy<-`: gets or sets the
#'   collision energy for all spectra in `object`. `collisionEnergy()`
#'   returns a `numeric` with length equal to the number of spectra
#'   (`NA_real_` if not present/defined), `collisionEnergy<-` takes a
#'   `numeric` of length equal to the number of spectra in `object`.
#'
#' - `coreSpectraVariables()`: returns the *core* spectra variables along with
#'   their expected data type.
#'
#' - `dataOrigin()`, `dataOrigin<-`: gets or sets the *data origin* for each
#'   spectrum. `dataOrigin()` returns a `character` vector (same length than
#'   `object`) with the origin of the spectra. `dataOrigin<-` expects a
#'   `character` vector (same length than `object`) with the replacement
#'   values for the data origin of each spectrum.
#'
#' - `dataStorage()`: returns a `character` vector (same length than `object`)
#'   with the data storage location of each spectrum.
#'
#' - `intensity()`: gets the intensity values from the spectra. Returns
#'   a [IRanges::NumericList()] of `numeric` vectors (intensity values for each
#'   spectrum). The length of the list is equal to the number of
#'   `spectra` in `object`.
#'
#' - `ionCount()`: returns a `numeric` with the sum of intensities for
#'   each spectrum. If the spectrum is empty (see `isEmpty()`),
#'   `NA_real_` is returned.
#'
#' - `isCentroided()`: a heuristic approach assessing if the spectra in
#'   `object` are in profile or centroided mode. The function takes
#'   the `qtl`th quantile top peaks, then calculates the difference
#'   between adjacent m/z value and returns `TRUE` if the first
#'   quartile is greater than `k`. (See `Spectra:::.isCentroided()` for
#'   the code.)
#'
#' - `isEmpty()`: checks whether a spectrum in `object` is empty
#'   (i.e. does not contain any peaks). Returns a `logical` vector of
#'   length equal number of spectra.
#'
#' - `isolationWindowLowerMz()`, `isolationWindowLowerMz<-`: gets or sets the
#'   lower m/z boundary of the isolation window.
#'
#' - `isolationWindowTargetMz()`, `isolationWindowTargetMz<-`: gets or sets the
#'   target m/z of the isolation window.
#'
#' - `isolationWindowUpperMz()`, `isolationWindowUpperMz<-`: gets or sets the
#'   upper m/z boundary of the isolation window.
#'
#' - `length()`: gets the number of spectra in the object.
#'
#' - `lengths()`: gets the number of peaks (m/z-intensity values) per
#'   spectrum. Returns an `integer` vector (length equal to the
#'   number of spectra). For empty spectra, `0` is returned.
#'
#' - `msLevel()`: gets the spectra's MS level. Returns an integer vector (names
#'   being spectrum names, length equal to the number of spectra) with the MS
#'   level for each spectrum.
#'
#' - `mz()`: gets the mass-to-charge ratios (m/z) from the
#'   spectra. Returns a [IRanges::NumericList()] or length equal to the number
#'   of spectra, each element a `numeric` vector with the m/z values of
#'   one spectrum.
#'
#' - `peaksData()`: gets the *peaks* data for all spectra in `object`. Peaks
#'   data consist of the m/z and intensity values as well as possible additional
#'   annotations (variables) of all peaks of each spectrum. The function
#'   returns a [S4Vectors::SimpleList()] of two dimensional arrays (either
#'   `matrix` or `data.frame`), with each array providing the values for the
#'   requested *peak variables* (by default `"mz"` and `"intensity"`).
#'   Optional parameter `columns` is passed to the backend's `peaksData()`
#'   function to allow the selection of specific (or additional) peaks
#'   variables (columns) that should be extracted (if available). Importantly,
#'   it is **not** guaranteed that each backend supports this parameter (while
#'   each backend must support extraction of `"mz"` and `"intensity"` columns).
#'   Parameter `columns` defaults to `c("mz", "intensity")` but any value
#'   returned by `peaksVariables(object)` is supported.
#'   Note also that it is possible to extract the peak data with
#'   `as(x, "list")` and `as(x, "SimpleList")` as a `list` and `SimpleList`,
#'   respectively. Note however that, in contrast to `peaksData()`, `as()`
#'   does not support the parameter `columns`.
#'
#' - `peaksVariables()`: lists the available variables for mass peaks provided
#'   by the backend. Default peak variables are `"mz"` and `"intensity"` (which
#'   all backends need to support and provide), but some backends might provide
#'   additional variables.
#'   These variables correspond to the column names of the peak data array
#'   returned by `peaksData()`.
#'
#' - `polarity()`, `polarity<-`: gets or sets the polarity for each
#'   spectrum. `polarity()` returns an `integer` vector (length equal
#'   to the number of spectra), with `0` and `1` representing negative
#'   and positive polarities, respectively. `polarity<-` expects an
#'   `integer` vector of length 1 or equal to the number of spectra.
#'
#' - `precursorCharge()`, `precursorIntensity()`, `precursorMz()`,
#'   `precScanNum()`, `precAcquisitionNum()`: gets the charge (`integer`),
#'   intensity (`numeric`), m/z (`numeric`), scan index (`integer`)
#'   and acquisition number (`interger`) of the precursor for MS level >
#'   2 spectra from the object. Returns a vector of length equal to
#'   the number of spectra in `object`. `NA` are reported for MS1
#'   spectra of if no precursor information is available.
#'
#' - `rtime()`, `rtime<-`: gets or sets the retention times (in seconds)
#'   for each spectrum.  `rtime()` returns a `numeric` vector (length
#'   equal to the number of spectra) with the retention time for each
#'   spectrum.  `rtime<-` expects a numeric vector with length equal
#'   to the number of spectra.
#'
#' - `scanIndex()`: returns an `integer` vector with the *scan index*
#'   for each spectrum. This represents the relative index of the
#'   spectrum within each file. Note that this can be different to the
#'   `acquisitionNum` of the spectrum which represents the index of the
#'   spectrum during acquisition/measurement (as reported in the mzML file).
#'
#' - `smoothed()`,`smoothed<-`: gets or sets whether a spectrum is
#'   *smoothed*. `smoothed()` returns a `logical` vector of length equal
#'   to the number of spectra. `smoothed<-` takes a `logical` vector
#'   of length 1 or equal to the number of spectra in `object`.
#'
#' - `spectraData()`: gets general spectrum metadata (annotation, also called
#'   header). `spectraData()` returns a `DataFrame`. Note that this
#'   method does by default **not** return m/z or intensity values.
#'
#' - `spectraData<-`: **replaces** the full spectra data of the `Spectra`
#'   object with the one provided with `value`. The `spectraData<-` function
#'   expects a `DataFrame` to be passed as value with the same number of rows
#'   as there a spectra in `object`. Note that replacing values of
#'   peaks variables is not supported with a non-empty processing queue, i.e.
#'   if any filtering or data manipulations on the peaks data was performed.
#'   In these cases [applyProcessing()] needs to be called first to apply all
#'   cached data operations and empty the processing queue.
#'
#' - `spectraNames()`, `spectraNames<-`: gets or sets the spectra names.
#'
#' - `spectraVariables()`: returns a `character` vector with the
#'   available spectra variables (columns, fields or attributes of each
#'   spectrum) available in `object`. Note that `spectraVariables()` does not
#'   list the *peak variables* (`"mz"`, `"intensity"` and eventual additional
#'   annotations for each MS peak). Peak variables are returned by
#'   `peaksVariables()`.
#'
#' - `tic()`: gets the total ion current/count (sum of signal of a
#'   spectrum) for all spectra in `object`. By default, the value
#'   reported in the original raw data file is returned. For an empty
#'   spectrum, `0` is returned.
#'
#' - `uniqueMsLevels()`: get the unique MS levels available in `object`. This
#'   function is supposed to be more efficient than `unique(msLevel(object))`.
#'
#' @md
#'
#' @seealso
#'
#' - [addProcessing()] for functions to analyze `Spectra`.
#'
#' - [Spectra] for a general description of the `Spectra` object.
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail
#'
#' @examples
#'
#' ## Create a Spectra from mzML files and use the `MsBackendMzR` on-disk
#' ## backend.
#' sciex_file <- dir(system.file("sciex", package = "msdata"),
#'     full.names = TRUE)
#' sciex <- Spectra(sciex_file, backend = MsBackendMzR())
#' sciex
#'
#' ## Get the number of spectra in the data set
#' length(sciex)
#'
#' ## Get the number of mass peaks per spectrum - limit to the first 6
#' lengths(sciex) |> head()
#'
#' ## Get the MS level for each spectrum - limit to the first 6 spectra
#' msLevel(sciex) |> head()
#'
#' ## Alternatively, we could also use $ to access a specific spectra variable.
#' ## This could also be used to add additional spectra variables to the
#' ## object (see further below).
#' sciex$msLevel |> head()
#'
#' ## Get the intensity and m/z values.
#' intensity(sciex)
#' mz(sciex)
#'
#' ## Convert a subset of the Spectra object to a long DataFrame.
#' asDataFrame(sciex, i = 1:3, spectraVars = c("rtime", "msLevel"))
#'
#' ## Create a Spectra providing a `DataFrame` containing the spectrum data.
#'
#' spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
#' spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
#' spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
#'
#' s <- Spectra(spd)
#' s
#'
#' ## List all available spectra variables (i.e. spectrum data and metadata).
#' spectraVariables(s)
#'
#' ## For all *core* spectrum variables accessor functions are available. These
#' ## return NA if the variable was not set.
#' centroided(s)
#' dataStorage(s)
#' rtime(s)
#' precursorMz(s)
#'
#' ## The core spectra variables are:
#' coreSpectraVariables()
#'
#' ## Add an additional metadata column.
#' s$spectrum_id <- c("sp_1", "sp_2")
#'
#' ## List spectra variables, "spectrum_id" is now also listed
#' spectraVariables(s)
#'
#' ## Get the values for the new spectra variable
#' s$spectrum_id
#'
#' ## Extract specific spectra variables.
#' spectraData(s, columns = c("spectrum_id", "msLevel"))
#'
#'
#' ##  --------  PEAKS VARIABLES AND DATA  --------
#'
#' ## Get the peak data (m/z and intensity values).
#' pks <- peaksData(s)
#' pks
#' pks[[1]]
#' pks[[2]]
#'
#' ## Note that we could get the same resulb by coercing the `Spectra` to
#' ## a `list` or `SimpleList`:
#' as(s, "list")
#' as(s, "SimpleList")
#'
#' ## Or use `mz()` and `intensity()` to extract the m/z and intensity values
#' ## separately
#' mz(s)
#' intensity(s)
#'
#' ## Some `MsBackend` classes provide support for arbitrary peaks variables
#' ## (in addition to the mandatory `"mz"` and `"intensity"` values. Below
#' ## we create a simple data frame with an additional peak variable `"pk_ann"`
#' ## and create a `Spectra` with a `MsBackendMemory` for that data.
#' ## Importantly the number of values (per spectrum) need to be the same
#' ## for all peak variables.
#'
#' tmp <- data.frame(msLevel = c(2L, 2L), rtime = c(123.2, 123.5))
#' tmp$mz <- list(c(103.1, 110.4, 303.1), c(343.2, 453.1))
#' tmp$intensity <- list(c(130.1, 543.1, 40), c(0.9, 0.45))
#' tmp$pk_ann <- list(c(NA_character_, "A", "P"), c("B", "P"))
#'
#' ## Create the Spectra. With parameter `peaksVariables` we can define
#' ## the columns in `tmp` that contain peaks variables.
#' sps <- Spectra(tmp, source = MsBackendMemory(),
#'     peaksVariables = c("mz", "intensity", "pk_ann"))
#' peaksVariables(sps)
#'
#' ## Extract just the m/z and intensity values
#' peaksData(sps)[[1L]]
#'
#' ## Extract the full peaks data
#' peaksData(sps, columns = peaksVariables(sps))[[1L]]
#'
#' ## Access just the pk_ann variable
#' sps$pk_ann
#'
#'
NULL

#' @importFrom methods setAs
setAs("Spectra", "list", function(from, to) {
    .peaksapply(from)
})

setAs("Spectra", "SimpleList", function(from, to) {
    peaksData(from)
})

#' @export
#'
#' @rdname spectraData
asDataFrame <- function(object, i = seq_along(object),
                        spectraVars = spectraVariables(object)) {
    stopifnot(inherits(object, "Spectra"))
    object <- object[i]
    n <- sapply(peaksData(object), nrow)
    v <- spectraData(object)[rep(seq_along(object), n), spectraVars]
    p <- do.call(rbind, as.list(peaksData(object)))
    cbind(p, v)
}

#' @rdname spectraData
#'
#' @export
setMethod("acquisitionNum", "Spectra", function(object)
    acquisitionNum(object@backend))

#' @rdname spectraData
setMethod("centroided", "Spectra", function(object) {
    centroided(object@backend)
})

#' @rdname spectraData
setReplaceMethod("centroided", "Spectra", function(object, value) {
    centroided(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("collisionEnergy", "Spectra", function(object) {
    collisionEnergy(object@backend)
})

#' @rdname spectraData
setReplaceMethod("collisionEnergy", "Spectra", function(object, value) {
    collisionEnergy(object@backend) <- value
    object
})

#' @rdname spectraData
#'
#' @export
coreSpectraVariables <- function() .SPECTRA_DATA_COLUMNS

#' @rdname spectraData
setMethod("dataOrigin", "Spectra", function(object) dataOrigin(object@backend))

#' @rdname spectraData
setReplaceMethod("dataOrigin", "Spectra", function(object, value) {
    dataOrigin(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("dataStorage", "Spectra",
          function(object) dataStorage(object@backend))

#' @rdname spectraData
setMethod("intensity", "Spectra", function(object,
                                           f = processingChunkFactor(object),
                                           ...) {
    if (length(object@processingQueue) || length(f))
        NumericList(.peaksapply(object, FUN = function(z, ...) z[, 2],
                                f = f, ...), compress = FALSE)
    else intensity(object@backend)
})

#' @rdname spectraData
setMethod("ionCount", "Spectra", function(object) {
    if (length(object))
        unlist(.peaksapply(
            object, FUN = function(pks, ...) sum(pks[, 2], na.rm = TRUE)),
            use.names = FALSE)
    else numeric()
})

#' @rdname spectraData
setMethod("isCentroided", "Spectra", function(object, ...) {
    if (length(object))
        unlist(.peaksapply(object, FUN = .peaks_is_centroided),
               use.names = FALSE)
    else logical()
})

#' @rdname spectraData
setMethod("isEmpty", "Spectra", function(x) {
    if (length(x))
        unlist(.peaksapply(x, FUN = function(pks, ...) nrow(pks) == 0),
               use.names = FALSE)
    else logical()
})

#' @rdname spectraData
setMethod("isolationWindowLowerMz", "Spectra", function(object) {
    isolationWindowLowerMz(object@backend)
})

#' @rdname spectraData
setReplaceMethod("isolationWindowLowerMz", "Spectra", function(object, value) {
    isolationWindowLowerMz(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("isolationWindowTargetMz", "Spectra", function(object) {
    isolationWindowTargetMz(object@backend)
})

#' @rdname spectraData
setReplaceMethod("isolationWindowTargetMz", "Spectra", function(object, value) {
    isolationWindowTargetMz(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("isolationWindowUpperMz", "Spectra", function(object) {
    isolationWindowUpperMz(object@backend)
})

#' @rdname spectraData
setReplaceMethod("isolationWindowUpperMz", "Spectra", function(object, value) {
    isolationWindowUpperMz(object@backend) <- value
    object
})

#' @rdname spectraData
#'
#' @exportMethod length
setMethod("length", "Spectra", function(x) length(x@backend))

#' @rdname spectraData
#'
#' @exportMethod lengths
setMethod("lengths", "Spectra", function(x, use.names = FALSE) {
    f <- .parallel_processing_factor(x)
    if (length(x)) {
        if (length(x@processingQueue) || length(f))
            unlist(.peaksapply(x, FUN = function(pks, ...) nrow(pks)),
                   use.names = use.names)
        else lengths(x@backend, use.names = use.names)
    } else integer()
})

#' @rdname spectraData
setMethod("msLevel", "Spectra", function(object) msLevel(object@backend))

#' @rdname spectraData
setMethod("mz", "Spectra", function(object, f = processingChunkFactor(object),
                                    ...) {
    if (length(object@processingQueue) || length(f))
        NumericList(.peaksapply(object, FUN = function(z, ...) z[, 1],
                                f = f, ...), compress = FALSE)
    else mz(object@backend)
})

#' @rdname spectraData
#'
#' @export
setMethod(
    "peaksData", "Spectra",
    function(object, columns = c("mz", "intensity"),
             f = processingChunkFactor(object), ..., BPPARAM = bpparam()) {
        if (length(object@processingQueue) || length(f))
            SimpleList(.peaksapply(object, columns = columns, f = f))
        else SimpleList(peaksData(object@backend, columns = columns))
    })

#' @rdname spectraData
setMethod("peaksVariables", "Spectra", function(object)
    peaksVariables(object@backend))

#' @rdname spectraData
setMethod("polarity", "Spectra", function(object) {
    polarity(object@backend)
})

#' @rdname spectraData
setReplaceMethod("polarity", "Spectra", function(object, value) {
    polarity(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("precScanNum", "Spectra", function(object) {
    precScanNum(object@backend)
})

#' @rdname spectraData
setMethod("precursorCharge", "Spectra", function(object) {
    precursorCharge(object@backend)
})

#' @rdname spectraData
setMethod("precursorIntensity", "Spectra", function(object) {
    precursorIntensity(object@backend)
})

#' @rdname spectraData
setMethod("precursorMz", "Spectra", function(object) {
    precursorMz(object@backend)
})

#' @rdname spectraData
setReplaceMethod("precursorMz", "Spectra", function(object, ..., value) {
    precursorMz(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("rtime", "Spectra", function(object) {
    rtime(object@backend)
})

#' @rdname spectraData
setReplaceMethod("rtime", "Spectra", function(object, value) {
    rtime(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("scanIndex", "Spectra", function(object) {
    scanIndex(object@backend)
})

#' @rdname spectraData
setMethod("smoothed", "Spectra", function(object) {
    smoothed(object@backend)
})

#' @rdname spectraData
setReplaceMethod("smoothed", "Spectra", function(object, value) {
    smoothed(object@backend) <- value
    object
})

#' @rdname spectraData
#'
#' @importMethodsFrom ProtGenerics spectraData
#'
#' @exportMethod spectraData
setMethod(
    "spectraData", "Spectra",
    function(object, columns = spectraVariables(object)) {
        if (length(object@processingQueue) &&
            length(pcns <- intersect(columns, peaksVariables(object)))) {
            ## If user requests peaks variables we need to ensure that the
            ## processing queue is executed.
            scns <- setdiff(columns, pcns)
            if (length(scns))
                spd <- spectraData(object@backend, columns = scns)
            else
                spd <- make_zero_col_DFrame(nrow = length(object))
            pkd <- peaksData(object, columns = pcns)
            ## Add individual peaks variables to the `DataFrame`.
            for (pcn in pcns) {
                vals <- lapply(pkd, `[`, , pcn)
                if (pcn %in% c("mz", "intensity"))
                    vals <- NumericList(vals, compress = FALSE)
                spd <- do.call(`[[<-`, list(spd, i = pcn, value = vals))
            }
            spd
        } else
            spectraData(object@backend, columns = columns)
    })

#' @rdname spectraData
#'
#' @importMethodsFrom ProtGenerics spectraData<-
#'
#' @exportMethod spectraData<-
setReplaceMethod("spectraData", "Spectra", function(object, value) {
    if (!inherits(value, "DataFrame"))
        stop("'spectraData<-' expects a 'DataFrame' as input.", call. = FALSE)
    pvs <- peaksVariables(object)
    if (length(object@processingQueue) &&
        any(colnames(value) %in% pvs))
        stop("Can not replace peaks variables with a non-empty processing ",
             "queue. Please use 'object <- applyProcessing(object)' to apply ",
             "and clear the processing queue. Note that 'applyProcessing' ",
             "requires a *writeable* backend. Use e.g. 'object <- ",
             "setBackend(object, MsBackendMemory())' if needed.")
    pvs <- setdiff(pvs, colnames(value))
    if (length(pvs)) {
        sd <- spectraData(object, pvs)
        for (pv in pvs) {
            value <- do.call("$<-", list(value, name = pv, sd[, pv]))
        }
        object@processingQueue <- list()
    }
    spectraData(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("spectraNames", "Spectra", function(object) {
    spectraNames(object@backend)
})

#' @rdname spectraData
setReplaceMethod("spectraNames", "Spectra", function(object, value) {
    spectraNames(object@backend) <- value
    object
})

#' @rdname spectraData
setMethod("spectraVariables", "Spectra", function(object) {
    setdiff(spectraVariables(object@backend), peaksVariables(object@backend))
})

#' @rdname spectraData
setMethod("tic", "Spectra", function(object, initial = TRUE) {
    if (!length(object))
        return(numeric())
    if (initial)
        tic(object@backend, initial = initial)
    else ionCount(object)
})

#' @rdname spectraData
setMethod("uniqueMsLevels", "Spectra", function(object, ...) {
    uniqueMsLevels(object@backend, ...)
})

#' @rdname spectraData
#'
#' @importMethodsFrom S4Vectors $
#'
#' @export
setMethod("$", "Spectra", function(x, name) {
    if (!(name %in% c(spectraVariables(x@backend), peaksVariables(x@backend))))
        stop("No spectra variable '", name, "' available")
    if (name == "mz")
        mz(x)
    else if (name == "intensity")
        intensity(x)
    else {
        if (length(x@processingQueue) && name %in% peaksVariables(x))
            .peaksapply(x, FUN = function(z, ...) z[, name],
                        columns = c("mz", "intensity", name))
        else
            do.call("$", list(x@backend, name))
    }
})

#' @rdname spectraData
#'
#' @export
setReplaceMethod("$", "Spectra", function(x, name, value) {
    if (length(x@processingQueue) &&
        any(name %in% peaksVariables(x)))
        stop("Can not replace peaks variables with a non-empty processing ",
             "queue. Please use 'object <- applyProcessing(object)' to apply ",
             "and clear the processing queue. Note that 'applyProcessing' ",
             "requires a *writeable* backend. Use e.g. 'object <- ",
             "setBackend(object, MsBackendMemory())' if needed.")
    x@backend <- do.call("$<-", list(x@backend, name, value))
    x
})

#' @rdname spectraData
#'
#' @export
setMethod("[[", "Spectra", function(x, i, j, ...) {
    if (!is.character(i))
        stop("'i' is supposed to be a character defining the spectra ",
             "variable to access.")
    if (!missing(j))
        stop("'j' is not supported.")
    if (!(i %in% c(spectraVariables(x), "mz", "intensity")))
        stop("No spectra variable '", i, "' available")
    if (i == "mz")
        mz(x)
    else if (i == "intensity")
        intensity(x)
    else
        do.call("[[", list(x@backend, i))
})

#' @rdname spectraData
#'
#' @export
setReplaceMethod("[[", "Spectra", function(x, i, j, ..., value) {
    if (!is.character(i))
        stop("'i' is supposed to be a character defining the spectra ",
             "variable to replace or create.")
    if (!missing(j))
        stop("'j' is not supported.")
    x@backend <- do.call("[[<-", list(x@backend, i = i, value = value))
    x
})


################################################################################
##
## Merging, splitting and aggregating Spectra: length of Spectra is changed
##
################################################################################

#' @title Merging, aggregating and splitting Spectra
#'
#' @name combineSpectra
#'
#' @aliases combineSpectra
#' @aliases split
#' @aliases joinSpectraData
#' @aliases cbind2
#'
#' @description
#'
#' Various functions are availabe to combine, aggregate or split data from one
#' of more `Spectra` objects. These are:
#'
#' - `c()` and `concatenateSpectra()`: combines several `Spectra` objects into
#'   a single object. The resulting `Spectra` contains all data from all
#'   individual `Spectra`, i.e. the union of all their spectra variables.
#'   Concatenation will fail if the processing queue of any of the `Spectra`
#'   objects is not empty or if different backends are used for the `Spectra`
#'   objects. In such cases it is suggested to first change the backends of
#'   all `Spectra` to the same type of backend (using the [setBackend()]
#'   function and to eventually (if needed) apply the processing queue using
#'   the [applyProcessing()] function.
#'
#' - `cbind2()`: Appends multiple spectra variables from a `data.frame`,
#'   `DataFrame` or `matrix` to the `Spectra` object at once. The order of
#'   the values (rows) in `y` has to match the order of spectra in `x`. The
#'   function does not allow to replace existing spectra variables. `cbind2()`
#'   returns a `Spectra` object with the appended spectra variables. For a more
#'   controlled way of adding spectra variables, see the `joinSpectraData()`
#'   function.
#'
#' - `combineSpectra()`: combines sets of spectra (defined with parameter `f`)
#'   into a single spectrum per set aggregating their MS data (i.e. their
#'   *peaks data* matrices with the *m/z* and intensity values of their
#'   mass peaks). The spectra variable values of the first spectrum per set
#'   are reported for the combined spectrum. The peak matrices of the spectra
#'   per set are combined using the function specified with parameter `FUN`
#'   which uses by default the [combinePeaksData()] function. See the
#'   documentation of [combinePeaksData()] for details on the aggregation of
#'   the peak data and the package vignette for examples.
#'   The sets of spectra can be specified with parameter `f` which is expected
#'   to be a `factor` or `vector` of length equal to the length of the
#'   `Spectra` specifying to which set a spectrum belongs to. The function
#'   returns a `Spectra` of length equal to the unique levels of `f`. The
#'   optional parameter `p` allows to define how the `Spectra` should be
#'   split for potential parallel processing. The default is
#'   `p = x$dataStorage` and hence a per storage file parallel processing is
#'   applied for `Spectra` with on disk data representations (such as the
#'   [MsBackendMzR()]). This also prevents that spectra from different data
#'   files/samples are combined (eventually use e.g. `p = x$dataOrigin` or any
#'   other spectra variables defining the originating samples for a spectrum).
#'   Before combining the peaks data, all eventual present processing steps are
#'   applied (by calling [applyProcessing()] on the `Spectra`). This function
#'   will replace the original *m/z* and intensity values of a `Spectra` hence
#'   it can not be called on a `Spectra` with a *read-only* backend. In such
#'   cases, the backend should be changed to a *writeable* backend before
#'   using the [setBackend()] function (to e.g. a [MsBackendMemory()] backend).
#'
#' - `joinSpectraData()`: Individual spectra variables can be directly
#'    added with the `$<-` or `[[<-` syntax. The `joinSpectraData()`
#'    function allows to merge a `DataFrame` to the existing spectra
#'    data of a `Spectra`. This function diverges from the [merge()] method in
#'    two main ways:
#'    - The `by.x` and `by.y` column names must be of length 1.
#'    - If variable names are shared in `x` and `y`, the spectra
#'      variables of `x` are not modified. It's only the `y`
#'      variables that are appended with the suffix defined in
#'      `suffix.y`. This is to avoid modifying any core spectra
#'      variables that would lead to an invalid object.
#'    - Duplicated Spectra keys (i.e. `x[[by.x]]`) are not
#'      allowed. Duplicated keys in the `DataFrame` (i.e `y[[by.y]]`)
#'      throw a warning and only the last occurrence is kept. These
#'      should be explored and ideally be removed using for
#'      `QFeatures::reduceDataFrame()`, `PMS::reducePSMs()` or similar
#'      functions.
#'    For a more general function that allows to append `data.frame`,
#'    `DataFrame` and `matrix` see `cbind2()`.
#'
#' - `split()`: splits the `Spectra` object based on parameter `f` into a `list`
#'   of `Spectra` objects.
#'
#' @param BPPARAM Parallel setup configuration. See [BiocParallel::bpparam()]
#'     for more information. This is passed directly to the
#'     [backendInitialize()] method of the [MsBackend-class].
#'
#' @param by.x A `character(1)` specifying the spectra variable used
#'     for merging. Default is `"spectrumId"`.
#'
#' @param by.y A `character(1)` specifying the column used for
#'     merging. Set to `by.x` if missing.
#'
#' @param drop For `split()`: not considered.
#'
#' @param f For `split()`: factor defining how to split `x`. See [base::split()]
#'     for details.
#'     For `combineSpectra()`: `factor` defining the grouping of the spectra
#'     that should be combined. Defaults to `x$dataStorage`.
#'
#' @param FUN For `combineSpectra()`: function to combine the (peak matrices)
#'     of the spectra. Defaults to [combinePeaksData()].
#'
#' @param p For `combineSpectra()`: `factor` defining how to split the input
#'     `Spectra` for parallel processing. Defaults to `x$dataStorage`, i.e.,
#'     depending on the used backend, per-file parallel processing will be
#'     performed.
#'
#' @param suffix.y A `character(1)` specifying the suffix to be used
#'     for making the names of columns in the merged spectra variables
#'     unique. This suffix will be used to amend `names(y)`, while
#'     `spectraVariables(x)` will remain unchanged.
#'
#' @param x A `Spectra` object.
#'
#' @param y For `joinSpectraData()`: `DataFrame` with the spectra variables
#'     to join/add. For `cbind2()`: a `data.frame`, `DataFrame` or
#'     `matrix`. The number of rows and their order has to match the
#'     number of spectra in `x`, respectively their order.
#'
#' @param ... Additional arguments.
#'
#' @seealso
#'
#' - [combinePeaks()] for functions to aggregate mass peaks data.
#'
#' - [Spectra] for a general description of the `Spectra` object.
#'
#' @importFrom MsCoreUtils vapply1c
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto
#'
#' @examples
#'
#' ## Create a Spectra providing a `DataFrame` containing a MS data.
#'
#' spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
#' spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
#' spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
#'
#' s <- Spectra(spd)
#' s
#'
#' ## Create a second Spectra from mzML files and use the `MsBackendMzR`
#' ## on-disk backend.
#' sciex_file <- dir(system.file("sciex", package = "msdata"),
#'     full.names = TRUE)
#' sciex <- Spectra(sciex_file, backend = MsBackendMzR())
#' sciex
#'
#' ## Subset to the first 100 spectra to reduce running time of the examples
#' sciex <- sciex[1:100]
#'
#'
#' ##  --------  COMBINE SPECTRA  --------
#'
#' ## Combining the `Spectra` object `s` with the MS data from `sciex`.
#' ## Calling directly `c(s, sciex)` would result in an error because
#' ## both backends use a different backend. We thus have to first change
#' ## the backends to the same backend. We change the backend of the `sciex`
#' ## `Spectra` to a `MsBackendMemory`, the backend used by `s`.
#'
#' sciex <- setBackend(sciex, MsBackendMemory())
#'
#' ## Combine the two `Spectra`
#' all <- c(s, sciex)
#' all
#'
#' ## The new `Spectra` objects contains the union of spectra variables from
#' ## both:
#' spectraVariables(all)
#'
#' ## The spectra variables that were not present in `s`:
#' setdiff(spectraVariables(all), spectraVariables(s))
#'
#' ## The values for these were filled with missing values for spectra from
#' ## `s`:
#' all$peaksCount |> head()
#'
#'
#' ##  --------  AGGREGATE SPECTRA  --------
#'
#' ## Sets of spectra can be combined into a single, representative spectrum
#' ## per set using `combineSpectra()`. This aggregates the peaks data (i.e.
#' ## the spectra's m/z and intensity values) while using the values for all
#' ## spectra variables from the first spectrum per set. Below we define the
#' ## sets as all spectra measured in the *same second*, i.e. rounding their
#' ## retention time to the next closer integer value.
#' f <- round(rtime(sciex))
#' head(f)
#'
#' cmp <- combineSpectra(sciex, f = f)
#'
#' ## The length of `cmp` is now equal to the length of unique levels in `f`:
#' length(cmp)
#'
#' ## The spectra variable value from the first spectrum per set is used in
#' ## the representative/combined spectrum:
#' cmp$rtime
#'
#' ## The peaks data was aggregated: the number of mass peaks of the first six
#' ## spectra from the original `Spectra`:
#' lengths(sciex) |> head()
#'
#' ## and for the first aggreagated spectra:
#' lengths(cmp) |> head()
#'
#' ## The default peaks data aggregation method joins all mass peaks. See
#' ## documentation of the `combinePeaksData()` function for more options.
#'
#'
#' ##  --------  SPLITTING DATA  --------
#'
#' ## A `Spectra` can be split into a `list` of `Spectra` objects using the
#' ## `split()` function defining the sets into which the `Spectra` should
#' ## be splitted into with parameter `f`.
#' sciex_split <- split(sciex, f)
#'
#' length(sciex_split)
#' sciex_split |> head()
#'
#'
#' ##  --------  ADDING SPECTRA DATA  --------
#'
#' ## Adding new spectra variables
#' sciex1 <- filterDataOrigin(sciex, dataOrigin(sciex)[1])
#' spv <- DataFrame(spectrumId = sciex1$spectrumId[3:12], ## used for merging
#'                  var1 = rnorm(10),
#'                  var2 = sample(letters, 10))
#' spv
#'
#' sciex2 <- joinSpectraData(sciex1, spv, by.y = "spectrumId")
#'
#' spectraVariables(sciex2)
#' spectraData(sciex2)[1:13, c("spectrumId", "var1", "var2")]
#'
#' ## Append new spectra variables with cbind2()
#' df <- data.frame(cola = seq_len(length(sciex1)), colb = "b")
#' data_append <- cbind2(sciex1, df)
NULL

#' @rdname combineSpectra
#'
#' @exportMethod c
setMethod("c", "Spectra", function(x, ...) {
    .concatenate_spectra(unname(list(unname(x), ...)))
})

#' @rdname combineSpectra
#'
#' @export
setMethod("cbind2", signature(x = "Spectra",
                              y = "dataframeOrDataFrameOrmatrix"),
          function(x, y, ...) {
              x@backend <- cbind2(x@backend, y, ...)
          })

#' @rdname combineSpectra
setMethod("split", "Spectra", function(x, f, drop = FALSE, ...) {
    bcknds <- split(x@backend, f, ...)
    lapply(bcknds, function(b) {
        slot(x, "backend", check = FALSE) <- b
        x
    })
})


################################################################################
##
## Aggregating peaks data
##
################################################################################

#' @title Aggregating and combining mass peaks data
#'
#' @name combinePeaks
#'
#' @description
#'
#' In addition to aggregating content of spectra variables (describe in
#' [combineSpectra()]) it is also possible to aggregate and combine mass peaks
#' data from individual spectra within a `Spectra`. These `combinePeaks()`
#' function combines mass peaks **within each spectrum** with a difference in
#' their m/z values that is smaller than the maximal acceptable difference
#' defined by `ppm` and `tolerance`. Parameters `intensityFun` and `mzFun`
#' allow to define functions to aggregate the intensity and m/z values for
#' each such group of peaks. With `weighted = TRUE` (the default), the m/z
#' value of the combined peak is calculated using an intensity-weighted mean
#' and parameter `mzFun` is ignored. The [MsCoreUtils::group()] function is
#' used for the grouping of mass peaks. Parameter `msLevel.` allows to define
#' selected MS levels for which peaks should be combined. This function
#' returns a `Spectra` with the same number of spectra than the input object,
#' but with possibly combined peaks within each spectrum.
#' Additional peak variables (other than `"mz"` and `"intensity"`) are
#' dropped (i.e. their values are replaced with `NA`) for combined peaks
#' unless they are constant across the combined peaks. See also
#' [reduceSpectra()] for a function to select a single *representative*
#' mass peak for each peak group.
#'
#' @param intensityFun Function to aggregate intensities for all peaks in
#'     each peak group into a single intensity value.
#'
#' @param msLevel. `integer` defining the MS level(s) of the spectra to which
#'     the function should be applied (defaults to all MS levels of `object`.
#'
#' @param mzFun Function to aggregate m/z values for all mass peaks within
#'     each peak group into a single m/z value. This parameter is ignored if
#'     `weighted = TRUE` (the default).
#'
#' @param object A `Spectra` object.
#'
#' @param ppm `numeric(1)` defining a relative, m/z-dependent, maximal
#'     accepted difference between m/z values for peaks to be grouped. Default
#'     is `ppm = 20`.
#'
#' @param tolerance `numeric(1)` allowing to define a constant maximal
#'     accepted difference between m/z values for peaks to be grouped. Default
#'     is `tolerance = 0`.
#'
#' @param weighted `logical(1)` whether m/z values of peaks within each peak
#'     group should be aggregated into a single m/z value using an
#'     intensity-weighted mean. Defaults to `weighted = TRUE`.
#'
#' @param ... ignored.
#'
#' @md
#'
#' @seealso
#'
#' - [combineSpectra()] for functions to combine or aggregate `Spectra`'s
#'   spectra data.
#'
#' - [combinePeaksData()] for the function to combine the mass peaks data.
#'
#' - [reduceSpectra()] and similar functions to filter mass peaks data.
#'
#' - [Spectra] for a general description of the `Spectra` object.
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto
#'
#' @examples
#'
#' ## Create a Spectra from mzML files and use the `MsBackendMzR` on-disk
#' ## backend.
#' sciex_file <- dir(system.file("sciex", package = "msdata"),
#'     full.names = TRUE)
#' sciex <- Spectra(sciex_file, backend = MsBackendMzR())
#'
#' ## Combine mass peaks per spectrum with a difference in their m/z value
#' ## that is smaller than 20 ppm. The intensity values of such peaks are
#' ## combined by summing their values, while for the m/z values the median
#' ## is reported
#' sciex_comb <- combinePeaks(sciex, ppm = 20,
#'     intensityFun = sum, mzFun = median)
#'
#' ## Comparing the number of mass peaks before and after aggregation
#' lengths(sciex) |> head()
#' lengths(sciex_comb) |> head()
#'
#' ## Plotting the first spectrum before and after aggregation
#' par(mfrow = c(1, 2))
#' plotSpectra(sciex[2L])
#' plotSpectra(sciex_comb[2L])
#'
#' ## Using `reduceSpectra()` to keep for each group of mass peaks with a
#' ## difference in their m/z values < 20ppm the one with the highest intensity.
#' sciex_red <- reduceSpectra(sciex, ppm = 20)
#'
#' ## Comparing the number of mass peaks before and after the operation
#' lengths(sciex) |> head()
#' lengths(sciex_red) |> head()
NULL

#' @rdname hidden_aliases
setMethod("combinePeaks", "list", function(object, ...) {
    .Deprecated("combinePeaksData", old = "combinePeaks",
                msg = paste0("'combinePeaks' for lists of peak matrices is ",
                             "deprecated; please use 'combinePeaksData' ",
                             "instead."))
    combinePeaksData(object, ...)
})

#' @rdname combinePeaks
#'
#' @exportMethod combinePeaks
setMethod("combinePeaks", "Spectra", function(object, tolerance = 0, ppm = 20,
                                              intensityFun = base::mean,
                                              mzFun = base::mean,
                                              weighted = TRUE,
                                              msLevel. = uniqueMsLevels(object),
                                              ...) {
    object <- addProcessing(
        object, .peaks_combine, ppm = ppm, tolerance = tolerance,
        intensityFun = intensityFun, mzFun = mzFun, weighted = weighted,
        msLevel = force(msLevel.), spectraVariables = "msLevel")
    object@processing <- .logging(
        object@processing, "Combining peaks within each spectrum with ppm = ",
        ppm, " and tolerance = ", tolerance, ".")
    object
})


################################################################################
##
## Filtering, subsetting Spectra: subsetting Spectra and its data content.
##
################################################################################

#' @title Filter and subset Spectra objects
#'
#' @name filterMsLevel
#'
#' @aliases [,Spectra-method
#' @aliases filterAcquisitionNum
#' @aliases filterDataOrigin
#' @aliases filterDataStorage
#' @aliases filterEmptySpectra
#' @aliases filterIsolationWindow
#' @aliases filterMsLevel
#' @aliases filterPolarity
#' @aliases filterPrecursorCharge
#' @aliases filterPrecursorIsotopes
#' @aliases filterPrecursorMzRange
#' @aliases filterPrecursorMzValues
#' @aliases filterPrecursorScan
#' @aliases filterRanges
#' @aliases filterRt
#' @aliases filterValues
#' @aliases dropNaSpectraVariables
#' @aliases selectSpectraVariables
#' @aliases filterIntensity
#' @aliases filterMzRange
#' @aliases filterMzValues
#' @aliases reduceSpectra
#'
#' @description
#'
#' A variety of functions to filter or subset `Spectra` objects are available.
#' These can be generally separated into two main classes: I) *classical*
#' subset operations that immediately reduce the number of spectra in the
#' object and II) filters that reduce the **content** of the object without
#' changing its length (i.e. the number of spectra). The latter can be further
#' subdivided into functions that affect the content of the `spectraData` (i.e.
#' the general spectrum metadata) and those that reduce the content of the
#' object's `peaksData` (i.e. the m/z and intensity values of a spectrum's
#' mass peaks).
#'
#' A description of functions from these 3 different categories are given below
#' in sections *Subset `Spectra`*, *Filter content of `spectraData()`* and
#' *Filter content of `peaksData()`*, respectively.
#'
#'
#' @section Subset `Spectra`:
#'
#' These functions affect the number of spectra in a `Spectra` object creating
#' a subset of the original object without affecting its content.
#'
#' - `[`: subsets the spectra keeping only selected elements (`i`). The method
#'   **always** returns a `Spectra` object.
#'
#' - `filterAcquisitionNum()`: filters the object keeping only spectra matching
#'   the provided acquisition numbers (argument `n`). If `dataOrigin` or
#'   `dataStorage` is also provided, `object` is subsetted to the spectra with
#'   an acquisition number equal to `n` **in spectra with matching dataOrigin
#'   or dataStorage values** retaining all other spectra.
#'   Returns the filtered `Spectra`.
#'
#' - `filterDataOrigin()`: filters the object retaining spectra matching the
#'   provided `dataOrigin`. Parameter `dataOrigin` has to be of type
#'   `character` and needs to match exactly the data origin value of the
#'   spectra to subset.
#'   Returns the filtered `Spectra` object (with spectra ordered according to
#'   the provided `dataOrigin` parameter).
#'
#' - `filterDataStorage()`: filters the object retaining spectra stored in the
#'   specified `dataStorage`. Parameter `dataStorage` has to be of type
#'   `character` and needs to match exactly the data storage value of the
#'   spectra to subset.
#'   Returns the filtered `Spectra` object (with spectra ordered according to
#'   the provided `dataStorage` parameter).
#'
#' - `filterEmptySpectra()`: removes empty spectra (i.e. spectra without peaks).
#'   Returns the filtered `Spectra` object (with spectra in their
#'   original order).
#'
#' - `filterIsolationWindow()`: retains spectra that contain `mz` in their
#'   isolation window m/z range (i.e. with an `isolationWindowLowerMz` <= `mz`
#'   and `isolationWindowUpperMz` >= `mz`. Returns the filtered `Spectra`
#'   object (with spectra in their original order).
#'
#' - `filterMsLevel()`: filters object by MS level keeping only spectra matching
#'   the MS level specified with argument `msLevel`. Returns the filtered
#'   `Spectra` (with spectra in their original order).
#'
#' - `filterPolarity()`: filters the object keeping only spectra matching the
#'   provided polarity. Returns the filtered `Spectra` (with spectra in their
#'   original order).
#'
#' - `filterPrecursorCharge()`: retains spectra with the defined precursor
#'   charge(s).
#'
#' - `filterPrecursorIsotopes()`: groups MS2 spectra based on their precursor
#'   m/z and precursor intensity into predicted isotope groups and keep for each
#'   only the spectrum representing the monoisotopic precursor. MS1 spectra
#'   are returned as is. See documentation for `deisotopeSpectra()` below for
#'   details on isotope prediction and parameter description.
#'
#' - `filterPrecursorMaxIntensity()`: filters the `Spectra` keeping for groups
#'   of (MS2) spectra with similar precursor m/z values (given parameters
#'   `ppm` and `tolerance`) the one with the highest precursor intensity. The
#'   function filters only MS2 spectra and returns all MS1 spectra. If
#'   precursor intensities are `NA` for all spectra within a spectra group, the
#'   first spectrum of that groups is returned.
#'   Note: some manufacturers don't provide precursor intensities. These can
#'   however also be estimated with [estimatePrecursorIntensity()].
#'
#' - `filterPrecursorMzRange()` (previously `filterPrecursorMz()` which is now
#'   deprecated): retains spectra with a precursor m/z within the
#'   provided m/z range. See examples for details on selecting spectra with
#'   a precursor m/z for a target m/z accepting a small difference in *ppm*.
#'
#' - `filterPrecursorMzValues()`: retains spectra with precursor m/z matching
#'   any of the provided m/z values (given `ppm` and `tolerance`). Spectra with
#'   missing precursor m/z value (e.g. MS1 spectra) are dropped.
#'
#' - `filterPrecursorScan()`: retains parent (e.g. MS1) and children scans (e.g.
#'   MS2) of acquisition number `acquisitionNum`. Returns the filtered
#'   `Spectra` (with spectra in their original order). Parameter `f` allows to
#'   define which spectra belong to the same sample or original data file (
#'   defaults to `f = dataOrigin(object)`).
#'
#' - `filterRanges()`: allows filtering of the `Spectra` object based on user
#'   defined *numeric* ranges (parameter `ranges`) for one or more available
#'   spectra variables in object (spectra variable names can be specified with
#'   parameter `spectraVariables`). Spectra for which the value of a spectra
#'   variable is within it's defined range are retained. If multiple
#'   ranges/spectra variables are defined, the `match` parameter can be used
#'   to specify whether all conditions (`match = "all"`; the default) or if
#'   any of the conditions must match (`match = "any"`; all spectra for which
#'   values are within any of the provided ranges are retained).
#'
#' - `filterRt()`: retains spectra of MS level `msLevel` with retention
#'   times (in seconds) within (`>=`) `rt[1]` and (`<=`)
#'   `rt[2]`. Returns the filtered `Spectra` (with spectra in their
#'   original order).
#'
#' - `filterValues()`: allows filtering of the `Spectra` object based on
#'   similarities of *numeric* values of one or more `spectraVariables(object)`
#'   (parameter `spectraVariables`) to provided values (parameter `values`)
#'   given acceptable differences (parameters tolerance and ppm). If multiple
#'   values/spectra variables are defined, the `match` parameter can be used
#'   to specify whether all conditions (`match = "all"`; the default) or if
#'   any of the conditions must match (`match = "any"`; all spectra for which
#'   values are within any of the provided ranges are retained).
#'
#'
#' @section Filter content of `spectraData()`:
#'
#' The functions described in this section filter the content from a
#' `Spectra`'s spectra data, i.e. affect values of, or complete, spectra
#' variables. None of these functions reduces the object's number of spectra.
#'
#' - `dropNaSpectraVariables()`: removes spectra variables (i.e. columns in the
#'   object's `spectraData` that contain only missing values (`NA`). Note that
#'   while columns with only `NA`s are removed, a `spectraData()` call after
#'   `dropNaSpectraVariables()` might still show columns containing `NA` values
#'   for *core* spectra variables. The total number of spectra is not changed
#'   by this function.
#'
#' - `selectSpectraVariables()`: reduces the information within the object to
#'   the selected spectra variables: all data for variables not specified will
#'   be dropped. For mandatory columns (i.e., those listed by
#'   [coreSpectraVariables()], such as *msLevel*, *rtime* ...) only
#'   the values will be dropped but not the variable itself. Additional (or
#'   user defined) spectra variables will be completely removed.
#'   Returns the filtered `Spectra`.
#'
#'
#' @section Filter content of `peaksData()`:
#'
#' The functions described in this section filter the content of the
#' `Spectra`'s peaks data, i.e. either the number or the values (*m/z* or
#' intensity values) of the mass peaks. Also, the actual operation is only
#' executed once peaks data is accessed (through `peaksData()`,
#' `mz()` or `intensity()`) or `applyProcessing()` is called.
#' These operations don't affect the number of spectra in the `Spectra` object.
#'
#' - `deisotopeSpectra()`: *deisotopes* each spectrum keeping only the
#'   monoisotopic peak for groups of isotopologues. Isotopologues are
#'   estimated using the [MetaboCoreUtils::isotopologues()] function from the
#'   *MetaboCoreUtils* package. Note that
#'   the default parameters for isotope prediction/detection have been
#'   determined using data from the Human Metabolome Database (HMDB) and
#'   isotopes for elements other than CHNOPS might not be detected. See
#'   parameter `substDefinition` in the documentation of
#'   [MetaboCoreUtils::isotopologues()] for
#'   more information. The approach and code to define the parameters for
#'   isotope prediction is described
#'   [here](https://github.com/EuracBiomedicalResearch/isotopologues).
#'
#' - `filterFourierTransformArtefacts()`: removes (Orbitrap) fast fourier
#'   artefact peaks from spectra (see examples below). The function iterates
#'   through all intensity ordered peaks in a spectrum and removes all peaks
#'   with an m/z within +/- `halfWindowSize` of the current peak if their
#'   intensity is lower than `threshold` times the current peak's intensity.
#'   Additional parameters `keepIsotopes`, `maxCharge` and `isotopeTolerance`
#'   allow to avoid removing of potential `[13]C` isotope peaks (`maxCharge`
#'   being the maximum charge that should be considered and `isotopeTolerance`
#'   the absolute acceptable tolerance for matching their m/z).
#'   See [filterFourierTransformArtefacts()] for details and background and
#'   `deisitopeSpectra()` for an alternative.
#'
#' - `filterIntensity()`: filters mass peaks in each spectrum keeping only
#'   those with intensities that are within the provided range or match the
#'   criteria of the provided function. For the former, parameter `intensity`
#'   has to be a `numeric` defining the intensity range, for the latter a
#'   `function` that takes the intensity values of the spectrum and returns
#'   a `logical` whether the peak should be retained or not (see examples
#'   below for details) - additional parameters to the function can be passed
#'   with `...`.
#'   To remove only peaks with intensities below a certain threshold, say
#'   100, use `intensity = c(100, Inf)`. Note: also a single value can be
#'   passed with the `intensity` parameter in which case an upper limit of
#'   `Inf` is used.
#'   Note that this function removes also peaks with missing intensities
#'   (i.e. an intensity of `NA`). Parameter `msLevel.` allows to restrict the
#'   filtering to spectra of the specified MS level(s).
#'
#' - `filterMzRange()`: filters mass peaks in the object keeping or removing
#'   those in each spectrum that are within the provided m/z range. Whether
#'   peaks are retained or removed can be configured with parameter `keep`
#'   (default `keep = TRUE`).
#'
#' - `filterMzValues()`: filters mass peaks in the object keeping all
#'   peaks in each spectrum that match the provided m/z value(s) (for
#'   `keep = TRUE`, the default) or removing all of them (for `keep = FALSE`).
#'   The m/z matching considers also the absolute `tolerance` and m/z-relative
#'   `ppm` values. `tolerance` and `ppm` have to be of length 1.
#'
#' - `filterPeaksRanges()`: filters mass peaks of a `Spectra` object using any
#'   set of range-based filters on numeric spectra or peaks variables. See
#'   [filterPeaksRanges()] for more information.
#'
#' - `filterPrecursorPeaks()`: removes peaks from each spectrum in `object` with
#'   an m/z equal or larger than the m/z of the precursor, depending on the
#'   value of parameter `mz`: for `mz = ==" (the default) peaks with matching
#'   m/z (considering an absolute and relative acceptable difference depending
#'   on `tolerance` and `ppm`, respectively) are removed. For `mz = ">="` all
#'   peaks with an m/z larger or equal to the precursor m/z (minus `tolerance`
#'   and the `ppm` of the precursor m/z) are removed. Parameter `msLevel.`
#'   allows to restrict the filter to certain MS levels (by default the filter
#'   is applied to all MS levels). Note that no peaks are removed if the
#'   precursor m/z is `NA` (e.g. typically for MS1 spectra).
#'
#' - `reduceSpectra()`: keeps for groups of peaks with similar m/z values in
#'   (given `ppm` and `tolerance`) in each spectrum only the mass peak with the
#'   highest intensity removing all other peaks hence *reducing* each
#'   spectrum to the highest intensity peaks per *peak group*.
#'   Peak groups are defined using the [group()] function from the
#'   *MsCoreUtils* package. See also the [combinePeaks()] function for an
#'   alternative function to combine peaks within each spectrum.
#'
#' @param acquisitionNum for `filterPrecursorScan()`: `integer` with the
#'     acquisition number of the spectra to which the object should be
#'     subsetted.
#'
#' @param charge For `deisotopeSpectra()`: expected charge of the ionized
#'     compounds. See [MetaboCoreUtils::isotopologues()] for details.
#'
#' @param dataOrigin For `filterDataOrigin()`: `character` to define which
#'     spectra to keep.
#'     For `filterAcquisitionNum()`: optionally specify if filtering should
#'     occurr only for spectra of selected `dataOrigin`.
#'
#' @param dataStorage For `filterDataStorage()`: `character` to define which
#'     spectra to keep.
#'     For `filterAcquisitionNum()`: optionally specify if filtering should
#'     occur only for spectra of selected `dataStorage`.
#'
#' @param drop For `[`: not considered.
#'
#' @param f For `filterPrecursorScan()`: defining which spectra
#'     belong to the same original data file (sample): Defaults to
#'     `f = dataOrigin(x)`.
#'
#' @param halfWindowSize For `filterFourierTransformArtefacts()`: `numeric(1)`
#'     defining the m/z window left and right of a peak where to remove
#'     fourier transform artefacts.
#'
#' @param i For `[`: `integer`, `logical` or `character` to subset the
#'     object.
#'
#' @param intensity For `filterIntensity()`: `numeric` of length 1 or 2
#'     defining either the lower or the lower and upper intensity limit for the
#'     filtering, or a `function` that takes the intensities as input and
#'     returns a `logical` (same length then peaks in the spectrum) whether the
#'     peak should be retained or not. Defaults to `intensity = c(0, Inf)` thus
#'     only peaks with `NA` intensity are removed.
#'
#' @param isotopeTolerance For `filterFourierTransformArtefacts()`: the m/z
#'     `tolerance` to be used to define whether peaks might be isotopes of
#'     the current tested peak.
#'
#' @param j For `[`: not supported.
#'
#' @param keep For `filterMzValues()` and `filterMzRange()`: `logical(1)`
#'     whether  the matching peaks should be retained (`keep = TRUE`, the
#'     default) or dropped (`keep = FALSE`).
#'
#' @param keepIsotopes For `filterFourierTransformArtefacts()`: whether isotope
#'     peaks should not be removed as fourier artefacts.
#'
#' @param match For `filterRanges()` and `filterValues()`: `character(1) `
#'     defining whether the condition has to match for all provided
#'     `ranges`/`values` (`match = "all"`; the default), or for any of them
#'     (`match = "any"`) for spectra to be retained.
#'
#' @param maxCharge For `filterFourierTransformArtefacts()`: the maximum charge
#'     to be considered for isotopes.
#'
#' @param msLevel. `integer` defining the MS level(s) of the spectra to which
#'     the function should be applied (defaults to all MS levels of `object`.
#'     For `filterMsLevel()`: the MS level to which `object` should be
#'     subsetted.
#'
#' @param mz For `filterIsolationWindow()`: `numeric(1)` with the m/z value to
#'     filter the object. For `filterPrecursorMz()` and `filterMzRange()`:
#'     `numeric(2)` defining the lower and upper m/z boundary.
#'     For `filterMzValues()` and `filterPrecursorMzValues()`: `numeric` with
#'     the m/z values to match peaks or precursor m/z against.
#'     For `filterPrecursorPeaks()`: `character(1)` defining whether mass peaks
#'     with an m/z matching the spectrum's precursor m/z (`mz = "=="`,
#'     the default) or mass peaks with a m/z that is equal or larger
#'     (`mz = ">="`) should be removed.
#'
#' @param n for `filterAcquisitionNum()`: `integer` with the acquisition
#'     numbers to filter for.
#'
#' @param object `Spectra` object.
#'
#' @param polarity for `filterPolarity()`: `integer` specifying the polarity to
#'     to subset `object`.
#'
#' @param ppm For `filterMzValues()` and `reduceSpectra()`: `numeric(1)`
#'     defining a relative, m/z-dependent, maximal accepted difference between
#'     m/z values for peaks to be matched (or grouped).
#'     For `filterPrecursorMaxIntensity()`: `numeric(1)` defining the relative
#'     maximal accepted difference of precursor m/z values of spectra for
#'     grouping them into *precursor groups*. For `filterPrecursorIsotopes()`:
#'     passed directly to the [MetaboCoreUtils::isotopologues()] function.
#'     For `filterValues()`: `numeric` of any length allowing to define
#'     a maximal accepted difference between user input `values` and the
#'     `spectraVariables` values. If it is not equal to the length of the
#'     value provided with parameter `spectraVariables`, `ppm[1]` will be
#'     recycled.
#'
#' @param ranges for `filterRanges()`: A `numeric` vector of paired values
#'     (upper and lower boundary) that define the ranges to filter the `object`.
#'     These paired values need to be in the same order as the
#'     `spectraVariables` parameter (see below).
#'
#' @param rt for `filterRt()`: `numeric(2)` defining the retention time range to
#'     be used to subset/filter `object`.
#'
#' @param spectraVariables For `selectSpectraVariables()`: `character` with the
#'     names of the spectra variables to which the backend should be
#'     subsetted. For `filterRanges()` and `filterValues()`: `character`
#'     vector specifying the column(s) from `spectraData(object)` on which
#'     to filter the data and that correspond to the the names of the
#'     spectra variables that should be used for the filtering.
#'
#' @param substDefinition For `deisotopeSpectra()` and
#'     `filterPrecursorIsotopes()`: `matrix` or `data.frame` with definitions
#'     of isotopic substitutions. Uses by default isotopic substitutions
#'     defined from all compounds in the Human Metabolome Database (HMDB). See
#'     [MetaboCoreUtils::isotopologues()] or
#'     [MetaboCoreUtils::isotopicSubstitutionMatrix()] in the
#'     *MetaboCoreUtils* for details.
#'
#' @param threshold For `filterFourierTransformArtefacts()`: the relative
#'     intensity (to a peak) below which peaks are considered fourier
#'     artefacts. Defaults to `threshold = 0.2` hence removing peaks that
#'     have an intensity below 0.2 times the intensity of the tested peak
#'     (within the selected `halfWindowSize`).
#'
#' @param tolerance For `filterMzValues()` and `reduceSpectra()`:
#'     `numeric(1)` allowing to define a constant maximal accepted difference
#'     between m/z values for peaks to be matched (or grouped). For
#'     `containsMz()` it can also be of length equal `mz` to specify a different
#'     tolerance for each m/z value.
#'     For `filterPrecursorMaxIntensity()`: `numeric(1)` defining the
#'     (constant) maximal accepted difference of precursor m/z values of
#'     spectra for grouping them into *precursor groups*. For
#'     `filterPrecursorIsotopes()`: passed directly to the
#'     [MetaboCoreUtils::isotopologues()]
#'     function. For `filterValues()`: `numeric` of any length allowing to
#'     define a maximal accepted difference between user input `values` and the
#'     `spectraVariables` values. If it is not equal to the length of the
#'     value provided with parameter `spectraVariables`, `tolerance[1]` will be
#'     recycled. Default is `tolerance = 0`.
#'
#' @param values for `filterValues()`: A `numeric` vector that define the
#'     values to filter the Spectra data. These values need to be in the same
#'     order as the `spectraVariables` parameter.
#'
#' @param x `Spectra` object.
#'
#' @param z For `filterPrecursorCharge()`: `integer()` with the precursor
#'     charges to be used as filter.
#'
#' @param ... Additional arguments.
#'
#' @seealso
#'
#' - [combineSpectra()] for functions to combine or aggregate `Spectra`.
#'
#' - [combinePeaks()] for functions to combine or aggregate a `Spectra`'s
#'   `peaksData()`
#'
#' @md
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail, Nir Shahaf
#'
#' @examples
#'
#' ## Load a `Spectra` object with LC-MS/MS data.
#' fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
#'     package = "msdata")
#' sps_dda <- Spectra(fl)
#' sps_dda
#'
#'
#' ##  --------  SUBSET SPECTRA  --------
#'
#' ## Subset to the first 3 spectra
#' tmp <- sps_dda[1:3]
#' tmp
#' length(tmp)
#'
#' ## Subset to all MS2 spectra; this could be done with [, or, more
#' ## efficiently, with the `filterMsLevel` function:
#' sps_dda[msLevel(sps_dda) == 2L]
#' filterMsLevel(sps_dda, 2L)
#'
#' ## Filter the object keeping only MS2 spectra with an precursor m/z value
#' ## between a specified range:
#' filterPrecursorMzRange(sps_dda, c(80, 90))
#'
#' ## Filter the object to MS2 spectra with an precursor m/z matching a
#' ## pre-defined value (given ppm and tolerance)
#' filterPrecursorMzValues(sps_dda, 85, ppm = 5, tolerance = 0.1)
#'
#' ## The `filterRanges()` function allows to filter a `Spectra` based on
#' ## numerical ranges of any of its (numerical) spectra variables.
#' ## First, determine the variable(s) on which to base the filtering:
#' sv <- c("rtime", "precursorMz", "peaksCount")
#' ## Note that ANY variables can be chosen here, and as many as wanted.
#'
#' ## Define the ranges (pairs of values with lower and upper boundary) to be
#' ## used for the individual spectra variables. The first two values will be
#' ## used for the first spectra variable (e.g., `"rtime"` here), the next two
#' ## for the second (e.g. `"precursorMz"` here) and so on:
#' ranges <- c(30, 350, 200, 500, 350, 600)
#'
#' ## Input the parameters within the filterRanges function:
#' filt_spectra <- filterRanges(sps_dda, spectraVariables = sv,
#'                 ranges = ranges)
#' filt_spectra
#'
#' ## `filterRanges()` can also be used to filter a `Spectra` object with
#' ## multiple ranges for the same `spectraVariable` (e.g, here `"rtime"`)
#' sv <- c("rtime", "rtime")
#' ranges <- c(30, 100, 200, 300)
#' filt_spectra <- filterRanges(sps_dda, spectraVariables = sv,
#'                 ranges = ranges, match = "any")
#' filt_spectra
#'
#' ## While `filterRanges()` filtered on numeric ranges, `filterValues()`
#' ## allows to filter an object matching spectra variable values to user
#' ## provided values (allowing to configure allowed differences using the
#' ## `ppm` and `tolerance` parameters).
#' ## First determine the variable(s) on which to base the filtering:
#' sv <- c("rtime", "precursorMz")
#' ## Note that ANY variables can be chosen here, and as many as wanted.
#'
#' ## Define the values that will be used to filter the spectra based on their
#' ## similarities to their respective `spectraVariables`.
#' ## The first values in the parameters values, tolerance and ppm will be
#' ## used for the first spectra variable (e.g. `"rtime"` here), the next for
#' ## the second (e.g. `"precursorMz"` here) and so on:
#' values <- c(350, 80)
#' tolerance <- c(100, 0.1)
#' ppm <- c(0, 50)
#'
#' ## Input the parameters within the `filterValues()` function:
#' filt_spectra <- filterValues(sps_dda, spectraVariables = sv,
#'                 values = values, tolerance = tolerance, ppm = ppm)
#' filt_spectra
#'
#'
#' ##  --------  FILTER SPECTRA DATA  --------
#'
#' ## Remove spectra variables without content (i.e. with only missing values)
#' sps_noNA <- dropNaSpectraVariables(sps_dda)
#'
#' ## This reduced the size of the object slightly
#' print(object.size(sps_dda), unit = "MB")
#' print(object.size(sps_noNA), unit = "MB")
#'
#' ## With the `selectSpectraVariables()` function it is in addition possible
#' ## to subset the data of a `Spectra` to the selected columns/variables,
#' ## keeping only their data:
#' tmp <- selectSpectraVariables(sps_dda, c("msLevel", "mz", "intensity",
#'     "scanIndex"))
#' print(object.size(tmp), units = "MB")
#'
#' ## Except the selected variables, all data is now removed. Accessing
#' ## core spectra variables still works, but returns only NA
#' rtime(tmp) |> head()
#'
#'
#' ##  --------  FILTER PEAKS DATA  --------
#'
#' ## `filterMzValues()` filters the mass peaks data of a `Spectra` retaining
#' ## only those mass peaks with an m/z value matching the provided value(s).
#' sps_sub <- filterMzValues(sps_dda, mz = c(103, 104), tolerance = 0.3)
#'
#' ## The filtered `Spectra` has the same length
#' length(sps_dda)
#' length(sps_sub)
#'
#' ## But the number of mass peaks changed
#' lengths(sps_dda) |> head()
#' lengths(sps_sub) |> head()
#'
#' ## This function can also be used to remove specific peaks from a spectrum
#' ## by setting `keep = FALSE`.
#' sps_sub <- filterMzValues(sps_dda, mz = c(103, 104),
#'     tolerance = 0.3, keep = FALSE)
#' lengths(sps_sub) |> head()
#'
#' ## With the `filterMzRange()` function it is possible to keep (or remove)
#' ## mass peaks with m/z values within a specified numeric range.
#' sps_sub <- filterMzRange(sps_dda, mz = c(100, 150))
#' lengths(sps_sub) |> head()
#'
#' ## See also the `filterPeaksRanges()` function for a more flexible framework
#' ## to filter mass peaks
#'
#'
#' ## Removing fourier transform artefacts seen in Orbitra data.
#'
#' ## Loading an Orbitrap spectrum with artefacts.
#' data(fft_spectrum)
#' plotSpectra(fft_spectrum, xlim = c(264.5, 265.5))
#' plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
#'
#' fft_spectrum <- filterFourierTransformArtefacts(fft_spectrum)
#' fft_spectrum
#' plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
#'
#' ## Using a few examples peaks in your data you can optimize the parameters
#' fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum,
#'                                                halfWindowSize = 0.2,
#'                                                threshold = 0.005,
#'                                                keepIsotopes = TRUE,
#'                                                maxCharge = 5,
#'                                                isotopeTolerance = 0.005
#'                                                )
#'
#' fft_spectrum_filtered
#' length(mz(fft_spectrum_filtered)[[1]])
#' plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
#'
#'
#' ## *Reducing* a `Spectra` keeping for groups of mass peaks (characterized
#' ## by similarity of their m/z values) only one representative peak. This
#' ## function helps cleaning fragment spectra.
#' ## Filter the data set to MS2 spectra
#' ms2 <- filterMsLevel(sps_dda, 2L)
#'
#' ## For groups of fragment peaks with a difference in m/z < 0.1, keep only
#' ## the largest one.
#' ms2_red <- reduceSpectra(ms2, ppm = 0, tolerance = 0.1)
#' lengths(ms2) |> tail()
#' lengths(ms2_red) |> tail()
NULL

#' @rdname filterMsLevel
setMethod("dropNaSpectraVariables", "Spectra", function(object) {
    object@backend <- dropNaSpectraVariables(object@backend)
    object
})

#' @rdname filterMsLevel
setMethod(
    "selectSpectraVariables", "Spectra",
    function(object, spectraVariables = union(spectraVariables(object),
                                              peaksVariables(object))) {
        spectraVariables <- union(spectraVariables, "dataStorage")
        object@backend <- selectSpectraVariables(
            object@backend, spectraVariables = spectraVariables)
        object
    })

#' @rdname filterMsLevel
#'
#' @export
setMethod("[", "Spectra", function(x, i, j, ..., drop = FALSE) {
    if (!missing(j))
        stop("Subsetting 'Spectra' by columns is not (yet) supported")
    if (missing(i))
        return(x)
    slot(x, "backend", check = FALSE) <- extractByIndex(
        x@backend, i2index(i, length(x)))
    x
})

#' @rdname filterMsLevel
setMethod("filterAcquisitionNum", "Spectra", function(object, n = integer(),
                                                      dataStorage = character(),
                                                      dataOrigin = character()) {
    if (length(dataStorage) && !is.character(dataStorage))
        stop("'dataStorage' is expected to be of type character")
    if (length(dataOrigin) && !is.character(dataOrigin))
        stop("'dataOrigin' is expected to be of type character")
    object@backend <- filterAcquisitionNum(object@backend, n,
                                           dataStorage, dataOrigin)
    object@processing <- .logging(object@processing,
                                  "Filter: select by: ", length(n),
                                  " acquisition number(s) in ",
                                  max(length(dataStorage), length(dataOrigin)),
                                  " file(s)")
    object
})

#' @rdname filterMsLevel
setMethod("filterEmptySpectra", "Spectra", function(object) {
    object@backend <- extractByIndex(object@backend,
                                     which(as.logical(lengths(object))))
    object@processing <- .logging(object@processing,
                                  "Filter: removed empty spectra.")
    object
})

#' @rdname filterMsLevel
setMethod("filterDataOrigin", "Spectra", function(object,
                                                  dataOrigin = character()) {
    if (length(dataOrigin) && !is.character(dataOrigin))
        stop("'dataOrigin' is expected to be of type character")
    object@backend <- filterDataOrigin(object@backend, dataOrigin = dataOrigin)
    object@processing <- .logging(object@processing,
                                  "Filter: select data origin(s) ",
                                  paste0(dataOrigin, collapse = ", "))
    object
})

#' @rdname filterMsLevel
setMethod("filterDataStorage", "Spectra", function(object,
                                                   dataStorage = character()) {
    if (length(dataStorage) && !is.character(dataStorage))
        stop("'dataStorage' is expected to be of type character")
    object@backend <- filterDataStorage(object@backend, dataStorage)
    object@processing <- .logging(object@processing,
                                  "Filter: select data storage(s) ",
                                  paste0(dataStorage, collapse = ", "))
    object
})

#' @rdname filterMsLevel
#'
#' @exportMethod filterFourierTransformArtefacts
setMethod("filterFourierTransformArtefacts", "Spectra",
          function(object, halfWindowSize = 0.05, threshold = 0.2,
                   keepIsotopes = TRUE, maxCharge = 5,
                   isotopeTolerance = 0.005) {
              object <- addProcessing(object, .peaks_remove_fft_artifact,
                                      halfWindowSize = halfWindowSize,
                                      threshold = threshold,
                                      keepIsotopes = keepIsotopes,
                                      maxCharge = maxCharge,
                                      isotopeTolerance = isotopeTolerance)
              object@processing <- .logging(
                  object@processing, "Remove fast fourier artefacts.")
              object
          })

#' @rdname filterMsLevel
#'
#' @importMethodsFrom ProtGenerics filterIntensity
#'
#' @exportMethod filterIntensity
setMethod("filterIntensity", "Spectra",
          function(object, intensity = c(0, Inf),
                   msLevel. = uniqueMsLevels(object), ...) {
              if (!.check_ms_level(object, msLevel.))
                  return(object)
              if (is.numeric(intensity)) {
                  if (length(intensity) == 1)
                      intensity <- c(intensity, Inf)
                  if (length(intensity) != 2)
                      stop("'intensity' should be of length specifying a ",
                           "lower intensity limit or of length two defining ",
                           "a lower and upper limit.")
                  object <- addProcessing(object, .peaks_filter_intensity,
                                          intensity = intensity,
                                          msLevel = msLevel.,
                                          spectraVariables = "msLevel")
                  object@processing <- .logging(
                      object@processing, "Remove peaks with intensities ",
                      "outside [", intensity[1], ", ", intensity[2],
                      "] in spectra of MS level(s) ",
                      paste0(msLevel., collapse = ", "), ".")
              } else {
                  if (is.function(intensity)) {
                      object <- addProcessing(
                          object, .peaks_filter_intensity_function,
                          intfun = intensity, msLevel = msLevel.,
                          args = list(...), spectraVariables = "msLevel")
                      object@processing <- .logging(
                          object@processing, "Remove peaks based on their ",
                          "intensities and a user-provided function ",
                          "in spectra of MS level(s) ",
                          paste0(msLevel., collapse = ", "), ".")
                  }
                  else stop("'intensity' has to be numeric or a function")
              }
              object
          })


#' @rdname filterMsLevel
setMethod("filterIsolationWindow", "Spectra", function(object, mz = numeric()) {
    object@backend <- filterIsolationWindow(object@backend, mz = mz)
    object@processing <- .logging(object@processing,
                                  "Filter: select spectra containing m/z ",
                                  mz, " in their isolation window")
    object
})

#' @rdname filterMsLevel
setMethod("filterMsLevel", "Spectra", function(object, msLevel. = integer()) {
    object@backend <- filterMsLevel(object@backend, msLevel = msLevel.)
    object@processing <- .logging(object@processing,
                                  "Filter: select MS level(s) ",
                                  paste0(unique(msLevel.), collapse = " "))
    object
})

#' @rdname filterMsLevel
#'
#' @importMethodsFrom ProtGenerics filterMzRange
#'
#' @export
setMethod("filterMzRange", "Spectra",
          function(object, mz = numeric(), msLevel. = uniqueMsLevels(object),
                   keep = TRUE) {
              if (!.check_ms_level(object, msLevel.))
                  return(object)
              if (!length(mz)) mz <- c(-Inf, Inf)
              else mz <- range(mz)
              object <- addProcessing(object, .peaks_filter_mz_range, mz = mz,
                                      msLevel = msLevel., keep = keep,
                                      spectraVariables = "msLevel")
              if (keep) keep_or_remove <- "select"
              else keep_or_remove <- "remove"
              object@processing <- .logging(
                  object@processing, "Filter: ", keep_or_remove,
                  " peaks with an m/z within [", mz[1L], ", ", mz[2L], "]")
              object
          })

#' @rdname filterMsLevel
#'
#' @importMethodsFrom ProtGenerics filterMzValues
#'
#' @export
setMethod("filterMzValues", "Spectra",
          function(object, mz = numeric(), tolerance = 0, ppm = 20,
                   msLevel. = uniqueMsLevels(object), keep = TRUE) {
              if (!.check_ms_level(object, msLevel.))
                  return(object)
              l <- length(mz)
              if (length(tolerance) != 1)
                  stop("'tolerance' should be of length 1")
              if (length(ppm) != 1)
                  stop("'ppm' should be of length 1")
              if (is.unsorted(mz)) {
                  idx <- order(mz)
                  mz <- mz[idx]
                  if (length(tolerance) == l)
                      tolerance <- tolerance[idx]
                  if (length(ppm) == l)
                      ppm <- ppm[idx]
              }
              object <- addProcessing(object, .peaks_filter_mz_value,
                                      mz = mz, tolerance = tolerance,
                                      ppm = ppm, msLevel = msLevel.,
                                      keep = keep, spectraVariables = "msLevel")
              if (length(mz) <= 3)
                  what <- paste0(format(mz, digits = 4), collapse = ", ")
              else what <- ""
              if (keep)
                  keep_or_remove <- "select"
              else keep_or_remove <- "remove"
              object@processing <- .logging(
                  object@processing, "Filter: ", keep_or_remove,
                  " peaks matching provided m/z values ", what)
              object
          })

#' @rdname filterMsLevel
setMethod("filterPolarity", "Spectra", function(object, polarity = integer()) {
    object@backend <- filterPolarity(object@backend, polarity = polarity)
    object@processing <- .logging(object@processing,
                                  "Filter: select spectra with polarity ",
                                  paste0(polarity, collapse = " "))
    object
})

#' @rdname filterMsLevel
#'
#' @export
setMethod("filterPrecursorMz", "Spectra",
          function(object, mz = numeric()) {
              .Deprecated(
                  msg = paste0("'filterPrecursorMz' is deprecated. Please use",
                               " 'filterPrecursorMzRange' instead."))
              object@backend <- filterPrecursorMzRange(object@backend, mz)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select spectra with a precursor m/z within [",
                  paste0(mz, collapse = ", "), "]")
              object
          })

#' @rdname filterMsLevel
setMethod("filterPrecursorMzRange", "Spectra",
          function(object, mz = numeric()) {
              object@backend <- filterPrecursorMzRange(object@backend, mz)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select spectra with a precursor m/z within [",
                  paste0(mz, collapse = ", "), "]")
              object
          })

#' @rdname filterMsLevel
setMethod("filterPrecursorMzValues", "Spectra",
          function(object, mz = numeric(), ppm = 20, tolerance = 0) {
              object@backend <- filterPrecursorMzValues(
                  object@backend, sort(mz), ppm = ppm, tolerance = tolerance)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select spectra with precursor m/z matching ",
                  paste0(mz, collapse = ", "), "")
              object
          })

#' @rdname filterMsLevel
setMethod("filterPrecursorCharge", "Spectra",
          function(object, z = integer()) {
              z <- unique(z)
              object@backend <- filterPrecursorCharge(object@backend, z)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select spectra with a precursor charge ",
                  paste0(z, collapse = ", "))
              object
          })

#' @rdname filterMsLevel
setMethod("filterPrecursorScan", "Spectra",
          function(object, acquisitionNum = integer(), f = dataOrigin(object)) {
              if (!all(f %in% unique(dataOrigin(object))))
                  stop("'f' must be in dataOrigin().")
              object@backend <- filterPrecursorScan(object@backend,
                                                    acquisitionNum,
                                                    f = dataOrigin(object))
              object@backend <- filterDataOrigin(object@backend, dataOrigin = f)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select parent/children scans for ",
                  paste0(acquisitionNum, collapse = " "))
              object
          })

#' @rdname filterMsLevel
setMethod("filterRt", "Spectra",
          function(object, rt = numeric(), msLevel. = uniqueMsLevels(object)) {
              if (!is.numeric(msLevel.))
                  stop("Please provide a numeric MS level.")
              if (length(rt) != 2L || !is.numeric(rt) || rt[1] >= rt[2])
                  stop("Please provide a lower and upper numeric retention",
                       " time range.")
              if (length(rt))
                  rt <- range(rt)
              else rt <- c(-Inf, Inf)
              object@backend <- filterRt(object@backend, rt, msLevel.)
              object@processing <- .logging(
                  object@processing,
                  "Filter: select retention time [", rt[1], "..", rt[2],
                  "] on MS level(s) ", paste0(msLevel., collapse = " "))
              object
          })

#' @rdname filterMsLevel
setMethod("filterRanges", "Spectra",
          function(object, spectraVariables = character(), ranges = numeric(),
                   match = c("all", "any")){
              object@backend <- filterRanges(object@backend, spectraVariables,
                                             ranges, match)
              object@processing <- .logging(object@processing,
                                            "Filter: select spectra with a ",
                                            spectraVariables, " within: [",
                                            ranges[seq(ranges)%% 2 != 0], ", ",
                                            ranges[seq(ranges)%% 2 == 0], "]"
                                            )
              object
              })

#' @rdname filterMsLevel
setMethod("filterValues", "Spectra",
          function(object, spectraVariables = character(), values = numeric(),
                   ppm = 0, tolerance = 0, match = c("all", "any")){
              object@backend <- filterValues(object@backend, spectraVariables,
                                             values, ppm, tolerance, match)
              object@processing <- .logging(object@processing,
                                            "Filter: select spectra with a ",
                                            spectraVariables, " similar to: ",
                                            values)
              object
              })


################################################################################
##
## Data manipulation and analysis operations (lazy processing)
##
################################################################################

#' @title Data manipulation and analysis methods
#'
#' @name addProcessing
#'
#' @aliases addProcessing
#' @aliases applyProcessing
#' @aliases bin
#' @aliases containsMz
#' @aliases containsNeutralLoss
#' @aliases entropy
#' @aliases pickPeaks
#' @aliases replaceIntensitiesBelow
#' @aliases reset
#' @aliases smooth
#' @aliases spectrapply
#'
#' @description
#'
#' Various data analysis functions are available for `Spectra` objects. These
#' can be categorized into functions that either return a `Spectra` object
#' (with the manipulated data) and functions that directly return the
#' result from the calculation. For the former category, the data manipulations
#' are cached in the result object's *processing queue* and only exectuted
#' on-the-fly when the respective data gets extracted from the `Spectra` (see
#' section *The processing queue* for more information).
#'
#' For the second category, the calculations are directly executed and the
#' result, usually one value per spectrum, returned. Generally, to reduce
#' memory demand, a chunk-wise processing of the data is performed.
#'
#'
#' @section Data analysis methods returning a `Spectra`:
#'
#' The methods listed here return a `Spectra` object as a result.
#'
#' - `addProcessing()`: adds an arbitrary function that should be applied to the
#'   peaks matrix of every spectrum in `object`. The function (can be passed
#'   with parameter `FUN`) is expected to take a peaks matrix as input and to
#'   return a peaks matrix. A peaks matrix is a numeric matrix with two columns,
#'   the first containing the m/z values of the peaks and the second the
#'   corresponding intensities. The function has to have `...` in its
#'   definition. Additional arguments can be passed with `...`. With parameter
#'   `spectraVariables` it is possible to define additional spectra variables
#'   from `object` that should be passed to the function `FUN`. These will be
#'   passed by their name (e.g. specifying `spectraVariables = "precursorMz"`
#'   will pass the spectra's precursor m/z as a parameter named `precursorMz`
#'   to the function. The only exception is the spectra's MS level, these will
#'   be passed to the function as a parameter called `spectrumMsLevel` (i.e.
#'   with `spectraVariables = "msLevel"` the MS levels of each spectrum will be
#'   submitted to the function as a parameter called `spectrumMsLevel`).
#'   Examples are provided in the package vignette.
#'
#' - `bin()`: aggregates individual spectra into discrete (m/z) bins. Binning is
#'   performed only on spectra of the specified MS level(s) (parameter
#'   `msLevel`, by default all MS levels of `x`). The bins can be defined with
#'   parameter `breaks` which by default are equally sized bins, with size
#'   being defined by parameter `binSize`, from the minimal to the maximal m/z
#'   of all spectra (of MS level `msLevel`) within `x`. The same bins are used
#'   for all spectra in `x`. All intensity values for peaks falling into the
#'   same bin are aggregated using the function provided with parameter `FUN`
#'   (defaults to `FUN = sum`, i.e. all intensities are summed up). Note that
#'   the binning operation is applied to the peak data on-the-fly upon data
#'   access and it is possible to *revert* the operation with the `reset()`
#'   function (see description of `reset()` below).
#'
#' - `countIdentifications`: counts the number of identifications each scan has
#'   led to. See [countIdentifications()] for more details.
#'
#' - `pickPeaks()`: picks peaks on individual spectra using a moving
#'   window-based approach (window size = `2 * halfWindowSize`). For noisy
#'   spectra there are currently two different noise estimators available,
#'   the *M*edian *A*bsolute *D*eviation (`method = "MAD"`) and
#'   Friedman's Super Smoother (`method = "SuperSmoother"`),
#'   as implemented in the [`MsCoreUtils::noise()`].
#'   The method supports also to optionally *refine* the m/z value of
#'   the identified centroids by considering data points that belong (most
#'   likely) to the same mass peak. Therefore the m/z value is calculated as an
#'   intensity weighted average of the m/z values within the peak region.
#'   The peak region is defined as the m/z values (and their respective
#'   intensities) of the `2 * k` closest signals to the centroid or the closest
#'   valleys (`descending = TRUE`) in the `2 * k` region. For the latter the `k`
#'   has to be chosen general larger. See [`MsCoreUtils::refineCentroids()`] for
#'   details.
#'   If the ratio of the signal to the highest intensity of the peak is below
#'   `threshold` it will be ignored for the weighted average.
#'
#' - `replaceIntensitiesBelow()`: replaces intensities below a specified
#'   threshold with the provided `value`. Parameter `threshold` can be either
#'   a single numeric value or a function which is applied to all non-`NA`
#'   intensities of each spectrum to determine a threshold value for each
#'   spectrum. The default is `threshold = min` which replaces all values
#'   which are <= the minimum intensity in a spectrum with `value` (the
#'   default for `value` is `0`). Note that the function specified with
#'   `threshold` is expected to have a parameter `na.rm` since `na.rm = TRUE`
#'   will be passed to the function. If the spectrum is in profile mode,
#'   ranges of successive non-0 peaks <= `threshold` are set to 0.
#'   Parameter `msLevel.` allows to apply this to only spectra of certain MS
#'   level(s).
#'
#' - `scalePeaks()`: scales intensities of peaks within each spectrum depending
#'   on parameter `by`. With `by = sum` (the default) peak intensities are
#'   divided by the sum of peak intensities within each spectrum. The sum of
#'   intensities is thus 1 for each spectrum after scaling. Parameter
#'   `msLevel.` allows to apply the scaling of spectra of a certain MS level.
#'   By default (`msLevel. = uniqueMsLevels(x)`) intensities for all
#'   spectra will be scaled.
#'
#' - `smooth()`: smooths individual spectra using a moving window-based approach
#'   (window size = `2 * halfWindowSize`). Currently, the
#'   Moving-Average- (`method = "MovingAverage"`),
#'   Weighted-Moving-Average- (`method = "WeightedMovingAverage")`,
#'   weights depending on the distance of the center and calculated
#'   `1/2^(-halfWindowSize:halfWindowSize)`) and
#'   Savitzky-Golay-Smoothing (`method = "SavitzkyGolay"`) are supported.
#'   For details how to choose the correct `halfWindowSize` please see
#'   [`MsCoreUtils::smooth()`].
#'
#'
#' @section Data analysis methods returning the result from the calculation:
#'
#' The functions listed in this section return immediately the result from the
#' calculation. To reduce memory demand (and allow parallel processing) the
#' calculations a chunk-wise processing is generally performed.
#'
#' - `chunkapply()`: apply an arbitrary function to chunks of spectra. See
#'   [chunkapply()] for details and examples.
#'
#' - `containsMz()`: checks for each of the spectra whether they contain mass
#'   peaks with an m/z equal to `mz` (given acceptable difference as defined by
#'   parameters `tolerance` and `ppm` - see [MsCoreUtils::common()] for
#'   details). Parameter
#'   `which` allows to define whether any (`which = "any"`, the default) or
#'   all (`which = "all"`) of the `mz` have to match. The function returns
#'   `NA` if `mz` is of length 0 or is `NA`.
#'
#' - `containsNeutralLoss()`: checks for each spectrum in `object` if it has a
#'   peak with an m/z value equal to its precursor m/z - `neutralLoss` (given
#'   acceptable difference as defined by parameters `tolerance` and `ppm`).
#'   Returns `NA` for MS1 spectra (or spectra without a precursor m/z).
#'
#' - `entropy()`: calculates the entropy of each spectra based on the metrics
#'   suggested by Li et al. (https://doi.org/10.1038/s41592-021-01331-z).
#'   See also [MsCoreUtils::nentropy()] in the *MsCoreUtils* package for
#'   details.
#'
#' - `estimatePrecursorIntensity()`: defines the precursor intensities for MS2
#'   spectra using the intensity of the matching MS1 peak from the
#'   closest MS1 spectrum (i.e. the last MS1 spectrum measured before the
#'   respective MS2 spectrum). With `method = "interpolation"` it is also
#'   possible to calculate the precursor intensity based on an interpolation of
#'   intensity values (and retention times) of the matching MS1 peaks from the
#'   previous and next MS1 spectrum. See [estimatePrecursorIntensity()] for
#'   examples and more details.
#'
#' - `estimatePrecursorMz()`: **for DDA data**: allows to estimate a fragment
#'   spectra's precursor m/z based on the reported precursor m/z and the data
#'   from the previous MS1 spectrum. See [estimatePrecursorMz()] for details.
#'
#' - `neutralLoss()`: calculates neutral loss spectra for fragment spectra. See
#'   [neutralLoss()] for detailed documentation.
#'
#' - `spectrapply()`: applies a given function to each individual spectrum or
#'   sets of a `Spectra` object. By default, the `Spectra` is split into
#'   individual spectra (i.e. `Spectra` of length 1) and the function `FUN`
#'   is applied to each of them. An alternative splitting can be defined with
#'   parameter `f`. Parameters for `FUN` can be passed using `...`.
#'   The returned result and its order depend on the function `FUN` and how
#'   `object` is split (hence on `f`, if provided). Parallel processing is
#'   supported and can be configured with parameter `BPPARAM`, is however only
#'   suggested for computational intense `FUN`.
#'   As an alternative to the (eventual parallel) processing of the full
#'   `Spectra`, `spectrapply()` supports also a chunk-wise processing. For this,
#'   parameter `chunkSize` needs to be specified. `object` is then split into
#'   chunks of size `chunkSize` which are then (stepwise) processed by `FUN`.
#'   This guarantees a lower memory demand (especially for on-disk backends)
#'   since only the data for one chunk needs to be loaded into memory in each
#'   iteration. Note that by specifying `chunkSize`, parameters `f` and
#'   `BPPARAM` will be ignored.
#'   See also `chunkapply()` above or examples below for details on chunk-wise
#'   processing.
#'
#'
#' @section The processing queue:
#'
#' Operations that modify mass peak data, i.e. the m/z and intensity values of
#' a `Spectra` are generally not applied immediately to the data but are
#' *cached* within the object's *processing queue*. These operations are then
#' applied to the data only upon request, for example when m/z and/or
#' intensity values are extracted. This lazy execution guarantees that the
#' same functionality can be applied to any `Spectra` object, regardless of
#' the type of backend that is used. Thus, data manipulation operations can
#' also be applied to data that is *read only*. As a side effect, this enables
#' also to *undo* operations using the `reset()` function.
#'
#' Functions related to the processing queue are:
#'
#' - `applyProcessing()`: for `Spectra` objects that use a **writeable** backend
#'   only: apply all steps from the lazy processing queue to the peak data and
#'   write it back to the data storage. Parameter `f` allows to specify how
#'   `object` should be split for parallel processing. This should either be
#'   equal to the `dataStorage`, or `f = rep(1, length(object))` to disable
#'   parallel processing alltogether. Other partitionings might result in
#'   errors (especially if a `MsBackendHdf5Peaks` backend is used).
#'
#' - `processingLog()`: returns a `character` vector with the processing log
#'   messages.
#'
#' - `reset()`: restores the data to its original state (as much as possible):
#'   removes any processing steps from the lazy processing queue and calls
#'   `reset()` on the backend which, depending on the backend, can also undo
#'   e.g. data filtering operations. Note that a `reset*(` call after
#'   `applyProcessing()` will not have any effect. See examples below for more
#'   information.
#'
#' @param binSize For `bin()`: `numeric(1)` defining the size for the m/z bins.
#'     Defaults to `binSize = 1`.
#'
#' @param BPPARAM Parallel setup configuration. See [BiocParallel::bpparam()]
#'     for more information. This is passed directly to the
#'     [backendInitialize()] method of the [MsBackend-class]. See also
#'     [processingChunkSize()] for additional information on parallel
#'     processing.
#'
#' @param breaks For `bin()`: `numeric` defining the m/z breakpoints between
#'     bins.
#'
#' @param by For `scalePeaks()`: function to calculate a single `numeric` from
#'     intensity values of a spectrum by which all intensities (of
#'     that spectrum) should be divided by. The default `by = sum` will
#'     divide intensities of each spectrum by the sum of intensities of that
#'     spectrum.
#'
#' @param chunkSize For `spectrapply()`: size of the chunks into which the
#'     `Spectra` should be split. This parameter overrides parameters
#'     `f` and `BPPARAM`.
#'
#' @param descending For `pickPeaks()`: `logical`, if `TRUE` just values
#'     betwee the nearest valleys around the peak centroids are used.
#
#' @param f For `spectrapply()` and `applyProcessing()`: `factor` defining
#'     how `object` should be splitted for eventual parallel processing.
#'     Defaults to `factor()` for `spectrapply()` hence the object is not
#'     splitted while it defaults to `f = processingChunkSize(object)` for
#'     `applyProcessing()` splitting thus the object by default into chunks
#'     depending on [processingChunkSize()].
#'
#' @param FUN For `addProcessing()`: function to be applied to the peak matrix
#'     of each spectrum in `object`.
#'     For `bin()`: function to aggregate intensity values of peaks falling
#'     into the same bin. Defaults to `FUN = sum` thus summing up intensities.
#'     For `spectrapply()` and `chunkapply()`: function to be applied to
#'     each individual or each chunk of `Spectra`.
#'
#' @param halfWindowSize For `pickPeaks()`: `integer(1)`, used in the
#'     identification of the mass peaks: a local maximum has to be the
#'     maximum in the window from `(i - halfWindowSize):(i + halfWindowSize)`.
#'     For `smooth()`: `integer(1)`, used in the smoothing algorithm, the
#'     window reaches from `(i - halfWindowSize):(i + halfWindowSize)`.
#'
#' @param k For `pickPeaks()`: `integer(1)`, number of values left and right of
#'     the peak that should be considered in the weighted mean calculation.
#'
#' @param method For `pickPeaks()`: `character(1)`, the noise estimators that
#'     should be used, currently the the *M*edian *A*bsolute *D*eviation
#'     (`method = "MAD"`) and Friedman's Super Smoother
#'     (`method = "SuperSmoother"`) are supported.
#'     For `smooth()`: `character(1)`, the smoothing function that should be
#'     used, currently, the Moving-Average- (`method = "MovingAverage"`),
#'     Weighted-Moving-Average- (`method = "WeightedMovingAverage")`,
#'     Savitzky-Golay-Smoothing (`method = "SavitzkyGolay"`) are supported.
#'
#' @param msLevel. `integer` defining the MS level(s) of the spectra to which
#'     the function should be applied (defaults to all MS levels of `object`.
#'
#' @param mz For `containsMz()`: `numeric` with the m/z value(s) of the mass
#'     peaks to check.
#'
#' @param neutralLoss for `containsNeutralLoss()`: `numeric(1)` defining the
#'     value which should be subtracted from the spectrum's precursor m/z.
#'
#' @param normalized for `entropy()`: `logical(1)` whether the normalized
#'     entropy should be calculated (default). See also
#'     [MsCoreUtils::nentropy()] for details.
#'
#' @param object A `Spectra` object.
#'
#' @param ppm For `containsMz()` and `neutralLoss()`: `numeric(1)` defining a
#'     relative, m/z-dependent, maximal accepted difference between m/z values
#'     for peaks to be matched.
#'
#' @param snr For `pickPeaks()`: `double(1)` defining the
#'     *S*ignal-to-*N*oise-*R*atio. The intensity of a local maximum has to be
#'     higher than `snr * noise` to be considered as peak.
#'
#' @param spectraVariables For `addProcessing()`: `character` with additional
#'     spectra variables that should be passed along to the function defined
#'     with `FUN`. See function description for details.
#'
#' @param threshold For `pickPeaks()`: a `numeric(1)` defining the proportion
#'     of the maximal peak intensity. Only values above the threshold are
#'     used for the weighted mean calculation.
#'     For `replaceIntensitiesBelow()`: a `numeric(1)` defining the threshold
#'     or a `function` to calculate the threshold for each spectrum on its
#'     intensity values. Defaults to `threshold = min`.
#'
#' @param tolerance For `containsMz()` and `neutralLoss()`:
#'     `numeric(1)` allowing to define a constant maximal accepted difference
#'     between m/z values for peaks to be matched.
#'
#' @param value For `replaceIntensitiesBelow()`: `numeric(1)` defining the
#'     value with which intensities should be replaced with.
#'
#' @param which For `containsMz()`: either `"any"` or `"all"` defining whether
#'     any (the default) or all provided `mz` have to be present in the
#'     spectrum.
#'
#' @param x A `Spectra`.
#'
#' @param zero.rm For `bin()`: `logical(1)` indicating whether to remove bins
#'     with zero intensity. Defaults to `TRUE`, meaning the function will
#'     discard bins created with an intensity of 0 to enhance memory
#'     efficiency.
#'
#' @param ... Additional arguments passed to internal and downstream functions.
#'
#' @return
#'
#' See the documentation of the individual functions for a description of the
#' return value.
#'
#' @md
#'
#' @seealso
#'
#' - [compareSpectra()] for calculation of spectra similarity scores.
#'
#' - [processingChunkSize()] for information on parallel and chunk-wise data
#'   processing.
#'
#' - [Spectra] for a general description of the `Spectra` object.
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail, Nir Shahaf, Mar Garcia-Aloy
#'
#' @examples
#'
#' ## Load a `Spectra` object with LC-MS/MS data.
#' fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
#'     package = "msdata")
#' sps_dda <- Spectra(fl)
#' sps_dda
#'
#'
#' ##  --------  FUNCTIONS RETURNING A SPECTRA  --------
#'
#' ## Replace peak intensities below 40 with a value of 1
#' sps_mod <- replaceIntensitiesBelow(sps_dda, threshold = 20, value = 1)
#' sps_mod
#'
#' ## Get the intensities of the first spectrum before and after the
#' ## operation
#' intensity(sps_dda[1])
#' intensity(sps_mod[1])
#'
#' ## Remove all peaks with an intensity below 5.
#' sps_mod <- filterIntensity(sps_dda, intensity = c(5, Inf))
#'
#' intensity(sps_mod)
#'
#' ## In addition it is possible to pass a function to `filterIntensity()`: in
#' ## the example below we want to keep only peaks that have an intensity which
#' ## is larger than one third of the maximal peak intensity in that spectrum.
#' keep_peaks <- function(x, prop = 3) {
#'     x > max(x, na.rm = TRUE) / prop
#' }
#' sps_mod <- filterIntensity(sps_dda, intensity = keep_peaks)
#' intensity(sps_mod)
#'
#' ## We can also change the proportion by simply passing the `prop` parameter
#' ## to the function. To keep only peaks that have an intensity which is
#' ## larger than half of the maximum intensity:
#' sps_mod <- filterIntensity(sps_dda, intensity = keep_peaks, prop = 2)
#' intensity(sps_mod)
#'
#' ## With the `scalePeaks()` function we can alternatively scale the
#' ## intensities of mass peaks per spectrum to relative intensities. This
#' ## is specifically useful for fragment (MS2) spectra. We below thus
#' ## scale the intensities per spectrum by the total sum of intensities
#' ## (such that the sum of all intensities per spectrum is 1).
#' ## Below we scale the intensities of all MS2 spectra in our data set.
#' sps_mod <- scalePeaks(sps_dda, msLevel = 2L)
#'
#' ## MS1 spectra were not affected
#' sps_mod |>
#'     filterMsLevel(1L) |>
#'     intensity()
#'
#' ## Intensities of MS2 spectra were scaled
#' sps_mod |>
#'     filterMsLevel(2L) |>
#'     intensity()
#'
#' ## Since data manipulation operations are by default not directly applied to
#' ## the data but only cached in the internal processing queue, it is also
#' ## possible to remove these data manipulations with the `reset()` function:
#' tmp <- reset(sps_mod)
#' tmp
#' lengths(sps_dda) |> head()
#' lengths(sps_mod) |> head()
#' lengths(tmp) |> head()
#'
#' ## Data manipulation operations cached in the processing queue can also be
#' ## applied to the mass peaks data with the `applyProcessing()` function, if
#' ## the `Spectra` uses a backend that supports that (i.e. allows replacing
#' ## the mass peaks data). Below we first change the backend to a
#' ## `MsBackendMemory()` and then use the `applyProcessing()` to modify the
#' ## mass peaks data
#' sps_dda <- setBackend(sps_dda, MsBackendMemory())
#' sps_mod <- filterIntensity(sps_dda, intensity = c(5, Inf))
#' sps_mod <- applyProcessing(sps_mod)
#' sps_mod
#'
#' ## While we can't *undo* this filtering operation now using the `reset()`
#' ## function, accessing the data would now be faster, because the operation
#' ## does no longer to be applied to the original data before returning to the
#' ## user.
#'
#'
#' ##  --------  FUNCTIONS RETURNING THE RESULT  --------
#'
#' ## With the `spectrapply()` function it is possible to apply an
#' ## arbitrary function to each spectrum in a Spectra.
#' ## In the example below we calculate the mean intensity for each spectrum
#' ## in a subset of the sciex_im data. Note that we can access all variables
#' ## of each individual spectrum either with the `$` operator or the
#' ## corresponding method.
#' res <- spectrapply(sps_dda[1:20], FUN = function(x) mean(x$intensity[[1]]))
#' head(res)
#'
#' ## As an alternative, applying a function `FUN` to a `Spectra` can be
#' ## performed *chunk-wise*. The advantage of this is, that only the data for
#' ## one chunk at a time needs to be loaded into memory reducing the memory
#' ## demand. This type of processing can be performed by specifying the size
#' ## of the chunks (i.e. number of spectra per chunk) with the `chunkSize`
#' ## parameter
#' spectrapply(sps_dda[1:20], lengths, chunkSize = 5L)
#'
#' ## Precursor intensity estimation. Some manufacturers don't report the
#' ## precursor intensity for MS2 spectra:
#' sps_dda |>
#'     filterMsLevel(2L) |>
#'     precursorIntensity()
#'
#' ## This intensity can however be estimated from the previously measured
#' ## MS1 scan with the `estimatePrecursorIntensity()` function:
#' pi <- estimatePrecursorIntensity(sps_dda)
#'
#' ## This function returned the result as a `numeric` vector with one
#' ## value per spectrum:
#' pi
#'
#' ## We can replace the precursor intensity values of the originating
#' ## object:
#' sps_dda$precursorIntensity <- pi
#' sps_dda |>
#'     filterMsLevel(2L) |>
#'     precursorIntensity()
#'
NULL

#' @exportMethod addProcessing
#'
#' @importFrom ProtGenerics ProcessingStep
#'
#' @importMethodsFrom ProtGenerics addProcessing
#'
#' @importClassesFrom ProtGenerics ProcessingStep
#'
#' @importFrom methods .hasSlot
#'
#' @importFrom BiocGenerics updateObject
#'
#' @rdname addProcessing
setMethod("addProcessing", "Spectra", function(object, FUN, ...,
                                               spectraVariables = character()) {
    if (missing(FUN))
        return(object)
    object@processingQueue <- c(object@processingQueue,
                                list(ProcessingStep(FUN, ARGS = list(...))))
    if (!.hasSlot(object, "processingQueueVariables"))
        object <- updateObject(object)
    object@processingQueueVariables <- union(object@processingQueueVariables,
                                             spectraVariables)
    validObject(object)
    object
})

#' @rdname addProcessing
#'
#' @importMethodsFrom ProtGenerics bin
#'
#' @exportMethod bin
setMethod("bin", "Spectra", function(x, binSize = 1L, breaks = NULL,
                                     msLevel. = uniqueMsLevels(x),
                                     FUN = sum, zero.rm = TRUE) {
    if (!.check_ms_level(x, msLevel.))
        return(x)
    if (!length(breaks)) {
        mzr <- range(.peaksapply(filterMsLevel(x, msLevel.),
                                 function(z, ...) z[c(1L, nrow(z))]
                                 ), na.rm = TRUE)
        breaks <- seq(floor(mzr[1]), ceiling(mzr[2]), by = binSize)
        breaks <- MsCoreUtils:::.fix_breaks(breaks, mzr)
    }
    mids <- (breaks[-length(breaks)] + breaks[-1L]) / 2
    x <- addProcessing(x, .peaks_bin, breaks = breaks, mids = mids,
                       agg_fun = FUN, msLevel = msLevel., zero.rm = zero.rm,
                       spectraVariables = "msLevel")
    x@processing <- .logging(x@processing,
                             "Spectra of MS level(s) ",
                             paste0(msLevel., collapse = ", "),
                             " binned.")
    x
})

#' @rdname addProcessing
#'
#' @exportMethod containsMz
setMethod("containsMz", "Spectra", function(object, mz = numeric(),
                                            tolerance = 0,
                                            ppm = 20, which = c("any", "all"),
                                            BPPARAM = bpparam()) {
    if (length(object)) {
        cond_fun <- match.fun(match.arg(which))
        if (all(is.na(mz)))
            return(rep(NA, length(object)))
        mz <- unique(sort(mz))
        BPPARAM <- backendBpparam(object@backend, BPPARAM)
        unlist(.peaksapply(
            object, FUN = .peaks_contain_mz, mz = mz, tolerance = tolerance,
            ppm = ppm, condFun = cond_fun, BPPARAM = BPPARAM),
            use.names = FALSE
            )
    } else logical()
})

#' @rdname addProcessing
#'
#' @exportMethod containsNeutralLoss
setMethod("containsNeutralLoss", "Spectra", function(object, neutralLoss = 0,
                                                     tolerance = 0, ppm = 20,
                                                     BPPARAM = bpparam()) {
    BPPARAM <- backendBpparam(object@backend, BPPARAM)
    ## TODO: FIX me to use chunk size.
    if (is(BPPARAM, "SerialParam")) {
        .has_mz_each(object, precursorMz(object) - neutralLoss,
                     tolerance = tolerance, ppm = ppm, parallel = BPPARAM)
    } else {
        sp <- SerialParam()
        f <- as.factor(dataStorage(object))
        res <- .lapply(object, FUN = function(obj, n, tol, ppm, par) {
            .has_mz_each(obj, precursorMz(obj) - n, tolerance = tol,
                         ppm = ppm, parallel = sp)
        }, n = neutralLoss, tol = tolerance, ppm = ppm, par = sp, f = f,
                       BPPARAM = BPPARAM)
        unsplit(res, f = f)
    }
})

#' @rdname addProcessing
#'
#' @importFrom MsCoreUtils entropy nentropy
#'
#' @export
setMethod("entropy", "Spectra", function(object, normalized = TRUE) {
    if (length(object)) {
        if (normalized) entropy_fun <- nentropy
        else entropy_fun <- entropy
        unlist(.peaksapply(
            object, FUN = function(pks, ...) entropy_fun(pks[, "intensity"])),
            use.names = FALSE
            )
    } else numeric()
})
#' @rdname addProcessing
setMethod("entropy", "ANY", function(object, ...) {
    MsCoreUtils::entropy(object)
})

#' @rdname addProcessing
#'
#' @exportMethod pickPeaks
setMethod("pickPeaks", "Spectra",
          function(object, halfWindowSize = 2L,
                   method = c("MAD", "SuperSmoother"), snr = 0, k = 0L,
                   descending = FALSE, threshold = 0,
                   msLevel. = uniqueMsLevels(object), ...) {
    if (!.check_ms_level(object, msLevel.))
        return(object)
    if (!is.integer(halfWindowSize) || length(halfWindowSize) != 1L ||
        halfWindowSize <= 0L)
        stop("Argument 'halfWindowSize' has to be an integer of length 1 ",
             "and > 0.")
    if (!is.numeric(snr) || length(snr) != 1L || snr < 0L)
        stop("Argument 'snr' has to be a numeric of length 1 that is >= 0.")
    if (!is.integer(k) || length(k) != 1L || k < 0L)
        stop("Argument 'k' has to be an integer of length 1 that is >= 0.")
    if (!is.logical(descending) || length(descending) != 1L ||
        is.na(descending))
        stop("Argument 'descending' has to be just TRUE or FALSE")
    if (!is.numeric(threshold) || length(threshold) != 1L ||
        threshold < 0L || threshold > 1L)
        stop("Argument 'threshold' has to be a numeric of length 1 ",
             "that is >= 0 and <= 1.")

    method <- match.arg(method)

    object <- addProcessing(object, .peaks_pick,
                            halfWindowSize = halfWindowSize, method = method,
                            snr = snr, k = k, descending = descending,
                            threshold = threshold, msLevel = msLevel., ...,
                            spectraVariables = c("msLevel", "centroided"))
    object$centroided[msLevel(object) %in% msLevel.] <- TRUE
    object@processing <- .logging(object@processing,
                                  "Peak picking with ", method,
                                  " noise estimation, hws = ", halfWindowSize,
                                  ", snr = ", snr,
                                  if (k > 0) " and centroid refinement")
    object
})

#' @rdname addProcessing
#'
#' @exportMethod replaceIntensitiesBelow
setMethod("replaceIntensitiesBelow", "Spectra",
          function(object, threshold = min, value = 0,
                   msLevel. = uniqueMsLevels(object)) {
              if (!is.numeric(threshold) && !is.function(threshold))
                  stop("Argument 'threshold' has to be either numeric or ",
                       "a function.")
              if (!.check_ms_level(object, msLevel.))
                  return(object)
              object <- addProcessing(
                  object, .peaks_replace_intensity, threshold = threshold,
                  value = value, msLevel = msLevel.,
                  spectraVariables = c("msLevel", "centroided"))
              msg <- ifelse(
                  is.function(threshold),
                  yes = "a threshold defined by a provided function",
                  no = threshold)
              object@processing <- .logging(object@processing,
                                            "Signal <= ", msg,
                                            " in MS level(s) ",
                                            paste0(msLevel., collapse = ", "),
                                            " set to 0")
              object
          })

#' @rdname addProcessing
setMethod("reset", "Spectra", function(object, ...) {
    object@backend <- reset(object@backend)
    object@processingQueue <- list()
    if (!.hasSlot(object, "processingQueueVariables"))
        object <- updateObject(object, check = FALSE)
    object@processingQueueVariables <- character()
    object@processing <- .logging(object@processing, "Reset object.")
    object
})

#' @rdname addProcessing
#'
#' @importFrom ProtGenerics smooth
#' @importFrom MsCoreUtils coefMA coefWMA coefSG
#' @exportMethod smooth
setMethod("smooth", "Spectra",
          function(x, halfWindowSize = 2L,
                   method = c("MovingAverage", "WeightedMovingAverage",
                              "SavitzkyGolay"),
                   msLevel. = uniqueMsLevels(x), ...) {
    if (!.check_ms_level(x, msLevel.))
        return(x)
    if (!is.integer(halfWindowSize) || length(halfWindowSize) != 1L ||
        halfWindowSize <= 0L)
        stop("Argument 'halfWindowSize' has to be an integer of length 1 ",
             "and > 0.")

    method <- match.arg(method)
    coef <- switch(method,
                   MovingAverage = coefMA(halfWindowSize),
                   WeightedMovingAverage = coefWMA(halfWindowSize),
                   SavitzkyGolay = coefSG(halfWindowSize, ...))

    x <- addProcessing(x, .peaks_smooth, halfWindowSize = halfWindowSize,
                       coef = coef, msLevel = msLevel., ...,
                       spectraVariables = "msLevel")
    x@processing <- .logging(x@processing, "Spectra smoothing with ", method,
                                           ", hws = ", halfWindowSize)
    x
})

#' @rdname addProcessing
#'
#' @importMethodsFrom ProtGenerics spectrapply
#'
#' @exportMethod spectrapply
setMethod("spectrapply", "Spectra", function(object, FUN, ...,
                                             chunkSize = integer(),
                                             f = factor(),
                                             BPPARAM = SerialParam()) {
    if (missing(FUN))
        FUN <- identity
    if (length(chunkSize))
        return(chunkapply(object, FUN, ..., chunkSize = chunkSize))
    if (!length(f))
        f <- as.factor(seq_along(object))
    .lapply(object, FUN = FUN, f = f, ...,
            BPPARAM = backendBpparam(object@backend, BPPARAM))
})

#' @title Estimate Precursor Intensities
#'
#' @aliases estimatePrecursorIntensity
#'
#' @description
#'
#' Some MS instrument manufacturers don't provide precursor intensities for
#' fragment spectra. These can however be estimated, given that also MS1
#' spectra are available. The `estimatePrecursorIntensity()` funtion defines the
#' precursor intensities for MS2 spectra using the intensity of the matching
#' MS1 peak from the closest MS1 spectrum (i.e. the last MS1 spectrum measured
#' before the respective MS2 spectrum). With `method = "interpolation"` it is
#' also possible to calculate the precursor intensity based on an interpolation
#' of intensity values (and retention times) of the matching MS1 peaks from the
#' previous and next MS1 spectrum. See below for an example.
#'
#' @param object `Spectra` with MS1 and MS2 spectra.
#'
#' @param ppm `numeric(1)` with the maximal allowed relative difference of m/z
#'     values between the precursor m/z of a spectrum and the m/z of the
#'     respective ion on the MS1 scan.
#'
#' @param tolerance `numeric(1)` with the maximal allowed difference of m/z
#'     values between the precursor m/z of a spectrum and the m/z of the
#'     respective ion on the MS1 scan.
#'
#' @param method `character(1)` defining whether the precursor intensity
#'     should be estimated on the previous MS1 spectrum (`method = "previous"`,
#'     the default) or based on an interpolation on the previous and next
#'     MS1 spectrum (`method = "interpolation"`).
#'
#' @param msLevel. `integer(1)` the MS level for which precursor intensities
#'     should be estimated. Defaults to `2L`.
#'
#' @param f `factor` (or vector to be coerced to `factor`) defining which
#'     spectra belong to the same original data file (sample).
#'     Defaults to `f = dataOrigin(x)`.
#'
#' @param BPPARAM Parallel setup configuration. See [BiocParallel::bpparam()]
#'     for more information. This is passed directly to the
#'     [backendInitialize()] method of the [MsBackend-class].
#'
#' @author Johannes Rainer with feedback and suggestions from Corey Broeckling
#'
#' @importMethodsFrom ProtGenerics estimatePrecursorIntensity
#'
#' @exportMethod estimatePrecursorIntensity
#'
#' @rdname estimatePrecursorIntensity
#'
#' @examples
#'
#' #' ## Calculating the precursor intensity for MS2 spectra:
#' ##
#' ## Some MS instrument manufacturer don't report the precursor intensities
#' ## for MS2 spectra. The `estimatePrecursorIntensity` function can be used
#' ## in these cases to calculate the precursor intensity on MS1 data. Below
#' ## we load an mzML file from a vendor providing precursor intensities and
#' ## compare the estimated and reported precursor intensities.
#' tmt <- Spectra(msdata::proteomics(full.names = TRUE)[5],
#'     backend = MsBackendMzR())
#' pmi <- estimatePrecursorIntensity(tmt)
#' plot(pmi, precursorIntensity(tmt))
#'
#' ## We can also replace the original precursor intensity values with the
#' ## newly calculated ones
#' tmt$precursorIntensity <- pmi
setMethod(
    "estimatePrecursorIntensity", "Spectra",
    function(object, ppm = 20, tolerance = 0,
             method = c("previous", "interpolation"),
             msLevel. = 2L, f = dataOrigin(object), BPPARAM = bpparam()) {
        if (is.factor(f))
            f <- as.character(f)
        f <- factor(f, levels = unique(f))
        BPPARAM <- backendBpparam(object@backend, BPPARAM)
        unlist(bplapply(split(object, f),
                        FUN = .estimate_precursor_intensity, ppm = ppm,
                        tolerance = tolerance, method = method,
                        msLevel = msLevel., BPPARAM = BPPARAM),
               use.names = FALSE)
    })


################################################################################
##
## Spectra similarity calculations
##
################################################################################

#' @title Spectra similarity calculations
#'
#' @name compareSpectra
#'
#' @aliases compareSpectra
#'
#' @description
#'
#' `compareSpectra()` compares each spectrum in `x` with each spectrum in `y`
#' using the function provided with `FUN` (defaults to
#' [MsCoreUtils::ndotproduct()]). If `y` is missing, each spectrum in `x` is
#' compared with each other spectrum in `x`.
#' The matching/mapping of peaks between the compared spectra is done with the
#' `MAPFUN` function. The default [joinPeaks()] matches peaks of both spectra
#' and allows to keep all peaks from the first spectrum (`type = "left"`),
#' from the second (`type = "right"`), from both (`type = "outer"`) and to
#' keep only matching peaks (`type = "inner"`); see [joinPeaks()] for more
#' information and examples). The `MAPFUN` function should have parameters
#' `x`, `y`, `xPrecursorMz` and `yPrecursorMz` as these values are passed to
#' the function.
#'
#' In addition to `joinPeaks()` also [joinPeaksGnps()] is supported for
#' GNPS-like similarity score calculations. Note that `joinPeaksGnps()` should
#' only be used in combination with `FUN = MsCoreUtils::gnps`
#' (see [joinPeaksGnps()] for more information and details). Use
#' `MAPFUN = joinPeaksNone` to disable internal peak matching/mapping if a
#' similarity scoring function is used that performs the matching internally.
#'
#' `FUN` is supposed to be a function to compare intensities of (matched)
#' peaks of the two spectra that are compared. The function needs to take two
#' matrices with columns `"mz"` and `"intensity"` as input and is supposed
#' to return a single numeric as result. In addition to the two peak matrices
#' the spectra's precursor m/z values are passed to the function as parameters
#' `xPrecursorMz` (precursor m/z of the `x` peak matrix) and `yPrecursorMz`
#' (precursor m/z of the `y` peak matrix). Additional parameters to functions
#' `FUN` and `MAPFUN` can be passed with `...`. Parameters `ppm` and
#' `tolerance` are passed to both `MAPFUN` and `FUN`.
#' The function returns a `matrix` with the results of `FUN` for each
#' comparison, number of rows equal to `length(x)` and number of columns
#' equal `length(y)` (i.e. element in row 2 and column 3 is the result from
#' the comparison of `x[2]` with `y[3]`). If `SIMPLIFY = TRUE` the `matrix`
#' is *simplified* to a `numeric` if length of `x` or `y` is one. See also
#' the vignette for additional examples, such as using spectral entropy
#' similarity in the scoring.
#'
#' @param FUN function to compare intensities of peaks between two spectra.
#'     Defaults to [MsCoreUtils::ndotproduct()].
#'
#' @param MAPFUN For `compareSpectra()`: function to map/match peaks between
#'     the two compared spectra. See [joinPeaks()] for more information and
#'     possible functions. Defaults to [joinPeaks()].
#'
#' @param ppm `numeric(1)` defining a relative, m/z-dependent, maximal
#'     accepted difference between m/z values for peaks to be matched. This
#'     parameter is directly passed to `MAPFUN`.
#'
#' @param tolerance `numeric(1)` allowing to define a constant maximal
#'     accepted difference between m/z values for peaks to be matched. This
#'     parameter is directly passed to `MAPFUN`.
#'
#' @param x A `Spectra` object.
#'
#' @param y A `Spectra` object.
#'
#' @param SIMPLIFY `logical(1)` defining  whether the result matrix should be
#'     *simplified* to a `numeric` if possible (i.e. if either `x` or `y` is
#'     of length 1).
#'
#' @param ... Additional arguments passed to the internal functions.
#'
#' @importFrom MsCoreUtils ndotproduct
#'
#' @importMethodsFrom ProtGenerics compareSpectra
#'
#' @exportMethod compareSpectra
#'
#' @author Sebastian Gibb, Johannes Rainer, Laurent Gatto
#'
#' @examples
#'
#' ## Load a `Spectra` object with LC-MS/MS data.
#' fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
#'     package = "msdata")
#' sps_dda <- Spectra(fl)
#' sps_dda
#'
#' ## Restrict to MS2 (fragment) spectra:
#' sps_ms2 <- filterMsLevel(sps_dda, msLevel = 2L)
#'
#' ## Compare spectra: comparing spectra 2 and 3 against spectra 10:20 using
#' ## the normalized dotproduct method.
#' res <- compareSpectra(sps_ms2[2:3], sps_ms2[10:20])
#' ## first row contains comparisons of spectrum 2 with spectra 10 to 20 and
#' ## the second row comparisons of spectrum 3 with spectra 10 to 20
#' res
#'
#' ## We next calculate the pairwise similarity for the first 10 spectra
#' compareSpectra(sps_ms2[1:10])
#'
#' ## Use compareSpectra to determine the number of common (matching) peaks
#' ## with a ppm of 10:
#' ## type = "inner" uses a *inner join* to match peaks, i.e. keeps only
#' ## peaks that can be mapped betwen both spectra. The provided FUN returns
#' ## simply the number of matching peaks.
#' compareSpectra(sps_ms2[2:3], sps_ms2[10:20], ppm = 10, type = "inner",
#'     FUN = function(x, y, ...) nrow(x))
#'
#' ## We repeat this calculation between all pairwise combinations
#' ## of the first 20 spectra
#' compareSpectra(sps_ms2[1:20], ppm = 10, type = "inner",
#'     FUN = function(x, y, ...) nrow(x))
NULL

#' @rdname compareSpectra
setMethod("compareSpectra", signature(x = "Spectra", y = "Spectra"),
          function(x, y, MAPFUN = joinPeaks, tolerance = 0, ppm = 20,
                   FUN = ndotproduct, ..., SIMPLIFY = TRUE) {
              mat <- .compare_spectra_chunk(x, y, MAPFUN = MAPFUN,
                                            tolerance = tolerance,
                                            ppm = ppm, FUN = FUN, ...)
              if (SIMPLIFY && (length(x) == 1 || length(y) == 1))
                  mat <- as.vector(mat)
              mat
          })
#' @rdname compareSpectra
setMethod("compareSpectra", signature(x = "Spectra", y = "missing"),
          function(x, y = NULL, MAPFUN = joinPeaks, tolerance = 0, ppm = 20,
                   FUN = ndotproduct, ..., SIMPLIFY = TRUE) {
              if (length(x) == 1)
                  return(compareSpectra(x, x, MAPFUN = MAPFUN,
                                        tolerance = tolerance,
                                        ppm = ppm, FUN = FUN, ...,
                                        SIMPLIFY = SIMPLIFY))
              mat <- .compare_spectra_self(x, MAPFUN = MAPFUN, FUN = FUN,
                                           tolerance = tolerance, ppm = ppm,
                                           ...)
              if (SIMPLIFY && length(x) == 1)
                  mat <- as.vector(mat)
              mat
          })


################################################################################
##
## methods with documentation in Spectra-functions.R
##
################################################################################

#' @rdname processingChunkSize
setMethod("backendBpparam", "Spectra", function(object, BPPARAM = bpparam()) {
    backendBpparam(object@backend, BPPARAM)
})
rformassspectrometry/Spectra documentation built on Dec. 19, 2024, 1:05 p.m.