This document describes the general structure of the package and provides helpful references to code and files for contributors. Preferably read the full document.
What is this package good for?
The Spectra package (and the Spectra class) provides a powerful infrastructure for mass spectrometry (MS) data in R (see the SpectraTutorials for more information, in particular the Spectra-backends vignette for a description of the data structure).
Powerful MS data algorithms are also available in Python, e.g. provided by the matchms library.
Why re-implement what's already available?
This package translates an R Spectra object into the matchms Python Spectrum data structure, allows you to call functions of the matchms package, and translates the results back into R data objects.
Where to find what?
The R folder contains all R source files.
R/conversion.R contains functions to convert between R and Python data structures (e.g. between Spectra::Spectra and matchms.Spectrum). The conversion of the Python result into an R data type is handled by R's reticulate package, which can convert all basic data types between R and Python.
R/compareSpectriPy.R contains the mass spectral similarity calculation functions. The core function is the internal .compare_spectra_python() function, which manages the Anaconda environment, translates the data to Python data structures and calls the Python command using py_run_string(). The Python command itself is generated by the python_command() method called on the parameter object (e.g. CosineGreedyParam). To use a new similarity calculation function or other Python functionality/algorithm, ideally implement a new param object with a python_command() method that returns the Python command specific to that algorithm.
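The param-object pattern described above can be sketched in plain R as follows. This is a standalone illustration, not SpectriPy's actual code: the class ExampleCosineParam, its slots, the variable names py_x and py_y, and the standalone python_command() generic defined here are assumptions made for the example. The generated string uses the matchms calculate_scores() function and CosineGreedy class.

```r
library(methods)

## Hypothetical parameter class holding the settings of one algorithm.
## Class and slot names are illustrative, not part of SpectriPy.
setClass("ExampleCosineParam",
         slots = c(tolerance = "numeric", mzPower = "numeric"),
         prototype = prototype(tolerance = 0.1, mzPower = 0))

## Standalone generic for this sketch; SpectriPy defines its own internally.
setGeneric("python_command", function(object, ...)
    standardGeneric("python_command"))

## The method returns the Python code (as a single string) for this algorithm.
## py_x and py_y are assumed names of the already converted spectra in Python.
setMethod("python_command", "ExampleCosineParam", function(object, ...) {
    paste0("import matchms\n",
           "from matchms.similarity import CosineGreedy\n",
           "res = matchms.calculate_scores(py_x, py_y, ",
           "CosineGreedy(tolerance=", object@tolerance,
           ", mz_power=", object@mzPower, "))\n")
})

cmd <- python_command(new("ExampleCosineParam", tolerance = 0.05))
cat(cmd)
```

A string like cmd could then be passed to reticulate's py_run_string() once the input spectra have been converted and assigned in the Python session.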
R/basilisk.R contains the Python environment definition and required/used Python libraries (see below for more information).
The tests folder contains all unit tests: a general testthat.R file that configures and sets up the tests and, within the testthat folder, a unit test file for each R source file (named test_.R).
The vignettes folder contains an R markdown document that explains the use of the SpectriPy package using examples. This is a good starting point to explore the package and its functionality.
Where are Python libraries defined?
SpectriPy uses the R reticulate package for conversion between (basic) R and Python data types and relies on Bioconductor's basilisk package to set up and manage the Python environment.
The Python environment and required libraries are defined in the R/basilisk.R file. Different environments can be defined in that file with the required libraries (including versions).
To execute Python code from a certain library, the basiliskRun() function is used; the environment providing this library is enabled and disabled with the basiliskStart() and basiliskStop() functions, respectively.
The reticulate r_to_py() and py_to_r() functions are used to convert basic data types between R and Python. To use these functions, a Python environment with the matchms library must be available, or the one defined in SpectriPy and managed by basilisk needs to be activated first using cl <- basiliskStart(SpectriPy:::matchms_env) (see the package vignette for an example).
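The start/run/stop workflow described above can be sketched as follows. It assumes that basilisk and the SpectriPy version described here (which provides the internal matchms_env object) are installed. The function passed to basiliskRun() is only an illustration: it imports matchms inside the managed environment and returns its version string.

```r
library(basilisk)

## Start (and, on first use, set up) the Python environment defined in
## R/basilisk.R.
cl <- basiliskStart(SpectriPy:::matchms_env)

## Run a self-contained function inside that environment. The return value
## must be a plain R object, here a character string.
ver <- basiliskRun(cl, function() {
    matchms <- reticulate::import("matchms")
    matchms$`__version__`
})

## Shut the environment down again.
basiliskStop(cl)
ver
```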
What data could be used in tests?
The package does not contain any test data files. Test and example data are created manually by defining m/z and intensity values of MS peaks. Data files could be added (e.g. in MGF format) if needed and put into an inst/extdata folder.
Alternatively, example files in mzML format are available in Bioconductor's msdata package.
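As an illustration of manually defined test data, the sketch below builds a small Spectra object from hand-written peak lists. All m/z, intensity and precursor values are made up for the example and do not come from a real measurement.

```r
library(Spectra)

## Spectra variables: two hypothetical MS2 spectra.
spd <- DataFrame(msLevel = c(2L, 2L), precursorMz = c(195.09, 138.07))

## Peak data: one numeric vector per spectrum, m/z values in increasing order.
spd$mz <- list(c(56.05, 110.07, 138.07, 195.09),
               c(65.04, 92.05, 120.04, 138.07))
spd$intensity <- list(c(10, 25, 100, 40),
                      c(15, 30, 100, 60))

sps <- Spectra(spd)
sps
```

An object like sps can then be used as input in a unit test.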
To test the package and newly created functionality, add the respective unit tests to the tests/testthat folder and evaluate them, e.g. by running rcmdcheck::rcmdcheck(args = "--no-manual") in an R session started within the package folder.
What could be implemented?
Add new similarity calculation functionality to SpectriPy. See also issue #19.
Integrate other Python libraries? More a discussion - see issue #24.
Integrate functionality for spectra processing, downstream analysis (e.g. cleaning), ... See also issue #20.
Ability to translate additional data structures. See also issue #18.
More efficient translation of data structures. Better handling of metadata. See also issue #17.
Improve documentation. See also issue #25.
Define a use case analysis (or ideally several): show how data can be analyzed with the SpectriPy package and contrast that with a "quarto" or "Jupyter Notebook" document directly combining the R and Python code. Is there really a need for additional convenience functionality within an R package, or can the same, or more, be achieved with e.g. "quarto"? What are the benefits of bundling/wrapping Python functionality into R functions? See also issue #21.
Add more use cases and examples to the package vignette (vignettes/SpectriPy.Rmd). See also issue #26.
How to contribute?
Ideally, fork the GitHub repository, implement your extensions and open a pull request against the main branch.
Follow the coding style guidelines and adhere to the code of conduct.