The aim of the Python Metabolites2Network package is to provide methods and tools for a flexible way of matching metabolites identified using metabolomics or lipidomics to networks.
This flexible matching of identifiers is based on an ontology. In particular, we use the ChEBI ontology to match precisely identified molecules (e.g. lipid species) to the more generic descriptions of molecules found in metabolic networks (e.g. lipid classes).
The package works for metabolomics data but is particularly well suited for lipids.
The program takes two inputs: the annotated metabolites/lipids of the dataset and the metabolites belonging to a network. Both must use ChEBI identifiers. The program then returns the closest matches using the ChEBI ontology Directed Acyclic Graph.
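To illustrate the idea of walking the ontology graph, here is a minimal sketch in Python. It is not the package's implementation: the `IS_A` dictionary is a toy excerpt built from the examples in this README, the `closest_ancestors` function and the `network` set are hypothetical, and the hop count it returns is not the package's mapping distance.

```python
from collections import deque

# Toy excerpt of "is a" relations (child -> parents), taken from the examples
# in this README. The real ChEBI graph is far larger.
IS_A = {
    "CHEBI:90488": ["CHEBI:16749"],  # phosphatidylinositol (18:1/20:4) is a 1-phosphatidyl-1D-myo-inositol
    "CHEBI:28727": ["CHEBI:36023"],  # trans-vaccenic acid is a vaccenic acid
}


def closest_ancestors(start, network_ids, is_a=IS_A):
    """Breadth-first search up the "is a" edges.

    Returns (hops, matches) for the nearest level that contains at least one
    identifier present in the network, or (None, []) if nothing is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    best_hops, matches = None, []
    while queue:
        node, hops = queue.popleft()
        if best_hops is not None and hops > best_hops:
            break  # every remaining node is farther than the best level found
        if node in network_ids:
            best_hops = hops
            matches.append(node)
            continue
        for parent in is_a.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, hops + 1))
    return best_hops, matches


# Hypothetical network content, chosen only to make the example run.
network = {"CHEBI:16749", "CHEBI:7896"}
result = closest_ancestors("CHEBI:90488", network)
print(result)
```

The sketch only follows "is a" edges upwards. The examples in this README also involve conjugate acid/base relations, which a fuller version would need to handle.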
Metabolites2Network code is provided by the INRA MetExplore group and is available in the MetExplore web server. This project was developed within MetaboHub (the French metabolomics and fluxomics infrastructure) and the TOXALIM laboratory.
These examples were computed on the Recon 2.2 metabolic network (BioSource 4311 in MetExplore, from Swainston et al. 2016):
| ChEBI id in Dataset | Dataset molecule name | ChEBI id in Recon2.2 | matching molecules in Recon2.2 | ontology connection | mapping distance |
| ------------- |:-------------:| -----:|:-------------:|:-------------:|:-------------:|
| CHEBI:15756 | hexadecanoic acid | CHEBI:7896 | Palmitate (hexadecanoate), M_hdca | hexadecanoic acid (CHEBI:15756) is conjugate acid of hexadecanoate (CHEBI:7896) | -0.1 |
| CHEBI:90488 | phosphatidylinositol (18:1/20:4) | CHEBI:57880 | 1-phosphatidyl-1D-myo-inositol(1−), M_pail_hs | phosphatidylinositol (18:1/20:4) (CHEBI:90488) is a 1-phosphatidyl-1D-myo-inositol (CHEBI:16749); 1-phosphatidyl-1D-myo-inositol (CHEBI:16749) is conjugate acid of 1-phosphatidyl-1D-myo-inositol(1−) (CHEBI:57880) | 1.1 |
| CHEBI:36023 | vaccenic acid | CHEBI:30828 | trans-vaccenate, M_vacc | trans-vaccenate(1−) (CHEBI:30828) is conjugate base of trans-vaccenic acid (CHEBI:28727); trans-vaccenic acid (CHEBI:28727) is a vaccenic acid (CHEBI:36023) | -1.1 |
Here are some explanations of these results:
pip install -r requirements.txt -U
Command line works as follows:
python3 ./metabolomics2network.py file_type metabolomics_path network_metabolites_path output_path conf_file_path mapping_types
file_type: The first parameter specifies the type of file used as input for the metabolomics data and for the output file. The value has to be `json` or `tsv`.
metabolomics_path: Path to the metabolomics (including lipidomics) dataset
This file contains the metabolites that will be matched to the network. Here is a small example:
[{"name":"M1","undefined":"","chebi":"17408"}]
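A dataset file of this shape can be produced with Python's standard `json` module. The sketch below is only an illustration: the `to_dataset_records` helper is hypothetical, it keeps just the `name` and `chebi` keys seen in the example, and it assumes the ChEBI value should be the bare number, as in the example above.

```python
import json


def to_dataset_records(metabolites):
    """Turn (name, chebi_id) pairs into records shaped like the example above."""
    records = []
    for name, chebi_id in metabolites:
        # Assumption: the bare number is expected, so drop an optional "CHEBI:" prefix.
        bare = str(chebi_id).upper().replace("CHEBI:", "")
        records.append({"name": name, "chebi": bare})
    return records


records = to_dataset_records([("M1", "CHEBI:17408")])
print(json.dumps(records))
```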
network_metabolites_path: Path to the network metabolites json file
The file contains all the metabolites of a given metabolic network formatted in json. Here is a small example:
[{"id":"7230990","name":"6-hydroxypaclitaxel","dbIdentifier":"M_htaxol_b","chemicalFormula":"C47H51NO15","idin":[{"extDBName":"chebi","extID":"CHEBI:63859","origin":"SBML File","score":"1"},{"extDBName":"inchi","extID":"InChI=1S\/C47H51NO15\/c1-24-30(61-43(57)33(51)32(27-16-10-7-11-17-27)48-41(55)28-18-12-8-13-19-28)22-47(58)40(62-42(56)29-20-14-9-15-21-29)36-45(6,38(54)35(60-25(2)49)31(24)44(47,4)5)37(53)34(52)39-46(36,23-59-39)63-26(3)50\/h7-21,30,32-37,39-40,51-53,58H,22-23H2,1-6H3,(H,48,55)\/t30-,32-,33+,34-,35+,36-,37-,39+,40-,45-,46+,47+\/m0\/s1","origin":"SBML File","score":"1"}]},
{"id":"7230991","name":"6-hydroxypaclitaxel","dbIdentifier":"M_htaxol_c","chemicalFormula":"C47H51NO15","idin":[{"extDBName":"chebi","extID":"CHEBI:63859","origin":"SBML File","score":"1"},{"extDBName":"inchi","extID":"InChI=1S\/C47H51NO15\/c1-24-30(61-43(57)33(51)32(27-16-10-7-11-17-27)48-41(55)28-18-12-8-13-19-28)22-47(58)40(62-42(56)29-20-14-9-15-21-29)36-45(6,38(54)35(60-25(2)49)31(24)44(47,4)5)37(53)34(52)39-46(36,23-59-39)63-26(3)50\/h7-21,30,32-37,39-40,51-53,58H,22-23H2,1-6H3,(H,48,55)\/t30-,32-,33+,34-,35+,36-,37-,39+,40-,45-,46+,47+\/m0\/s1","origin":"SBML File","score":"1"}]}]
The example file provided in the repository was produced using the MetExplore web server (www.metexplore.fr). It contains the metabolites of the Recon 2.2 metabolic network (BioSource 4311 in MetExplore, from Swainston et al. 2016).
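To check which ChEBI identifiers a network file carries, the structure shown above can be read with the standard `json` module. This is a minimal sketch: the `chebi_ids_by_metabolite` helper is hypothetical, and `SAMPLE` is a shortened copy of the example record (the InChI entry and formula are omitted for brevity).

```python
import json

# Shortened copy of the example record above.
SAMPLE = '''[{"id":"7230990","name":"6-hydroxypaclitaxel","dbIdentifier":"M_htaxol_b",
 "idin":[{"extDBName":"chebi","extID":"CHEBI:63859","origin":"SBML File","score":"1"}]}]'''


def chebi_ids_by_metabolite(network_metabolites):
    """Map each network metabolite (dbIdentifier) to its list of ChEBI ids."""
    index = {}
    for metabolite in network_metabolites:
        ids = [
            ref["extID"]
            for ref in metabolite.get("idin", [])
            if ref.get("extDBName") == "chebi"
        ]
        index[metabolite["dbIdentifier"]] = ids
    return index


index = chebi_ids_by_metabolite(json.loads(SAMPLE))
print(index)
```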
If you need a specific file, please contact: contact-metexplore@inra.fr
output_path: Path where the output files will be written
Depending on the `file_type` option, the file will be returned in txt (tsv) or json format.
conf_file_path: Path to the file containing aliases that ensure correspondence between identifier labels. Here is an example:
name alias: name
chebi alias: chebi
formula alias: formula
kegg alias: kegg
pubchem alias: pubchem
hmdb alias: hmdb
inchi alias: inchi
smiles alias: smiles
lipidmaps alias: lipidmap
swisslipids alias: swisslipids
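For readers who want to inspect such a file programmatically, the `key alias: value` layout shown above can be read line by line. This is only a sketch under that assumption: `parse_aliases` is a hypothetical helper, not a function of the package, and the package may parse the file differently.

```python
def parse_aliases(text):
    """Parse lines of the form "<identifier> alias: <column label>" into a dict."""
    aliases = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        key, value = line.split(":", 1)
        key = key.strip()
        if key.endswith("alias"):
            key = key[: -len("alias")].strip()
        aliases[key] = value.strip()
    return aliases


conf_text = """name alias: name
chebi alias: chebi
lipidmaps alias: lipidmap
"""
aliases = parse_aliases(conf_text)
print(aliases)
```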
mapping_types:
- 1 = exact multimapping
- 2 = chebi class mapping

Remarks:
- To use multiple mappings at once, separate them with commas, for example 1,2 for exact mapping and ChEBI mapping
- Output considers results in this order of priority: Exact multimapping > ChEBI class mapping
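The comma-separated value and the priority order can be sketched as follows. The `MAPPING_TYPES` table mirrors the two codes listed above; the `parse_mapping_types` helper is hypothetical and assumes that a lower code means a higher priority, which matches the order stated in the remarks.

```python
# Codes and labels as listed above.
MAPPING_TYPES = {1: "exact multimapping", 2: "chebi class mapping"}


def parse_mapping_types(argument):
    """Parse a value such as "1,2" into mapping labels, highest priority first."""
    codes = set()
    for token in argument.split(","):
        code = int(token.strip())
        if code not in MAPPING_TYPES:
            raise ValueError(f"unknown mapping type: {code}")
        codes.add(code)
    # Assumption: lower code = higher priority (exact before ChEBI class).
    return [MAPPING_TYPES[code] for code in sorted(codes)]


selected = parse_mapping_types("2,1")
print(selected)
```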
python3 ./metabolomics2network.py json ./example_data/data_15756.json ./example_data/metabolites_4311_DB.json ./example_data/data_out.json ./conf.txt 1,2
Use the `json` parameter.
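When the script is launched from another Python program, the positional arguments can be assembled as a list and passed to `subprocess.run`. The sketch below only builds and prints the command; `build_command` is a hypothetical helper, and the argument order is taken from the example command above.

```python
import shlex
import sys


def build_command(file_type, metabolomics_path, network_path, output_path,
                  conf_path, mapping_types=(1, 2),
                  script="./metabolomics2network.py"):
    """Assemble the argument list in the order used by the example command."""
    if file_type not in ("json", "tsv"):
        raise ValueError("file_type has to be 'json' or 'tsv'")
    return [
        sys.executable, script, file_type,
        metabolomics_path, network_path, output_path, conf_path,
        ",".join(str(code) for code in mapping_types),
    ]


command = build_command(
    "json",
    "./example_data/data_15756.json",
    "./example_data/metabolites_4311_DB.json",
    "./example_data/data_out.json",
    "./conf.txt",
)
# Pass `command` to subprocess.run(command, check=True) to actually run it.
print(" ".join(shlex.quote(part) for part in command))
```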
This is a basic example file for data:
[{"name":"M1","undefined":"","chebi":"15756"}]
Here is an example of output:
[
{
"name": "M1",
"undefined": "",
"chebi": "15756",
"mapped": [
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236116"
],
"networkid": "M_hdca_x",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
},
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236408"
],
"networkid": "M_hdca_e",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
},
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236411"
],
"networkid": "M_hdca_c",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
},
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236416"
],
"networkid": "M_hdca_l",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
},
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236417"
],
"networkid": "M_hdca_r",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
},
{
"path": [
"15756"
],
"datasetname": "M1",
"idsql": [
"7236435"
],
"networkid": "M_hdca_b",
"mapping_type": "chebi class mapping",
"distance": "-0.1",
"identifiers": {
"chebi": [
"7896",
"7896"
]
}
}
]
}
]
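An output file with this structure can be flattened into one row per match for further analysis. This is a minimal sketch: `summarize` is a hypothetical helper, and `SAMPLE_OUTPUT` is a shortened copy of the output above that keeps only the keys used here.

```python
import json

# Shortened copy of the example output above.
SAMPLE_OUTPUT = '''[{"name": "M1", "chebi": "15756", "mapped": [
  {"networkid": "M_hdca_x", "mapping_type": "chebi class mapping", "distance": "-0.1"},
  {"networkid": "M_hdca_e", "mapping_type": "chebi class mapping", "distance": "-0.1"}]}]'''


def summarize(results):
    """Return (dataset name, network id, mapping type, distance) tuples."""
    rows = []
    for entry in results:
        for hit in entry.get("mapped", []):
            rows.append((
                entry["name"],
                hit["networkid"],
                hit["mapping_type"],
                float(hit["distance"]),  # distances are stored as strings
            ))
    return rows


rows = summarize(json.loads(SAMPLE_OUTPUT))
for row in rows:
    print(row)
```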
- Open the conf_tsv.txt file in the main directory to set the names of the columns in the metabolites file and the lipid file.
- There is no need for every column to be present in your files.
- If some metabolites can have multiple occurrences of the same type of identifier, you can also set the separator character (by default "|").
- Run the mapping script from the command line:
python3 ./metabolomics2network.py tsv ./example_data/data_15756.txt ./example_data/metabolites_4311_DB.json ./example_data/data_out.txt ./conf.txt 1,2
The `tsv` option has to be set.
Data has to be formatted following the column order specified in the conf.txt file and has to start with a header line. Here is an example:
name chebi
hexadecanoic acid 15756
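Such a file can be written with Python's `csv` module using a tab delimiter. The sketch below is an illustration only: `dataset_to_tsv` is a hypothetical helper, and the `name` and `chebi` column labels are assumed to match the aliases set in the configuration file.

```python
import csv
import io


def dataset_to_tsv(rows, columns=("name", "chebi")):
    """Serialize dict rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[column] for column in columns])
    return buffer.getvalue()


tsv_text = dataset_to_tsv([{"name": "hexadecanoic acid", "chebi": "15756"}])
print(tsv_text, end="")
```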
The output file is returned in tsv format. For example:
metabolite name mapped on id mapping types distance chebi inchi smiles hmdb kegg pubchem lipidmaps swisslipids
hexadecanoic acid M_hdca_x;M_hdca_e;M_hdca_c;M_hdca_l;M_hdca_r;M_hdca_b; chebi class mapping;chebi class mapping;chebi class mapping;chebi class mapping;chebi class mapping;chebi class mapping; -0.1;-0.1;-0.1;-0.1;-0.1;-0.1; ['7896', '7896'];['7896', '7896'];['7896', '7896'];['7896', '7896'];['7896', '7896'];['7896', '7896']; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A; N/A;N/A;N/A;N/A;N/A;N/A;
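Each cell of the output holds one value per match, separated and terminated by `;`. The sketch below splits two such cells and pairs them up. The `split_cell` helper is hypothetical, and the column labels `mapped on id` and `distance` are an assumption about where the header's column boundaries fall.

```python
def split_cell(cell):
    """Split a ';'-separated cell, dropping the empty piece after the final ';'."""
    return [part for part in cell.split(";") if part]


# Two cells shortened from the example row above.
row = {
    "mapped on id": "M_hdca_x;M_hdca_e;",
    "distance": "-0.1;-0.1;",
}
hits = list(zip(
    split_cell(row["mapped on id"]),
    (float(value) for value in split_cell(row["distance"])),
))
print(hits)
```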
python3 ./tests/test_integration_mapping_json.py
v1.0.2: updated the package to fit release 1.0.9 of libChEBIpy
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
This project is licensed under the MIT License - see the LICENSE.md file for details