DeepMolecules

Help

Input types for enzymes/proteins and metabolites

Enzymes

The input for an enzyme must be a string containing the enzyme's amino acid sequence. The model does not currently handle placeholder characters for missing information, such as '*'; a sequence that contains one cannot be processed.
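A quick local check can catch such characters before a sequence is submitted. The sketch below is not the server's own validation: it assumes that only the 20 standard one-letter amino acid codes are accepted, which this page does not state explicitly, and the function name `find_invalid_residues` is hypothetical.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for every unsupported character."""
    # Drop whitespace and line breaks, then normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    return [
        (position, char)
        for position, char in enumerate(cleaned, start=1)
        if char not in STANDARD_AMINO_ACIDS
    ]


if __name__ == "__main__":
    # The '*' at position 4 is reported.
    print(find_invalid_residues("MKT*AYIAK"))
```

An empty result means that no unsupported characters were found under the stated assumption.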

Metabolites

There are three valid input types for a metabolite: SMILES string, KEGG Compound ID, and InChI string.

InChI string

InChI strings are textual representations of chemical structures. Every InChI string is a unique identifier and contains detailed information about the structure of a small molecule. For more details on InChI, see this page from IUPAC.

KEGG Compound ID

The KEGG Compound database contains identifiers for many small molecules and drugs. A KEGG Compound ID starts with a "C" or "D" followed by a five-digit number. For more information see the KEGG homepage.

SMILES

The Simplified Molecular Input Line Entry System (SMILES) represents the structure of a molecule as an ASCII string. You can obtain the SMILES string for a molecule by, for example, searching for the molecule's name in PubChem. Because a molecule can have more than one valid SMILES representation, we recommend using InChI strings or KEGG Compound IDs instead where possible.
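The three input types can usually be told apart by their shape. The following sketch is only a guess at how this could be done locally and is not how the server decides: it relies on the "InChI=" prefix of InChI strings and on the KEGG Compound ID pattern described above, and it assumes that everything else is meant as SMILES without checking the SMILES syntax. The function name is hypothetical.

```python
import re

# A KEGG Compound ID is "C" or "D" followed by exactly five digits.
KEGG_ID_PATTERN = re.compile(r"[CD]\d{5}")


def guess_metabolite_input_type(value: str) -> str:
    """Guess whether a string is an InChI string, a KEGG Compound ID, or SMILES."""
    text = value.strip()
    if not text:
        raise ValueError("empty metabolite identifier")
    if text.startswith("InChI="):  # InChI strings carry this prefix
        return "InChI"
    if KEGG_ID_PATTERN.fullmatch(text):
        return "KEGG"
    # Assumption: anything else is treated as SMILES, without a syntax check.
    return "SMILES"


if __name__ == "__main__":
    for example in ("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "C00002", "CCO"):
        print(example, guess_metabolite_input_type(example))
```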

Single Input File

Case study: ESP (Enzyme-Substrate Pair Prediction) step by step

TurNuP (k_cat prediction)

case study

K_M prediction

case study

SPOT (Transporter - Substrate Pair prediction)

case study

Multiple Input file

Switch from CSV to XLSX format

Starting from December 9, 2024, we have switched from CSV files to XLSX files, specifically for k_cat predictions. XLSX files can be created with spreadsheet programs such as Microsoft Excel or Google Sheets, or with the pandas library in Python. For more details on how to create an XLSX file, refer to the documentation for Microsoft Excel or Google Sheets.
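For the pandas route, a minimal sketch could look as follows. The column titles are the ones this page lists for k_cat prediction; the row content is a made-up placeholder, and the names `KCAT_COLUMNS` and `save_as_xlsx` are hypothetical. Writing XLSX files from pandas requires an Excel writer engine such as openpyxl to be installed.

```python
import pandas as pd

# Column titles taken from the k_cat prediction section of this page.
KCAT_COLUMNS = ["Enzyme", "Substrates", "Products"]

# Placeholder row for illustration only; the sequence is made up.
rows = [
    {"Enzyme": "MKTAYIAKQR", "Substrates": "C00002;C00031", "Products": "C00008;C00092"},
]

table = pd.DataFrame(rows, columns=KCAT_COLUMNS)


def save_as_xlsx(frame: pd.DataFrame, path: str) -> None:
    """Write the table to an XLSX file without the pandas row index."""
    # Needs an Excel writer engine such as openpyxl to be installed.
    frame.to_excel(path, index=False)
```

Calling `save_as_xlsx(table, "kcat_input.xlsx")` (the file name is arbitrary) writes a single sheet. Passing `index=False` keeps the pandas row index out of the file, so the sheet contains only the named columns.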

What should your file look like?

The required file layout depends on the model you are using. Attention: InChI strings can contain commas (","), so make sure your data is properly structured in the required format.
You can download a sample file for each model below.

Example of multiple inputs with a file. The enzyme-substrate pairs and metabolites displayed here are not real.

Enzyme-Substrate Pair Prediction:

Your file must be in XLSX format and contain exactly two columns, one called "Protein" and one called "Metabolite". Each row should contain one enzyme and one metabolite in the format described above. The upper limit of accepted enzyme-metabolite pairs is 500. You can download a sample file here. We have shown that the prediction performance of our model is low for metabolites that were not present in our training set. We therefore check whether each uploaded metabolite was part of our training set and return this information in the column "metabolite".

k_cat prediction:

Your file must be in XLSX format and contain three columns, titled "Enzyme", "Substrates", and "Products". Each row should contain one enzyme-reaction pair in the format described above. In both the "Substrates" and "Products" columns, metabolites should be separated by a semicolon (";"). The upper limit of accepted enzyme-reaction pairs is 500. You can download a sample XLSX file here.
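When the substrates and products of a reaction are held as lists, the semicolon-separated cells can be assembled programmatically. The helper below is a hypothetical sketch. This page does not say how a semicolon inside a metabolite identifier would be handled, so the sketch assumes it would be misread as a separator and rejects such identifiers.

```python
def reaction_row(enzyme: str, substrates: list[str], products: list[str]) -> dict[str, str]:
    """Build one row with the column titles required for k_cat prediction."""
    for metabolite in substrates + products:
        # Assumption: a ';' inside an identifier would be read as a separator.
        if ";" in metabolite:
            raise ValueError(f"metabolite contains the separator ';': {metabolite!r}")
    return {
        "Enzyme": enzyme,
        "Substrates": ";".join(substrates),
        "Products": ";".join(products),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(reaction_row("MKTAYIAKQR", ["C00002", "C00031"], ["C00008", "C00092"]))
```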

K_M prediction:

Your file must be in XLSX format and contain exactly two columns, one called "Enzyme" and one called "Substrate". Each row should contain one enzyme and one metabolite in the format described above. The upper limit of accepted enzyme-metabolite pairs is 500. You can download a sample file here.

SPOT:

Your file must be in XLSX format and contain exactly two columns, one called "Protein" and one called "Metabolite". Each row should contain one transporter and one molecule in the format described above. The upper limit of accepted transporter-molecule pairs is 1000. You can download a sample file here.
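The column titles and row limits of the four models can be collected in one place and checked before uploading. The sketch below is a hypothetical local check, not the server's validation. The dictionary keys ("ESP", "kcat", "KM", "SPOT") are labels chosen here for illustration. This page does not state whether column order matters, so the sketch compares column names only.

```python
# Column titles and pair limits as stated on this page.
FILE_SPECS = {
    "ESP": (["Protein", "Metabolite"], 500),
    "kcat": (["Enzyme", "Substrates", "Products"], 500),
    "KM": (["Enzyme", "Substrate"], 500),
    "SPOT": (["Protein", "Metabolite"], 1000),
}


def check_table(model: str, header: list[str], n_rows: int) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    columns, max_rows = FILE_SPECS[model]
    problems = []
    # Assumption: column order is irrelevant, so only the names are compared.
    if sorted(header) != sorted(columns):
        problems.append(f"expected columns {columns}, found {header}")
    if n_rows == 0:
        problems.append("file contains no data rows")
    if n_rows > max_rows:
        problems.append(f"{n_rows} rows exceed the limit of {max_rows}")
    return problems


if __name__ == "__main__":
    print(check_table("KM", ["Enzyme", "Substrate"], 501))
```

Here `header` is the list of column titles from the first row of the sheet and `n_rows` is the number of data rows below it.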

DeepMolecules@hhu.de
