Data Management

Loading ChEBI Ontology Data

ChEBai accesses the ChEBI ontology data from the following URL: http://purl.obolibrary.org/obo/chebi/{version}/chebi.obo.
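As an illustration of how the `{version}` placeholder in the URL above is filled in, here is a minimal sketch. The constant and the function name `chebi_obo_url` are hypothetical and not taken from the ChEBai code base.

```python
# URL pattern quoted from the text above; {version} is the ChEBI release number.
CHEBI_OBO_URL_TEMPLATE = "http://purl.obolibrary.org/obo/chebi/{version}/chebi.obo"


def chebi_obo_url(version: int) -> str:
    """Return the download URL of the chebi.obo file for one ChEBI release.

    Hypothetical helper for illustration only.
    """
    return CHEBI_OBO_URL_TEMPLATE.format(version=version)


print(chebi_obo_url(200))
```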

You can find more information on the ChEBI ontology here: https://www.ebi.ac.uk/chebi

ChEBI versions

Change the ChEBI version used for all sets (default: 200):

--data.init_args.chebi_version=VERSION

To change the version of only the train and validation sets, independently of the test set, use:

--data.init_args.chebi_version_train=VERSION
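The interplay of the two options can be sketched as follows. This is not the ChEBai implementation: the function `resolve_versions` is hypothetical, and the sketch assumes that when no train version is given, the train and validation sets use the same version as the test set.

```python
from typing import Dict, Optional


def resolve_versions(chebi_version: int = 200,
                     chebi_version_train: Optional[int] = None) -> Dict[str, int]:
    """Map each split to the ChEBI version it would use (illustrative only)."""
    # Assumption: without an explicit train version, all splits share chebi_version.
    train_version = chebi_version if chebi_version_train is None else chebi_version_train
    return {
        "train": train_version,
        "validation": train_version,
        "test": chebi_version,
    }


# Only chebi_version set: every split uses the same release.
print(resolve_versions(chebi_version=231))
# Both set: train/validation use one release, the test set another.
print(resolve_versions(chebi_version=231, chebi_version_train=200))
```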

Data Preprocessing

After loading the ontology data, ChEBai preprocesses it: it extracts the class hierarchy and divides the data into train, validation, and test sets. During preprocessing, a filter keeps only chemical entities that have at least a minimum number of subclasses (e.g., 50 or 100) annotated with SMILES (Simplified Molecular Input Line Entry System) strings.
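The subclass filter can be pictured with a small, self-contained sketch. It is not the ChEBai code: the toy hierarchy, the identifiers, and the function names are hypothetical, and it assumes that subclasses are counted transitively (all descendants, not only direct children).

```python
from typing import Dict, List, Optional, Set


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Collect all transitive subclasses of root (root itself excluded)."""
    seen: Set[str] = set()
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def select_classes(children: Dict[str, List[str]],
                   smiles: Dict[str, Optional[str]],
                   threshold: int) -> List[str]:
    """Keep classes with at least `threshold` SMILES-annotated subclasses."""
    all_nodes = set(children) | {c for cs in children.values() for c in cs}
    return sorted(
        node for node in all_nodes
        if sum(1 for d in descendants(children, node) if smiles.get(d)) >= threshold
    )


# Toy hierarchy: parent -> direct subclasses (hypothetical identifiers).
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
# SMILES annotations; None marks an entity without a SMILES string.
smiles = {"C": None, "D": "CCO", "E": "CC", "F": "C"}

print(select_classes(children, smiles, threshold=2))
```

With a real threshold of 50 or 100, the same idea retains only the broader classes that cover enough annotated molecules.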

Data folder structure

Data is organized within the following directory structure:

Contains the raw ChEBI data (in .obo format), downloaded from the ChEBI website:

data/${chebi_version}/${dataset_name}/raw/

Contains the processed data, with SMILES strings and boolean class columns, stored in .pkl format, along with a classes.txt file listing the classes in the data:

data/${chebi_version}/${dataset_name}/processed/

Contains the processed data in .pt format, which is compatible with the torch library:

data/${chebi_version}/${dataset_name}/processed/${reader_name}/

${dataset_name} represents the _name attribute of the DataModule used.
${chebi_version} refers to the ChEBI version.
${reader_name} denotes the name attribute of the associated Reader class.
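The layout described above can be assembled with pathlib, as in the following sketch. The function `data_dirs` is hypothetical, and the example values "ChEBI50" and "smiles_token" are placeholders chosen for illustration, not confirmed names from ChEBai.

```python
from pathlib import Path
from typing import Dict


def data_dirs(chebi_version: int, dataset_name: str, reader_name: str,
              base: str = "data") -> Dict[str, Path]:
    """Build the three data directories for one dataset (illustrative only)."""
    root = Path(base) / str(chebi_version) / dataset_name
    return {
        "raw": root / "raw",                                  # .obo download
        "processed": root / "processed",                      # .pkl and classes.txt
        "processed_reader": root / "processed" / reader_name,  # .pt files
    }


# Placeholder dataset and reader names, for illustration.
for name, path in data_dirs(200, "ChEBI50", "smiles_token").items():
    print(name, path.as_posix())
```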

For cross-validation, the folds are stored as cv_${n_folds}_fold/fold_{fold_index}_train.pkl and cv_${n_folds}_fold/fold_{fold_index}_validation.pkl in the raw directory.
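The fold file naming can be sketched the same way. The helper `fold_files` is hypothetical, and the sketch assumes that fold indices start at 0, which the text above does not state.

```python
from pathlib import Path
from typing import List, Tuple


def fold_files(raw_dir: str, n_folds: int) -> List[Tuple[Path, Path]]:
    """Return (train, validation) file paths for every fold (illustrative only)."""
    fold_dir = Path(raw_dir) / f"cv_{n_folds}_fold"
    # Assumption: folds are numbered 0 .. n_folds - 1.
    return [
        (fold_dir / f"fold_{i}_train.pkl", fold_dir / f"fold_{i}_validation.pkl")
        for i in range(n_folds)
    ]


# Placeholder raw directory, for illustration.
for train_path, val_path in fold_files("data/200/ChEBI50/raw", n_folds=3):
    print(train_path.as_posix(), val_path.as_posix())
```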
