| Model | Description | Number of classes | Reference | Code |
|---|---|---|---|---|
|`electra`| A transformer-based deep learning model trained on ChEBI SMILES strings. | 1522 |[Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a)|[python-chebai](https://github.com/ChEB-AI/python-chebai)|
|`resgated`| A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1522 ||[python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph)|
|`chemlog_peptides`| A rule-based model specialised on peptide classes. | 18 |[Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987)|[chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides)|
|`chemlog_element`, `chemlog_organox`| Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 ||[chemlog-extra](https://github.com/ChEB-AI/chemlog-extra)|
|`c3p`| A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 |[Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470)|[c3p](https://github.com/chemkg/c3p)|
In addition, Chebifier includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class matched by a SMILES string. This is not activated by default, but can be included by adding it to the ensemble configuration.

Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
1. Get predictions from each model $m_i$ for the sample.
2. Aggregate the votes for each class, weighting each model's vote by its trust (a weighted majority vote).

Trust is based on the model's performance on a validation set. After training, we calculate each model's F1 score on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as 1 + the F1 score. If the `ensemble_type` is set to `mv` (the default), the trust is set to 1 for all models.
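As an illustration, here is a minimal sketch of such a trust-weighted majority vote. The function names and the tie-breaking rule (ties resolve to 0) are our own assumptions for this sketch, not Chebifier's API:

```python
from typing import List

def trust(f1_score: float, ensemble_type: str = "mv") -> float:
    """Trust weight of a model for one class.

    Assumed from the text: 1 + F1 (per-class validation F1) for
    `wmv-f1`, and a constant 1 for plain majority voting (`mv`).
    """
    return 1.0 + f1_score if ensemble_type == "wmv-f1" else 1.0

def weighted_majority_vote(votes: List[int], trusts: List[float]) -> int:
    """Aggregate binary per-class votes (1 = class predicted), each
    weighted by the voting model's trust. Ties resolve to 0 here;
    Chebifier's actual tie-breaking may differ."""
    score = sum(t if v == 1 else -t for v, t in zip(votes, trusts))
    return 1 if score > 0 else 0

# Example: two models vote "yes" (F1 = 0.9, 0.4), one votes "no" (F1 = 0.8).
trusts = [trust(f1, "wmv-f1") for f1 in (0.9, 0.4, 0.8)]
print(weighted_majority_vote([1, 1, 0], trusts))  # -> 1, since 1.9 + 1.4 > 1.8
```
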
### Inconsistency resolution
After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
and disjointness axioms is checked. This is
done in 3 steps:
- (1) First, we check the hierarchy: a molecule that belongs to a given class also belongs to the direct and indirect superclasses (e.g., a primary alcohol or an aromatic alcohol is also an alcohol). For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, we therefore set the prediction of $B$ to $1$.
- (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
`data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
both, we select the one with the higher class score and set the other to $0$.
- (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness inconsistencies and don't have to repeat the disjointness check.
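To make the three steps concrete, here is a compact sketch. The data layout (a precomputed transitive superclass map, a score dictionary, and a disjoint-pair list) is an assumption for illustration and does not mirror Chebifier's internals:

```python
from typing import Dict, List, Set, Tuple

def resolve_inconsistencies(
    preds: Dict[str, int],                  # class -> 0/1 ensemble decision
    scores: Dict[str, float],               # class -> class score
    superclasses: Dict[str, Set[str]],      # class -> direct AND indirect superclasses
    disjoint_pairs: List[Tuple[str, str]],  # e.g. parsed from the disjointness CSVs
) -> Dict[str, int]:
    # (1) Propagate positives upwards: a molecule in a class is also in all
    # of its superclasses. One pass suffices because the map is transitive.
    for cls, sups in superclasses.items():
        if preds.get(cls) == 1:
            for sup in sups:
                preds[sup] = 1

    # (2) Resolve disjointness: if both classes of a disjoint pair are
    # predicted, keep the one with the higher class score.
    for a, b in disjoint_pairs:
        if preds.get(a) == 1 and preds.get(b) == 1:
            loser = a if scores.get(a, 0.0) < scores.get(b, 0.0) else b
            preds[loser] = 0

    # (3) Re-check the hierarchy, now flipping the subclass to 0
    # (A = 1, B = 0 for A ⊆ B  =>  A := 0). Iterate to a fixpoint, since a
    # flip can expose a violation lower in the hierarchy. Only 1 -> 0 flips
    # occur, so no disjointness violations can reappear.
    changed = True
    while changed:
        changed = False
        for cls, sups in superclasses.items():
            if preds.get(cls) == 1 and any(preds.get(s) == 0 for s in sups):
                preds[cls] = 0
                changed = True
    return preds
```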