@@ -96,7 +111,7 @@ Currently, the following models are supported:
| `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |

In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
matched by a SMILES string. This is not activated by default, but can be included by adding
```yaml
chebi_lookup:
  type: chebi_lookup
@@ -109,15 +124,15 @@ to your configuration file.
Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
1. Get predictions from each model $m_i$ for the sample.
2. For each class $c$, aggregate predictions $p_c^{m_i}$ from all models that made a prediction for that class.
   The aggregation happens separately for all positive predictions (i.e., $p_c^{m_i} \geq 0.5$) and all negative predictions
   ($p_c^{m_i} < 0.5$). If the aggregated value is larger for the positive predictions than for the negative predictions,
   the ensemble makes a positive prediction for class $c$:
<!-- For some reason, this formula does not render in GitHub markdown. Therefore, I rendered it locally and added it as an image. The rendered formula is:
@@ -135,25 +150,25 @@ Therefore, if in doubt, we are more confident in the negative prediction.
Confidence can be disabled by the `use_confidence` parameter of the predict method (default: True).

The `model_weight` can be set for each model in the configuration file (default: 1). This is used to favor a certain
model independently of a given class.
Trust is based on the model's performance on a validation set. After training, we evaluate the Machine Learning models
on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as 1 + the F1 score.
If the `ensemble_type` is set to `mv` (the default), the trust is set to 1 for all models.
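
To make the interplay of confidence, `model_weight` and trust more concrete, here is a minimal Python sketch of a weighted vote for a single class. It is not Chebifier's implementation: the exact aggregation formula and the confidence definition are not shown in this excerpt, so the sketch assumes that each vote is the product of the three factors and uses a symmetric placeholder confidence, $2 \cdot |p - 0.5|$. The README indicates that the real confidence favors negative predictions, which this placeholder does not model. The names `ModelPrediction`, `confidence` and `ensemble_vote` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelPrediction:
    """One model's output for one class (hypothetical container)."""
    probability: float          # p_c^{m_i}, in [0, 1]
    model_weight: float = 1.0   # per-model weight from the config (default: 1)
    trust: float = 1.0          # 1 for `mv`, 1 + F1 score for `wmv-f1`


def confidence(p: float) -> float:
    """ASSUMPTION: placeholder confidence, the distance from 0.5 scaled to [0, 1]."""
    return 2 * abs(p - 0.5)


def ensemble_vote(preds, use_confidence=True):
    """Return True if the positive votes outweigh the negative votes.

    ASSUMPTION: a vote is model_weight * trust (* confidence); the README's
    actual formula is not reproduced here.
    """
    positive_total = 0.0
    negative_total = 0.0
    for pred in preds:
        vote = pred.model_weight * pred.trust
        if use_confidence:
            vote *= confidence(pred.probability)
        # Positive and negative predictions are aggregated separately.
        if pred.probability >= 0.5:
            positive_total += vote
        else:
            negative_total += vote
    return positive_total > negative_total


if __name__ == "__main__":
    preds = [ModelPrediction(0.9), ModelPrediction(0.4), ModelPrediction(0.4)]
    print(ensemble_vote(preds, use_confidence=True))
    print(ensemble_vote(preds, use_confidence=False))
```

In this sketch, one confident positive prediction can outvote two hesitant negative ones when confidence is enabled, while a plain majority vote (confidence disabled, all weights and trust equal to 1) would side with the two negative models.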
### Inconsistency resolution
After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
and disjointness axioms is checked. This is done in 3 steps:
- (1) First, the hierarchy is corrected. For each pair of classes $A$ and $B$ where $A$ is a subclass of $B$ (following
  the is-a relation in ChEBI), we set the ensemble prediction of $B$ to 1 if the prediction of $A$ is 1. Intuitively
  speaking, if we have determined that a molecule belongs to a specific class (e.g., aromatic primary alcohol), it also
  belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic alcohol, alcohol).
- (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
  We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
  `data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
  both, we keep the one with the higher class score and set the other to 0.
- (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
  with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
  we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness-inconsistencies and don't have
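
The three resolution steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: the function name `resolve_inconsistencies` and its arguments are hypothetical, `superclasses` is assumed to already contain all direct and indirect superclasses of each class, classes without a score are treated as having score 0, and ties between disjoint classes are resolved in favor of the first class of the pair.

```python
def resolve_inconsistencies(positive, scores, superclasses, disjoint_pairs):
    """Return a hierarchy- and disjointness-consistent set of positive classes.

    positive:       set of class ids predicted positive by the ensemble
    scores:         dict mapping class id -> class score (ASSUMPTION: 0.0 if missing)
    superclasses:   dict mapping class id -> set of ALL its superclasses
                    (ASSUMPTION: transitive closure is precomputed)
    disjoint_pairs: iterable of (a, b) pairs of disjoint classes
    """
    result = set(positive)

    # Step 1: if A is positive, every superclass B of A becomes positive.
    for cls in list(result):
        result |= superclasses.get(cls, set())

    # Step 2: of two disjoint positive classes, keep the higher-scoring one.
    for a, b in disjoint_pairs:
        if a in result and b in result:
            # ASSUMPTION: on a tie, the first class of the pair is kept.
            loser = b if scores.get(a, 0.0) >= scores.get(b, 0.0) else a
            result.discard(loser)

    # Step 3: repeat the hierarchy check, but now set the subclass A to 0
    # whenever one of its superclasses B is 0. Because `superclasses` is
    # transitive, a single pass is enough.
    return {cls for cls in result if superclasses.get(cls, set()) <= result}


if __name__ == "__main__":
    # Toy hierarchy with made-up class ids: A1 is-a A, B1 is-a B, A and B disjoint.
    toy_superclasses = {"A1": {"A"}, "B1": {"B"}}
    toy_scores = {"A1": 0.9, "A": 0.9, "B1": 0.6, "B": 0.6}
    print(resolve_inconsistencies({"A1", "B1"}, toy_scores, toy_superclasses, [("A", "B")]))
```

In the toy example, step 1 adds `A` and `B`, step 2 drops `B` because `A` has the higher score, and step 3 then drops `B1` because its superclass `B` is no longer predicted.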