
Commit ccbeb74

Merge pull request #9 from ChEB-AI/feature/api_downloadble_models
Fix - Api Models
2 parents b802132 + 0d517f4 commit ccbeb74

File tree

4 files changed (+33, -42 lines):

- README.md
- chebifier/ensemble/base_ensemble.py
- configs/example_config.yml
- configs/huggingface_config.yml

README.md

Lines changed: 29 additions & 14 deletions
```diff
@@ -8,9 +8,9 @@ A web application for the ensemble is available at https://chebifier.hastingslab
 
 Not all models can be installed automatically at the moment:
 - `chebai-graph` and its dependencies. To install them, follow
-  the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph).
+  the instructions in the [chebai-graph repository](https://github.com/ChEB-AI/python-chebai-graph).
 - `chemlog-extra` can be installed with `pip install git+https://github.com/ChEB-AI/chemlog-extra.git`
-- The automatically installed version of `c3p` may not work under Windows. If you want to run chebifier on Windows, we
+- The automatically installed version of `c3p` may not work under Windows. If you want to run chebifier on Windows, we
   recommend using this forked version: `pip install git+https://github.com/sfluegel05/c3p.git`
 
 
```
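Taken together, the manual installation steps in this hunk amount to the following shell commands (a convenience summary of the commands quoted above; `chebai-graph` itself still requires the steps from its own repository):

```bash
# Optional dependencies that are not installed automatically:
pip install git+https://github.com/ChEB-AI/chemlog-extra.git

# On Windows, replace the automatically installed c3p with the fork:
pip install git+https://github.com/sfluegel05/c3p.git
```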

````diff
@@ -38,11 +38,26 @@ The package provides a command-line interface (CLI) for making predictions using
 The ensemble configuration is given by a configuration file (by default, this is `chebifier/ensemble.yml`). If you
 want to change which models are included in the ensemble or how they are weighted, you can create your own configuration file.
 
-Model weights for deep learning models are downloaded automatically from [Hugging Face](https://huggingface.co/chebai).
+Model weights for deep learning models are automatically downloaded from [Hugging Face](https://huggingface.co/chebai).
+To use specific model weights from Hugging Face, add the `load_model` key to your configuration file. For example:
+
+```yaml
+my_electra:
+  type: electra
+  load_model: "electra_chebi50_v241"
+```
+
+### Available model weights:
+
+* `electra_chebi50_v241`
+* `resgated_chebi50_v241`
+* `c3p_with_weights`
+
+
 However, you can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
 
 ```bash
-# Make predictions
+# Make predictions
 python -m chebifier predict --smiles "CC(=O)OC1=CC=CC=C1C(=O)O" --smiles "C1=CC=C(C=C1)C(=O)O"
 
 # Make predictions using SMILES from a file
````
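For orientation, the new `load_model` key can be combined with keys that already appear in this repository's configs, such as `model_weight` (see `configs/example_config.yml` below). A minimal sketch of such a custom configuration; the file layout and the weight value are illustrative assumptions, only the key names and weight identifiers come from this commit:

```yaml
# Hypothetical custom ensemble config; values are illustrative.
my_electra:
  type: electra
  load_model: "electra_chebi50_v241"
  model_weight: 2
my_resgated:
  type: resgated
  load_model: "resgated_chebi50_v241"
```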
````diff
@@ -96,7 +111,7 @@ Currently, the following models are supported:
 | `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |
 
 In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
-matched by a SMILES string. This is not activated by default, but can be included by adding
+matched by a SMILES string. This is not activated by default, but can be included by adding
 ```yaml
 chebi_lookup:
   type: chebi_lookup
````
```diff
@@ -109,15 +124,15 @@ to your configuration file.
 
 Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
 1. Get predictions from each model $m_i$ for the sample.
-2. For each class $c$, aggregate predictions $p_c^{m_i}$ from all models that made a prediction for that class.
+2. For each class $c$, aggregate predictions $p_c^{m_i}$ from all models that made a prediction for that class.
 The aggregation happens separately for all positive predictions (i.e., $p_c^{m_i} \geq 0.5$) and all negative predictions
 ($p_c^{m_i} < 0.5$). If the aggregated value is larger for the positive predictions than for the negative predictions,
 the ensemble makes a positive prediction for class $c$:
 
 <img width="2297" height="114" alt="image" src="https://github.com/user-attachments/assets/2f0263ae-83ac-41ea-938a-c71b46082c22" />
 <!-- For some reason, this formula does not render in GitHub markdown. Therefore, I rendered it locally and added it as an image. The rendered formula is:
 $$
-\text{ensemble}(c) = \begin{cases}
+\text{ensemble}(c) = \begin{cases}
 1 & \text{if } \sum_{i: p_c^{m_i} \geq 0.5} [\text{confidence}_c^{m_i} \cdot \text{model_weight}_{m_i} \cdot \text{trust}_c^{m_i}] > \sum_{i: p_c^{m_i} < 0.5} [\text{confidence}_c^{m_i} \cdot \text{model_weight}_{m_i} \cdot \text{trust}_c^{m_i}] \\
 0 & \text{otherwise}
 \end{cases}
```
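To make the decision rule concrete, here is a minimal Python sketch of the aggregation in the formula above. The dict-based data structures and the specific confidence measure (distance from the 0.5 decision boundary) are assumptions for illustration, not the package's internal API:

```python
# Minimal sketch of the ensemble decision rule in the formula above.
# Only the formula itself comes from the README; everything else is assumed.

def ensemble_decision(preds, model_weights, trusts, use_confidence=True):
    """Decide class c from per-model probabilities p_c^{m_i} (dict: model -> float)."""
    def weight(model, p):
        # Assumed confidence measure: distance from the 0.5 decision boundary.
        confidence = 2 * abs(p - 0.5) if use_confidence else 1.0
        return confidence * model_weights[model] * trusts[model]

    positive = sum(weight(m, p) for m, p in preds.items() if p >= 0.5)
    negative = sum(weight(m, p) for m, p in preds.items() if p < 0.5)
    return 1 if positive > negative else 0

# Example: two moderately positive models outvote one confident negative model.
print(ensemble_decision(
    preds={"electra": 0.8, "resgated": 0.7, "c3p": 0.1},
    model_weights={"electra": 1, "resgated": 1, "c3p": 1},
    trusts={"electra": 1, "resgated": 1, "c3p": 1},
))  # -> 1  (positive sum 0.6 + 0.4 beats negative sum 0.8)
```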
```diff
@@ -135,25 +150,25 @@ Therefore, if in doubt, we are more confident in the negative prediction.
 
 Confidence can be disabled by the `use_confidence` parameter of the predict method (default: True).
 
-The model_weight can be set for each model in the configuration file (default: 1). This is used to favor a certain
-model independently of a given class.
-Trust is based on the model's performance on a validation set. After training, we evaluate the Machine Learning models
+The model_weight can be set for each model in the configuration file (default: 1). This is used to favor a certain
+model independently of a given class.
+Trust is based on the model's performance on a validation set. After training, we evaluate the Machine Learning models
 on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as 1 + the F1 score.
 If the `ensemble_type` is set to `mv` (the default), the trust is set to 1 for all models.
 
 ### Inconsistency resolution
-After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
+After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
 and disjointness axioms is checked. This is
 done in 3 steps:
-- (1) First, the hierarchy is corrected. For each pair of classes $A$ and $B$ where $A$ is a subclass of $B$ (following
-  the is-a relation in ChEBI), we set the ensemble prediction of $B$ to 1 if the prediction of $A$ is 1. Intuitively
+- (1) First, the hierarchy is corrected. For each pair of classes $A$ and $B$ where $A$ is a subclass of $B$ (following
+  the is-a relation in ChEBI), we set the ensemble prediction of $B$ to 1 if the prediction of $A$ is 1. Intuitively
   speaking, if we have determined that a molecule belongs to a specific class (e.g., aromatic primary alcohol), it also
   belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic alcohol, alcohol).
 - (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
   We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
   `data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
   both, we select the one with the higher class score and set the other to 0.
-- (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
+- (3) Since the second step might have introduced new inconsistencies into the hierarchy, we repeat the first step, but
   with a small change. For a pair of classes $A \subseteq B$ with predictions $1$ and $0$, instead of setting $B$ to $1$,
   we now set $A$ to $0$. This has the advantage that we cannot introduce new disjointness-inconsistencies and don't have
   to repeat step 2.
```
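The three steps can be summarised in a short Python sketch. The data structures are hypothetical and the actual logic lives in `PredictionSmoother` (imported in `base_ensemble.py` below); this is only meant to make the control flow explicit:

```python
# Schematic version of the three-step inconsistency resolution described above.
# Assumptions: `subclass_pairs` contains the transitive is-a closure as (A, B)
# with A a subclass of B; `pred` maps class -> 0/1, `score` maps class -> the
# aggregated ensemble score; `disjoint_pairs` lists disjoint class pairs.

def resolve_inconsistencies(pred, score, subclass_pairs, disjoint_pairs):
    # Step 1: correct the hierarchy upwards -- a positive subclass
    # implies a positive prediction for all of its superclasses.
    for a, b in subclass_pairs:
        if pred[a] == 1:
            pred[b] = 1
    # Step 2: resolve disjointness -- of two disjoint positives,
    # keep the class with the higher score and set the other to 0.
    for a, b in disjoint_pairs:
        if pred[a] == 1 and pred[b] == 1:
            if score[a] >= score[b]:
                pred[b] = 0
            else:
                pred[a] = 0
    # Step 3: repair downwards -- for A subclass of B with pred(B) = 0,
    # set A to 0 instead of re-raising B; with the transitive closure this
    # needs one pass and cannot create new disjointness violations.
    for a, b in subclass_pairs:
        if pred[b] == 0:
            pred[a] = 0
    return pred
```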

chebifier/ensemble/base_ensemble.py

Lines changed: 3 additions & 5 deletions
```diff
@@ -3,15 +3,15 @@
 
 import torch
 import tqdm
-from chebifier.inconsistency_resolution import PredictionSmoother
-from chebifier.utils import load_chebi_graph, get_disjoint_files
 
 from chebifier.check_env import check_package_installed
+from chebifier.hugging_face import download_model_files
+from chebifier.inconsistency_resolution import PredictionSmoother
 from chebifier.prediction_models.base_predictor import BasePredictor
+from chebifier.utils import get_disjoint_files, load_chebi_graph
 
 
 class BaseEnsemble:
-
     def __init__(
         self,
         model_configs: dict,
```
```diff
@@ -29,8 +29,6 @@ def __init__(
         for model_name, model_config in model_configs.items():
             model_cls = MODEL_TYPES[model_config["type"]]
             if "hugging_face" in model_config:
-                from chebifier.hugging_face import download_model_files
-
                 hugging_face_kwargs = download_model_files(model_config["hugging_face"])
             else:
                 hugging_face_kwargs = {}
```

configs/example_config.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 
 chemlog_peptides:
-  type: chemlog
+  type: chemlog_peptides
   model_weight: 100 # if chemlog is available, it always gets chosen
 my_resgated:
   type: resgated
```
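Reassembled from the hunk above, the corrected start of `configs/example_config.yml` reads as follows (later lines of the file are not shown in this diff):

```yaml
chemlog_peptides:
  type: chemlog_peptides
  model_weight: 100 # if chemlog is available, it always gets chosen
my_resgated:
  type: resgated
```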

configs/huggingface_config.yml

Lines changed: 0 additions & 22 deletions
This file was deleted.
