Reimplement NN ensemble using PyTorch #926
Conversation
Force-pushed from 5bdbf64 to d82a54a
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #926      +/-   ##
==========================================
- Coverage   99.63%   99.63%   -0.01%
==========================================
  Files         103      103
  Lines        8238     8226      -12
==========================================
- Hits         8208     8196      -12
  Misses         30       30
```
Force-pushed from d82a54a to da479eb
Selecting the PyTorch variant (CPU or CUDA x.y or ROCm or ...) when setting up the development environment using uv turned out to be tricky. The problem is that uv normally resolves a single, universal dependency tree, so there is no direct way to choose between alternative PyTorch builds. But fortunately, it is possible to have some degree of control over the resolution by setting up "extras" and then declaring a "conflict" between them. This causes uv to "fork" the resolution into different "branches", each having their own dependency tree. So in commit e629963, I added two new extras: `torch-cpu` and `torch-cu128`. The end result is that these two extras can be used to select the PyTorch variant at install time. Here are examples of how this works now:

1. `uv sync --group all --extra torch-cpu` installs the CPU-only PyTorch variant
2. `uv sync --group all --extra torch-cu128` installs the CUDA 12.8 variant
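The commit itself isn't quoted here, but following uv's documented pattern for PyTorch, the extras-plus-conflict setup looks roughly like this in `pyproject.toml` (the index names and the bare `torch` requirement below are illustrative, not necessarily what commit e629963 contains):

```toml
[project.optional-dependencies]
torch-cpu = ["torch"]
torch-cu128 = ["torch"]

# Declare the extras mutually exclusive so uv forks the resolution
# into separate branches, one per extra.
[tool.uv]
conflicts = [
    [
        { extra = "torch-cpu" },
        { extra = "torch-cu128" },
    ],
]

# Pull torch from a different index depending on which extra is active.
[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "torch-cpu" },
    { index = "pytorch-cu128", extra = "torch-cu128" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
```

Because of the declared conflict, uv resolves each extra in its own branch and refuses to enable both at once.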
I refined the above solution with one more addition. Maybe not ideal, but it works.
I ran benchmarking runs using the Annif-tutorial YSO-NLF dataset on the annif-data-kk server (it has 6 CPUs). The script used and the output data are in the benchmarking branch.

[train and eval benchmark tables]

Compared to the TensorFlow implementation, PyTorch requires twice as much memory in training and is slightly slower (107% in user time); but in inference the situation is the opposite: PyTorch is faster (~98% in user time) and uses less memory.
Thanks @juhoinkinen! The doubling of RAM usage is interesting. First hypothesis: maybe PyTorch uses higher-precision floats than TensorFlow? I'll investigate.
The increase in memory use during training was mainly due to the way nDCG scores were calculated, which caused a lot of large tensors to be kept in memory, especially towards the end of a training epoch. I switched away from the torchmetrics implementation and instead implemented the calculation with a custom function that doesn't keep the tensors allocated.
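The actual function isn't quoted in the thread; a minimal sketch of the idea (the function name and tensor shapes are assumptions) computes the score batch by batch under `torch.no_grad()` and returns a plain float, so no large tensors remain referenced at the end of an epoch:

```python
import torch

def ndcg_at_k(y_pred: torch.Tensor, y_true: torch.Tensor, k: int = 10) -> float:
    """Mean nDCG@k over a batch of label score vectors.

    y_pred and y_true are float tensors of shape (batch, n_labels);
    y_true holds binary relevance. Returning a plain float means no
    graph-attached tensors survive past this call.
    """
    with torch.no_grad():
        k = min(k, y_pred.size(1))
        # gains of the predicted ranking: relevance of the top-k predictions
        topk_idx = torch.topk(y_pred, k, dim=1).indices
        gains = torch.gather(y_true, 1, topk_idx)
        # standard log2 position discounts for ranks 1..k
        discounts = 1.0 / torch.log2(
            torch.arange(2, k + 2, dtype=torch.float32, device=y_pred.device)
        )
        dcg = (gains * discounts).sum(dim=1)
        # ideal DCG: the same discounts applied to a perfect ranking
        ideal_gains = torch.topk(y_true, k, dim=1).values
        idcg = (ideal_gains * discounts).sum(dim=1)
        # clamp avoids division by zero for samples with no relevant labels
        ndcg = dcg / idcg.clamp(min=1e-10)
        return ndcg.mean().item()
```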
I re-ran the benchmarks; full output here. Memory usage during training is now even lower than it was with TensorFlow!

[train and eval benchmark tables]
Data for the latest change with Conv1d:

[train and eval benchmark tables]

So nDCG improved by 0.0224 (and F1@5 by 0.0127)! 📈 No adverse effects on performance or memory usage.
This PR reimplements the NN ensemble using PyTorch instead of Keras/TensorFlow.
To test this, you will have to use `uv sync --group all --extra torch-cpu` or similar (see comments below).

Some notes about the implementation:

- The Keras implementation used `top_k_categorical_accuracy` as the validation metric, but this was not easily available in PyTorch, so I switched to using the nDCG metric from the torchmetrics package (see the sketch after this list)
- I added `torch-cpu` and `torch-cu128` extras for now, but I think the setup could quite easily be extended to other PyTorch variants such as CUDA 12.6 or 13.0, ROCm or Intel XPU, though obviously these would require more configuration in `pyproject.toml`
- I have not yet measured how well this performs in terms of quality, computational performance or memory usage
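For reference, a minimal sketch of calling the torchmetrics nDCG on a single document's label scores (the example values are made up):

```python
import torch
from torchmetrics.functional.retrieval import retrieval_normalized_dcg

# scores for one document's candidate labels and their true relevance
preds = torch.tensor([0.9, 0.1, 0.4, 0.7])
target = torch.tensor([True, False, False, True])

# both top-2 predictions are relevant, so nDCG@2 is perfect
print(retrieval_normalized_dcg(preds, target, top_k=2))  # tensor(1.)
```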
Fixes #895