
Conversation

@mfakaehler
Collaborator

Dear Annif-Team,

As announced in issue #855, we would like to propose a new backend for Annif, Embedding Based Matching (EBM), which has been created by @RietdorfC and myself.

Here is a first draft for a readme article, to be added to the wiki:
Backend-EBM.md
Looking forward to your feedback!

Best,
Maximilian

@mfakaehler mfakaehler changed the title from "Issue855 add ebm backend" to "Add ebm backend" on Nov 11, 2025
@codecov

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.64%. Comparing base (27e4ac7) to head (e57ef3a).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #914    +/-   ##
========================================
  Coverage   99.63%   99.64%            
========================================
  Files         103      105     +2     
  Lines        8238     8398   +160     
========================================
+ Hits         8208     8368   +160     
  Misses         30       30            


@osma
Member

osma commented Nov 11, 2025

Thanks, this is great! A couple of quick suggestions:

  1. This PR already has 17 commits even though there is not that much code in it. Maybe some or even all of them could be squashed and/or rebased against current main to reduce the number of commits?
  2. The EBM-related tests (thanks for making tests!) are not yet running under GitHub Actions CI. To enable that, you should add the optional dependency ebm to one or more of the Python-version-specific poetry install commands here. For example, it could be added to Python 3.13 and/or 3.11, roughly as sketched below.
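
A hedged sketch of what such an install line could look like for the Python 3.11 job (the extras list here mirrors the one mentioned later in this thread; the actual workflow layout may differ):

  # add the ebm extra alongside the extras already installed for this Python version
  poetry install --extras "fasttext spacy estnltk ebm"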

@mfakaehler mfakaehler force-pushed the issue855-add-ebm-backend branch 2 times, most recently from 375b999 to f010704 on November 12, 2025 09:04
@mfakaehler
Collaborator Author

We have identified that the errors in the check are likely related to the download of the default SentenceTransformer model that we configured (BAAI/bge-m3), which fetches 8 GB of additional data. We will look into that.

@juhoinkinen
Member

juhoinkinen commented Nov 14, 2025

The GitHub Actions job for testing on Python 3.11 fails because there is not enough disk space, logs:

   ...
  - Installing schemathesis (3.39.16)
  - Installing simplemma (1.1.2)
  - Installing spacy (3.8.9)
  OSError
  [Errno 28] No space left on device
  at ~/.cache/pipx/venvs/poetry/lib/python3.10/site-packages/installer/utils.py:140 in copyfileobj_with_hashing
      136│         buf = source.read(_COPY_BUFSIZE)
      137│         if not buf:
      138│             break
      139│         hasher.update(buf)
    → 140│         dest.write(buf)
      141│         size += len(buf)
      142│ 
      143│     return base64.urlsafe_b64encode(hasher.digest()).decode("ascii").rstrip("="), size
      144│ 
Cannot install estnltk.

I think we could use a larger GH Actions runner machine, but its setup is an organization-wide setting and it will be billed by usage, so I'll need to check this with our admins (takes less than a day, I hope). The specs for the larger GitHub-hosted machines are these:

  • Ubuntu (22.04)
    ubuntu-latest-m, 4-cores · 14 GB RAM · 150 GB SSD · $0.016 per minute
  • Windows (2022)
    windows-latest-l, 8-cores · 32 GB RAM · 300 GB SSD · $0.064 per minute

Edit: The default runners have 14 GB of disk, and there are actually more options for the large runners.

@juhoinkinen
Member

Alternatively we could remove some unnecessary stuff from the runner, or distribute the optional dependencies to be installed across separate jobs. But that could be done in another PR, to keep this one simple.

@juhoinkinen
Member

juhoinkinen commented Nov 14, 2025

A larger runner is ready. This is how to make CI use it: Run job on GH hosted large-runner

An example run: https://github.com/NatLibFi/Annif/actions/runs/19363594070/job/55401126360

Edit: But as discussed with @osma, it would be better if installing did not require so much disk space and network traffic. Maybe the installation could somehow be slimmed down?

@mfakaehler
Collaborator Author

Thanks @juhoinkinen for expanding the GitHub runners. We have already tried to reduce traffic by skipping the download of the default SentenceTransformer model that we configured for ebm. It now only uses a mock model in the tests, so the Hugging Face cache should remain empty. I am not sure how we could trim down the installation. Installing sentence-transformers is essential for the package, and that brings in all the other heavy libraries (transformers, torch, etc.). As we now only work with a mock model in the tests, maybe one could run the tests without actually installing sentence-transformers, for example by installing ebm with the --no-deps option. However, one reason for having this CI is that one would want to test that the library can be properly installed, including all dependencies, isn't it?
We are certainly open to suggestions. Did you have anything in mind when you suggested trimming the installation?

@osma
Member

osma commented Nov 14, 2025

Good that you were able to avoid downloading the sentence transformer model and replace it with a mock implementation.

I think that installing dependencies (software libraries) is essential for a CI pipeline like this. It's a bit sad that these libraries are so huge. Here are a couple of ideas for slimming down the installation:

  1. Would it be possible to switch to another Torch variant? In my understanding, the default variant uses CUDA and is pretty huge. There is no CUDA support in CI anyway, so this seems like a waste. A CPU-only variant would be a lot slimmer, I think?
  2. Currently ebm is installed together with other optional dependencies (fasttext spacy estnltk ebm for Python 3.11 and fasttext yake voikko spacy ebm for 3.13). We could try to adjust these so that not all big dependencies are used in the same environment. Though I doubt there is that much room for improvement here.

The different PyTorch variants are described in the documentation. For CPU-only, you need to pass --index-url https://download.pytorch.org/whl/cpu to the pip install command. (Not sure how to do that with our current Poetry setup.)
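
As a rough illustration (the annif[ebm] extra name is only indicative here), a CPU-only environment could be set up by installing the CPU wheel of torch first and letting the rest of the install find it already satisfied:

  # install a CPU-only PyTorch wheel up front, then the remaining dependencies
  pip install torch --index-url https://download.pytorch.org/whl/cpu
  pip install "annif[ebm]"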

It looks like PyTorch is also developing new wheel variants that auto-detect the hardware. Not sure if this is ready for this kind of use yet.

@mfakaehler
Collaborator Author

Is there a way to figure out these CI/CD problems locally, e.g. using some Docker images, so that we don't need to burn through NatLibFi's resources? I must admit I am quite inexperienced with GitHub Actions.

@osma
Member

osma commented Nov 14, 2025

@mfakaehler there are ways to run GitHub Actions locally, for example https://github.com/nektos/act
I have no experience with that though.

Please don't worry about the costs. They are really peanuts, and we are very interested in getting the ebm backend working.


@mfakaehler
Collaborator Author

I have been thinking about the CPU-only installation. Before we start to re-engineer our dependencies, I would like to discuss what we are aiming for. Would you like to deploy the EBM backend without GPU support entirely, to reduce the size of the installation? Or are we talking about ways to "cheat" the CI pipeline into only downloading the PyTorch CPU installation, while still enabling other users to use GPUs?
Personally, I would recommend having easy GPU support for users, because the process of embedding generation is really the bottleneck of EBM's runtime. So if users have the appropriate hardware, they should be able to use it.

@juhoinkinen
Member

Just a thought, but how much work would it require, or is it possible at all, to have an option (probably in ebm4subjects) to generate the embeddings in an external service instead of on the local machine?

Currently in our deployment setup we are running Annif in an OpenShift cluster that has just CPUs, but we have GPUs available in a separate cluster.

@osma
Member

osma commented Nov 18, 2025

@mfakaehler Excellent questions!

This is a bit similar to the discussion in #804 about Docker image variants (mainly related to the XTransformer backend). There, the conclusion was that it doesn't make sense to include GPU support in Docker images (at least in the primary image variant), because it would increase its size a lot and still be difficult to run.

But XTransformer is a bit different than EBM in that it only requires a GPU during training, not so much at inference time. In my understanding, EBM in practice needs a GPU at inference time as well. So we can't just apply the same logic directly.

Here are some things that I think would be desirable:

  1. It should be possible to use Annif (with EBM) without installing GPU dependencies (even if it's slow). This would be especially desirable in the case of GitHub Actions CI because CI jobs run very often so they should be as lightweight as possible and complete as fast as possible.
  2. It should be possible to use different brands of GPUs, not just limited to NVIDIA/CUDA but also AMD/ROCm and possibly others (e.g. Vulkan) if PyTorch has support.

I realize that these may be difficult to achieve in our current way of managing dependencies. In my understanding, Poetry only has limited support for PyTorch variants, which makes it difficult to implement flexible choices. I think uv has better support - see e.g. here. So we could consider switching to uv if it helps in this area.

Does the GPU inference to calculate embeddings have to happen in the same Annif process, or could it be in an external service accessible via an API? For example, commercial LLM providers (OpenAI, Anthropic/Claude etc.) have embedding APIs for RAG and similar applications, and locally run LLM engines such as llama.cpp and Ollama also provide embedding APIs.
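
For a rough idea of what such an external embedding API looks like, here is an illustrative call to an OpenAI-compatible /v1/embeddings endpoint (host, API key and model name are placeholders):

  curl -s https://api.example.com/v1/embeddings \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "some-embedding-model", "input": ["text to embed"]}'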

@mfakaehler
Collaborator Author

mfakaehler commented Nov 18, 2025 via email

@osma
Member

osma commented Nov 18, 2025

@mfakaehler Good points. I don't think supporting an external embedding API would solve our problems, even though it would be a nice feature for EBM (as Juho implied, sometimes it's easier to separate the GPU-dependent parts of a system into their own environment).

@mfakaehler
Collaborator Author

Just to let you know: there was a public holiday in our region of Germany, which is why I cannot currently discuss this with Clemens and Christoph. Meanwhile, I will try to find some example packages where PyTorch is implicitly imported with another high-level library like transformers, and see how others deal with the complexity of PyTorch installs. It feels a bit odd to handle that in the ebm4subjects package, as it never explicitly imports PyTorch. So I feel inclined to leave that piece of environment management to the user. If a user needs a particular PyTorch install, e.g. with AMD/ROCm support, they would need to install it on their own before installing everything else.

However, if we don't find anything more elegant, we will provide optional dependencies for ebm4subjects in two or three different flavours, e.g.:

  • ebm-bare: no PyTorch, but the possibility of generating embeddings via API access
  • ebm-cpu: ebm with dependencies routed to the CPU-only PyTorch build
  • ebm-gpu: ebm with dependencies routed to the default PyTorch install

Annif could then offer matching flavours of ebm, and we could choose to install only ebm-bare in the CI pipeline.

@osma
Member

osma commented Nov 20, 2025

Thanks @mfakaehler , I think it's a good idea to look at how other packages handle this.

I must say I'm tempted by the special support for PyTorch that uv provides. It would e.g. allow specifying the PyTorch variant at install time, something like this:

uv pip install annif[ebm] --torch-backend=cu126

or

uv pip install annif[ebm] --torch-backend=cpu

If we want to make use of that for Annif itself, we would have to switch from Poetry to uv, which is probably not trivial but should be doable. We have already switched dependency management systems several times in Annif's history: I think we started with pipenv, then switched to plain pip+venv, and more recently have been using Poetry.

@mfakaehler
Collaborator Author

We already experimented with uv for the ebm4subjects package. In an earlier version we wanted to support a more complex default embedding model (jina-ai), which needs a dependency called flash-attention, and uv provided the functionality to install it with the "no-build-isolation" flag, which Poetry could not support. So yes, uv has some helpful advanced features that would certainly be good to have.

@osma
Member

osma commented Dec 18, 2025

@mfakaehler you may want to check out PR #923 where I've experimented with switching to uv. Any comments are very welcome! uv looks promising, but I don't think we've made a firm decision to switch yet.

@mfakaehler
Collaborator Author

mfakaehler commented Dec 18, 2025

Thanks. That looks good. @RietdorfC meanwhile analyzed the size of the venv you get when installing our ebm4subjects standalone (without Annif):

  • uv pip install ebm4subjects leads to a venv of 7.5 GB
  • uv pip install ebm4subjects --torch-backend=cpu leads to a venv of 2.0 GB

We'll report on progress with the discussed changes to ebm at some other time. This is only to confirm that a switch to uv with the appropriate install flags would indeed help to control the environment size.

@mfakaehler
Collaborator Author

Dear Annif-Team,
before everyone switches to seasonal hibernation, I would like to wrap up our current status in this PR:

  • @RietdorfC has implemented an abstraction of the embedding generation process that allows choosing between offline inference (loading the model into memory in the same process via the sentence-transformers library) and calling the Hugging Face TEI API to collect embeddings from an externally set up service. We will push that change soon (early January, I'd guess).
  • we still need to make a decision on which of these options should be supported by the default installation of the ebm4subjects package
    • Option a): sentence-transformers (and the whole PyTorch dependency chain) is installed by default, allowing the package to be used without setting up or acquiring access to an external API. This comes at the cost of a larger installation size.
    • Option b): ebm4subjects by default supports only the API-calling variant, allowing for a minimal installation. We offer the extended variant ebm4subjects[offline-inference] with sentence-transformers included as an optional extension. However, this means the backend will be non-functional without the external API when only ebm4subjects is installed.

In case b), Annif could offer two flavours of the backend during installation:

[project.optional-dependencies]
fasttext = ["fasttext-numpy2==0.10.4"]
ebm_api = ["ebm4subjects"]
ebm_offline_inference = ["ebm4subjects[offline_inference]"]
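
With extras along those lines, the two install variants would look roughly like this (extra names taken from the sketch above, not from a released package):

  pip install "annif[ebm_api]"                  # minimal: embeddings only via an external API
  pip install "annif[ebm_offline_inference]"    # additionally pulls in sentence-transformers and PyTorch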

Which option would you prefer? Making the SentenceTransformer support optional, or shipping it with the base package of ebm4subjects? And how would you like to have that tested in the CI pipeline? Testing the offline-inference variant will always lead to bundling PyTorch and sentence-transformers in the installation, and thus increase the container size.

Wishing you all the best for the holidays!

@mfakaehler
Collaborator Author

We have updated ebm4subjects to support embedding generation through the Hugging Face TEI API or an OpenAI-style API.
This updated README reflects these changes.
Backend-EBM.md
In the current package version of ebm4subjects the SentenceTransformer library will still be installed by default.
We can discuss options and next steps at our meeting tomorrow.

@osma
Member

osma commented Jan 15, 2026

I just merged the uv PR to main, and unfortunately this caused a conflict with this PR. You will have to resolve it, sorry for the trouble.

@osma
Member

osma commented Jan 15, 2026

I think I found a solution to the question of how to select the PyTorch variant (CPU or GPU) when using uv sync. Not super elegant but it seems to work. See #926 (comment)
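
For reference, the uv documentation sketches roughly this kind of pyproject.toml arrangement for choosing the PyTorch index via extras (simplified here; the actual approach in #926 may differ):

  [project.optional-dependencies]
  cpu = ["torch"]
  cu126 = ["torch"]

  [tool.uv]
  conflicts = [[{ extra = "cpu" }, { extra = "cu126" }]]

  [tool.uv.sources]
  torch = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cu126", extra = "cu126" },
  ]

  [[tool.uv.index]]
  name = "pytorch-cpu"
  url = "https://download.pytorch.org/whl/cpu"
  explicit = true

  [[tool.uv.index]]
  name = "pytorch-cu126"
  url = "https://download.pytorch.org/whl/cu126"
  explicit = true

The variant would then be picked at sync time, e.g. uv sync --extra cpu or uv sync --extra cu126.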

@mfakaehler
Collaborator Author

Perfect. This should work similarly for ebm, with SentenceTransformer included. We will try it out.

@mfakaehler
Collaborator Author

One detail that I just came across: how should we handle the Hugging Face cache? This will also be relevant for the XTransformer backend. Should we have it pointing to somewhere in the project directory? Otherwise it will go to the default .cache/huggingface in the user workspace. In an HPC setting, where the user workspace is not writable during job processing, this could crash your process.

@osma
Member

osma commented Jan 16, 2026

@mfakaehler I think the HuggingFace cache can be controlled using environment variables? In that case, I would simply point this out in the (admittedly already long!) documentation rather than trying to override the defaults.

@mfakaehler
Collaborator Author

Yes, one can usually control it with the HF_HOME environment variable, for example as sketched below. I will add that to the docs. Thank you for your input.
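
  # Illustrative only: point the Hugging Face cache to a writable location, e.g. inside the project directory
  export HF_HOME=/path/to/annif-projects/.cache/huggingface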

@mfakaehler mfakaehler force-pushed the issue855-add-ebm-backend branch from f5efd02 to b28595f on January 19, 2026 10:49

@RietdorfC
Collaborator

Hi @osma and @juhoinkinen,

Just a brief update on the ebm4subjects package, because the changes I made there are not directly visible in the latest commits made here. I took your feedback and worked it into the package (thanks a lot for the detailed feedback via mail, @juhoinkinen). In detail, this means:

  • We renamed the model_deployment option "offline-inference" to "in-process"
  • For the model_deployment option "in-process" there is now a debug log output stating on which device the embedding generation process is running
  • For the API variants of the embedding generation process there is now a test call to the API when the EmbeddingGenerator gets initialised, which checks whether the given endpoint is available with the given API key. If not, the process stops with an error message; otherwise there is a debug log output.
  • The EmbeddingGeneratorOpenAI variant of the EmbeddingGenerator now uses the openai library, and the API key (if it exists) is taken from the corresponding environment variable OPENAI_API_KEY
  • There are now two variants of the ebm4subjects package. The default variant comes without the sentence_transformers library and has a size of roughly 1.3 GB, and the ebm4subjects[in-process] variant comes with the sentence_transformers library, which leads to a size of roughly 7.7 GB

Best regards
Clemens
