Processor: replace loky with pebble to enforce worker timeouts #1345

bertsky · 2025-12-12T13:57:40Z

- `_page_worker`: remove the `ThreadPool` mechanism introduced in 3cc4780 (which broke processors that are not thread-safe, like TF/Keras)
- since _no mechanism_ works to stop computation in uniprocessing (not even `_thread.interrupt_main()` or `signal.alarm()` would interrupt I/O or C library calls like libtesseract): drop the timeout there
- since neither stdlib's nor loky's `ProcessPoolExecutor` enforces timeouts on jobs: replace by pebble
- apply the `max_seconds` timeout only in ProcessPool mode, which is used only when running with METS Server
- make `test_run_output_timeout` xfail
- add `test_run_output_metsserver_timeout`

see OCR-D/ocrd_anybaseocr#115 (comment) for context (plus internal discussion)
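
To illustrate the point about stdlib's `ProcessPoolExecutor`: the `timeout` argument of `Future.result()` only bounds how long the caller waits. It does not stop the job. The following is a minimal sketch, not code from this PR; `slow_page` and `demo` are hypothetical names, and the `fork` start method is an assumption (POSIX only) made to keep the snippet self-contained.

```python
import concurrent.futures as cf
import multiprocessing as mp
import time


def slow_page(seconds):
    # Stands in for a long-running page task.
    time.sleep(seconds)
    return "finished anyway"


def demo():
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    with cf.ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(slow_page, 0.5)
        try:
            # Only the *wait* is limited to 0.1s here.
            future.result(timeout=0.1)
            timed_out = False
        except cf.TimeoutError:
            timed_out = True
        # The worker was never interrupted: waiting again yields the result.
        late_result = future.result()
    return timed_out, late_result
```

The caller sees a timeout, yet the worker keeps running and later delivers its result, so a hanging job would keep occupying the worker indefinitely.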


kba

Looks reasonable.

bertsky · 2025-12-16T11:59:35Z

I wonder whether we should still keep some mechanism in the page worker, though, for those cases where our timeout mechanism does work even in uniprocessing, like interrupting I/O wait or CPU-bound pure-Python computation with signals (e.g. `signal.alarm()`) or with `_thread.interrupt_main()`. But then, in the ProcessPool case, we would perhaps have to prevent the two mechanisms from racing against each other...

If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract).

EDIT: BTW, it's the same with KeyboardInterrupt: it works only if in a subprocess, but libtesseract calls are not (!) interruptible. (Perhaps we should take that to tesserocr, though...)

kba · 2025-12-16T14:01:32Z

Just quickly on this point:

> If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

bertsky · 2025-12-17T10:57:48Z

> What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.

- `_page_worker`: reintroduce timeout, use `cysignals.alarm` to differentiate `AlarmInterrupt` from true `KeyboardInterrupt`
- switch from `ProcessPool.submit` to `ProcessPool.schedule` to properly differentiate timeout kwarg from positional arg (so it can be consumed by pebble in multiprocessing and by our worker function in uniprocessing)
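
As a rough illustration of the `ProcessPool.schedule` call mentioned in this commit, here is a hedged sketch of a per-job timeout with pebble. It is not the code from this PR: `process_page` and `run_page` are hypothetical names, the `fork` context is an assumption (POSIX only), and the snippet returns a marker string if the third-party `pebble` package is not installed.

```python
import multiprocessing
import time
from concurrent.futures import TimeoutError as FutureTimeoutError

try:
    from pebble import ProcessPool  # third-party package
except ImportError:
    ProcessPool = None


def process_page(duration):
    # Stands in for a page task that may hang.
    time.sleep(duration)
    return "done"


def run_page(duration, max_seconds):
    if ProcessPool is None:
        return "pebble not installed"
    context = multiprocessing.get_context("fork")  # assumption: POSIX platform
    with ProcessPool(max_workers=1, context=context) as pool:
        # The timeout is a separate keyword; job arguments go into `args`.
        future = pool.schedule(process_page, args=(duration,), timeout=max_seconds)
        try:
            return future.result()
        except (TimeoutError, FutureTimeoutError):
            # pebble stops the worker process once the job exceeds its timeout.
            return "timeout"
```

Because `schedule` takes the job arguments as `args`/`kwargs` containers, the `timeout` keyword cannot be confused with an argument meant for the worker function.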


bertsky · 2026-01-07T17:27:02Z

> For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.

I managed to do that by reintroducing a SIGALRM-based timeout for the uniprocessing case. (I did not go back to stdlib `signal.alarm()`, though, because I would like to keep the distinction between `TimeoutError` and an actual `KeyboardInterrupt`. Hence the use of `cysignals.alarm`.)

This does work for processors that are interruptible with ctrl+c / SIGINT. For ocrd_tesserocr, with sirfz/tesserocr#384 that is now the case.

I added an exception that is raised if a timeout is requested but the processor uses some kind of multithreading (a combination which cannot work).

I also added a test for Tensorflow that tracks our problem that graphs cannot be reused across threads.
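
The idea of a SIGALRM-based timeout that stays distinguishable from ctrl+c can be sketched with the standard library alone. This is not the PR's implementation (which uses `cysignals.alarm`); it swaps in `signal.setitimer` with a custom exception, and `PageTimeout`, `alarm_timeout`, `process_page` and `run` are hypothetical names. It assumes a POSIX platform and must run in the main thread.

```python
import signal
import time
from contextlib import contextmanager


class PageTimeout(Exception):
    """Raised by the SIGALRM handler; deliberately not a KeyboardInterrupt."""


@contextmanager
def alarm_timeout(seconds):
    def handler(signum, frame):
        raise PageTimeout(f"no result after {seconds}s")

    # signal.signal() may only be called from the main thread.
    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def process_page(duration):
    # Stands in for interruptible work (here: a plain sleep).
    time.sleep(duration)
    return "done"


def run(duration, timeout):
    try:
        with alarm_timeout(timeout):
            return process_page(duration)
    except PageTimeout:
        return "timeout"
```

Since `signal.signal()` raises `ValueError` outside the main thread, an approach like this cannot be combined with running the page task in a worker thread.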


bertsky · 2026-01-08T00:21:11Z

Oh darn! Regarding CI, we are plagued by dependency trouble:

- Python 3.8 does not work anymore on master (which I had to merge in to avoid conflicts)
- Python 3.12 does not receive older TF/Keras releases anymore (which we want to have for testing)

(see this branch for how these can be resolved)

kba approved these changes Dec 16, 2025


Robert Sachunsky added 3 commits January 7, 2026 17:49

test_processor: add tensorflow graph processor

2b1e5f2

Merge remote-tracking branch 'origin/master' into fix-processor-worke…

87578f5

…r-timeout

bertsky marked this pull request as ready for review January 7, 2026 17:19

bertsky mentioned this pull request Jan 7, 2026

crop: also support splitting double pages OCR-D/ocrd_anybaseocr#115

Open

Robert Sachunsky added 3 commits January 8, 2026 01:05

Processor: differentiate missing subclass implementations from other …

c15af76

…NotImplementedError

test_processor: use the proper API for TF2 version

962a311

test_processor: avoid Keras 3 (which dropped compat.v1)

f2dd1f6
