@bertsky
Collaborator

@bertsky bertsky commented Dec 12, 2025

  • _page_worker: remove the ThreadPool mechanism introduced in 3cc4780 (which broke processors that are not threadsafe, like TF/Keras)
  • since no mechanism can stop a computation in uniprocessing (not even _thread.interrupt_main() or signal.alarm() would interrupt I/O or C library calls like libtesseract): drop the timeout there
  • since neither stdlib's nor loky's ProcessPoolExecutor enforces timeouts on jobs: replace it with pebble
  • apply the max_seconds timeout only in ProcessPool mode, i.e. only when running with METS Server
  • make test_run_output_timeout xfail
  • add test_run_output_metsserver_timeout

see OCR-D/ocrd_anybaseocr#115 (comment) for context (plus internal discussion)
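
To illustrate the point about stdlib's ProcessPoolExecutor, here is a minimal sketch (not code from this PR). The names `slow_job` and `demo_stdlib_timeout` are made up for the example, and the explicit `fork` start method assumes a Unix system. It shows that `Future.result(timeout=...)` only stops waiting: the job keeps running in the worker and cannot be cancelled.

```python
import multiprocessing
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def slow_job(seconds):
    """Hypothetical stand-in for a long-running page computation."""
    time.sleep(seconds)
    return "finished anyway"


def demo_stdlib_timeout():
    """Return (timed_out, cancelled, final_result) for one slow job."""
    # Assumption: Unix, so that the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(slow_job, 0.5)
        try:
            # The timeout only limits how long the caller waits.
            future.result(timeout=0.1)
            timed_out = False
        except FutureTimeoutError:
            timed_out = True
        # A job that is already running cannot be cancelled.
        cancelled = future.cancel()
        # Waiting again shows the worker ran the job to completion.
        final_result = future.result()
    return timed_out, cancelled, final_result
```

Under these assumptions the caller sees a timeout, `cancel()` reports failure, and the result still arrives later, which is the behaviour the description refers to as not enforcing timeouts on jobs.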

Member

@kba kba left a comment

Looks reasonable.

@bertsky
Collaborator Author

bertsky commented Dec 16, 2025

I wonder whether we should still keep some mechanism in the page worker, though – for those cases where our timeout mechanism does work even in uniprocessing. Like interrupting I/O wait or CPU-bound Pythonic computation with signal(), or with _thread.interrupt_main(). But then maybe in the ProcessPool case we would have to avoid these two racing against each other...

If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract).

EDIT: BTW, it's the same with KeyboardInterrupt: it works only if in a subprocess, but libtesseract calls are not (!) interruptible. (Perhaps we should take that to tesserocr, though...)

@kba
Member

kba commented Dec 16, 2025

Just quickly on this point:

> If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

@bertsky
Collaborator Author

bertsky commented Dec 17, 2025

> What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.

Robert Sachunsky added 3 commits January 7, 2026 17:49
- `_page_worker`: reintroduce timeout, use cysignals.alarm
  to differentiate AlarmInterrupt from true KeyboardInterrupt
- switch from ProcessPool.submit to ProcessPool.schedule to
  properly differentiate timeout kwarg from positional arg
  (so it can be consumed by pebble in multiprocessing and by
   our worker function in uniprocessing)
@bertsky bertsky marked this pull request as ready for review January 7, 2026 17:19
@bertsky
Collaborator Author

bertsky commented Jan 7, 2026

> For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.

I managed to do that by reintroducing a SIGALRM based timeout for the uniprocessing case. (I did not go back to stdlib signal.alarm() though, because I would like to keep up the differentiation between TimeoutError and actual KeyboardInterrupt. Hence the use of cysignals.alarm.)

This does work for processors that are interruptible with Ctrl+C / SIGINT. For ocrd_tesserocr, that is now the case with sirfz/tesserocr#384.

I added an exception if timeouts are requested but the processor uses some kind of multithreading (which cannot work).

I also added a TensorFlow test that tracks our problem of graphs not being reusable across threads.
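
As a rough illustration of a SIGALRM-based timeout for the uniprocessing case, here is a sketch that uses only the stdlib `signal` module with a custom handler, rather than `cysignals.alarm` as in this PR. The names `time_limit`, `process_page` and `run_with_timeout` are hypothetical, and the sketch assumes Unix and that it runs in the main thread.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError in the main thread after `seconds` (Unix only)."""
    def _on_alarm(signum, frame):
        raise TimeoutError(f"timed out after {seconds}s")

    # Install our handler and remember the previous one.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # setitimer accepts fractional seconds, unlike signal.alarm().
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel any pending alarm and restore the old handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def process_page(duration):
    """Hypothetical stand-in for page processing (an interruptible sleep)."""
    time.sleep(duration)
    return "done"


def run_with_timeout(duration, max_seconds):
    """Run process_page under a time limit and report the outcome."""
    try:
        with time_limit(max_seconds):
            return process_page(duration)
    except TimeoutError:
        return "timeout"
```

Python runs the handler only once the interpreter regains control, so a long C library call that never checks for signals would delay the exception until the call returns. Signal handlers are also only executed in the main thread, so this approach does not carry over to work running in other threads.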

@bertsky
Collaborator Author

bertsky commented Jan 8, 2026

Oh darn! Regarding CI, we are plagued by dependency trouble:

  • Python 3.8 does not work anymore on master (which I had to merge in order to avoid conflicts)
  • Python 3.12 does not receive older TF/Keras releases anymore (which we want to have for testing)

(see this branch for how these can be resolved)
