-
Notifications
You must be signed in to change notification settings - Fork 33
Processor: replace loky with pebble to enforce worker timeouts #1345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- `_page_worker`: remove `ThreadPool` mechanism introduced in 3cc4780 (which broke processors that are not threadsafe like TF/Keras) - since _no mechanisms_ work to stop computation in uniprocessing (as not even `_thread.interrupt_main()` or `signal.alarm()` would interrupt I/O or C library calls like libtesseract): drop - since neither stdlib's nor loky's ProcessPoolExecutor enforces timeouts on jobs: replace by pebble - apply `max_seconds` timeout iff in ProcessPool mode iff running with METS Server - make `test_run_output_timeout` xfail - add `test_run_output_metsserver_timeout`
kba
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable.
|
I wonder whether we should still keep some mechanism in the page worker, though – for those cases where our timeout mechanism does work even in uniprocessing. Like interrupting I/O wait or CPU-bound Pythonic computation with If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server). And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract). EDIT: BTW, it's the same with |
|
Just quickly on this point:
What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server. |
For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too. |
- `_page_worker`: reintroduce timeout, use cysignals.alarm to differentiate AlarmInterrupt from true KeyboardInterrupt - switch from ProcessPool.submit to ProcessPool.schedule to properly differentiate timeout kwarg from positional arg (so it can be consumed by pebble in multiprocessing and by our worker function in uniprocessing)
I managed to do that by reintroducing a This does work for processors that are interruptible with ctrl+c / SIGINT. For ocrd_tesserocr, with sirfz/tesserocr#384 that is now the case. I added an exception if timeouts are requested but the processor uses some kind of multithreading (which cannot work). I also added a test for Tensorflow that tracks our problem that graphs cannot be reused across threads. |
|
oh darn! Regarding CI, we are being riddled by dependency trouble:
(see this branch for how these can be resolved) |
_page_worker: removeThreadPoolmechanism introduced in 3cc4780 (which broke processors that are not threadsafe like TF/Keras)_thread.interrupt_main()orsignal.alarm()would interrupt I/O or C library calls like libtesseract): dropmax_secondstimeout iff in ProcessPool mode iff running with METS Servertest_run_output_timeoutxfailtest_run_output_metsserver_timeoutsee OCR-D/ocrd_anybaseocr#115 (comment) for context (plus internal discussion)