Some files produce SubprocessError ... #1572

drnicolas · 2025-09-16T16:14:08Z


This is how I call ocrmypdf in my Python script:

```python
import ocrmypdf

ocrmypdf.configure_logging(verbosity=2, progress_bar_friendly=False)
res = ocrmypdf.ocr(filename, output_file=tmpfile + '.pdf', language='deu+eng',
                   title='OCR FAX von ' + xxx,
                   sidecar=tmpfile + '.txt', deskew=True, output_type='pdf',
                   clean=True, rotate_pages=False, author='autolink',
                   subject='automatische Befundverlinkung')
```

This produces a lot of output.
```
OCR ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 50% 2/4 0:00:04
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 290, 2255)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2303)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 288, 2349)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2445)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2493)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 289, 2683)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 290, 2871)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 289, 3313)
DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 288, 3354)
DEBUG ocrmypdf._pipeline - 4 Rasterize with pngmono, rotation 0
DEBUG ocrmypdf._graft - 3 Emplacement update
DEBUG ocrmypdf._graft - 3 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
DEBUG ocrmypdf._graft - 3 Grafting
DEBUG ocrmypdf._graft - 3 Grafting with ctm pikepdf.Matrix(0.9998, 0, 0, 1, 0, 0)
DEBUG ocrmypdf._graft - 3 Page rotation: (content, auto) -> page = (0, 0) -> 0
DEBUG ocrmypdf.subprocess - 4 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=4', '-dLastPage=4', '-r300.000000x300.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.bu66gd9y/origin.pdf']
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 75% 3/4 0:00:05
DEBUG ocrmypdf.subprocess - 4 Running: ['tesseract', '-l', 'deu+eng', '--psm', '2', '/tmp/ocrmypdf.io.bu66gd9y/000004_rasterize.png', 'stdout']
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 75% 3/4 0:00:05
```
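Most of that output comes from the `ocrmypdf.hocrtransform._hocr` logger. If the goal is to keep `verbosity=2` for the rest of the pipeline while hiding those lines, one option is a standard `logging.Filter` attached to the console handler. This is a sketch: `DropNoisyDebug` is a hypothetical name, and it assumes `configure_logging()` attaches its console handler to the `ocrmypdf` logger; if your version attaches it elsewhere, add the filter to that handler instead.

```python
import logging


class DropNoisyDebug(logging.Filter):
    """Drop DEBUG records from the given logger subtrees; pass everything else.

    Hypothetical helper. The default prefix is the logger name seen in the
    debug output above.
    """

    def __init__(self, prefixes=("ocrmypdf.hocrtransform",)):
        super().__init__()
        self.prefixes = tuple(prefixes)

    def filter(self, record: logging.LogRecord) -> bool:
        # Match the logger itself or any of its children ("prefix.xxx")
        noisy = any(
            record.name == p or record.name.startswith(p + ".")
            for p in self.prefixes
        )
        # Keep the record unless it is a DEBUG message from a noisy subtree
        return not (noisy and record.levelno <= logging.DEBUG)
```

After `ocrmypdf.configure_logging(...)`, it could be attached with `for handler in logging.getLogger('ocrmypdf').handlers: handler.addFilter(DropNoisyDebug())`. A handler filter is used rather than a logger filter because logger filters only see records logged directly on that logger, while a handler filter sees every record that propagates to it.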

Eventually, it fails with a SubprocessError ...
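The thread does not show which exception class was raised or which command failed. A minimal sketch for surfacing that detail, assuming the error is Python's `subprocess.CalledProcessError` (a `subprocess.SubprocessError` subclass) somewhere in the exception chain; ocrmypdf may raise its own exception types instead, in which case the helper falls back to the full traceback. `describe_subprocess_failure` is a hypothetical name.

```python
import subprocess
import traceback
from typing import Optional


def describe_subprocess_failure(exc: BaseException) -> str:
    """Summarize the first CalledProcessError found in an exception chain.

    Hypothetical helper: follows __cause__/__context__ links; if no
    CalledProcessError is found, returns the formatted traceback instead.
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, subprocess.CalledProcessError):
            stderr = current.stderr
            if isinstance(stderr, bytes):
                stderr = stderr.decode(errors="replace")
            return (
                f"command {current.cmd!r} exited with status {current.returncode}\n"
                f"stderr: {stderr or '(not captured)'}"
            )
        # Prefer the explicit cause ("raise ... from ..."), else the implicit context
        current = current.__cause__ or current.__context__
    # No CalledProcessError in the chain: fall back to the full traceback
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
```

In the script, the `ocrmypdf.ocr(...)` call could be wrapped in `try:` with `except Exception as exc: print(describe_subprocess_failure(exc)); raise`, so the failing command and its stderr are printed before the error propagates.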

Running ocrmypdf from the command line on the same file seems to work fine; the output is generated.
However, there are warnings about missing decoders (JBIG2 is installed):

```
osixPath('/tmp/ocrmypdf.io.hjou8ebz/images/00000031.tif')]
JBIG2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 4/4 0:00:00
DEBUG    ocrmypdf.helpers -                                   helpers.py:179
         os.symlink(/tmp/ocrmypdf.io.hjou8ebz/optimize.opt.pdf, /tmp/ocrmypdf.io.hjou8ebz/optimize.pdf)
DEBUG    ocrmypdf.subprocess - Running: ['jbig2', '--version']   __init__.py:133
DEBUG    ocrmypdf.subprocess - Running: ['pngquant', '--version']   __init__.py:133
INFO     ocrmypdf._pipeline - Image optimization ratio: 1.24    _pipeline.py:989
         savings: 19.4%
INFO     ocrmypdf._pipeline - Total file size ratio: 2.62       _pipeline.py:992
         savings: 61.8%
DEBUG    ocrmypdf._pipeline -                                  _pipeline.py:1064
         /tmp/ocrmypdf.io.hjou8ebz/optimize.pdf -> /tmp/zzz.pdf
INFO     ocrmypdf._pipelines._common - Output file is a PDF/A-2B   _common.py:441
         (as expected)
WARNING  py.warnings -                                         warnings.py:109
         /usr/local/lib/python3.11/dist-packages/pikepdf/_methods.py:264:
         UserWarning: pikepdf is missing some specialized decoders (probably JBIG2) so not all stream contents can be tested.
           self._decode_all_streams_and_discard()
```
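The command-line run may not be equivalent to the script: its log reports `Output file is a PDF/A-2B (as expected)`, whereas the script passes `output_type='pdf'`. To compare like with like, one option is to build the command line from the same keyword arguments. A minimal sketch, assuming each keyword maps to the long option with underscores replaced by hyphens (which holds for the options used here: `--language`, `--title`, `--sidecar`, `--deskew`, `--output-type`, `--clean`, `--author`, `--subject`); `kwargs_to_argv`, the file names, and the title value are hypothetical.

```python
import shlex
from typing import Any, Dict, List


def kwargs_to_argv(input_file: str, output_file: str, options: Dict[str, Any]) -> List[str]:
    """Translate API-style keyword arguments into an ocrmypdf command line.

    Assumption: option "foo_bar" corresponds to the long flag "--foo-bar".
    True becomes a bare flag; False and None are left out; other values
    are passed as the flag's argument.
    """
    argv = ["ocrmypdf"]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    # Positional arguments come last: input file, then output file
    argv.extend([input_file, output_file])
    return argv


# Hypothetical values mirroring the script's ocrmypdf.ocr() call
options = {
    "language": "deu+eng",
    "title": "OCR FAX von example",
    "sidecar": "output.txt",
    "deskew": True,
    "output_type": "pdf",
    "clean": True,
    "rotate_pages": False,
    "author": "autolink",
    "subject": "automatische Befundverlinkung",
}
print(shlex.join(kwargs_to_argv("input.pdf", "output.pdf", options)))
```

Running the printed command on the problem file should show whether the failure depends on the options or on the API versus the command line.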

What can I do? My ocrmypdf version is 14.something.
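Exact versions make a report like this easier to act on, and the warning above comes from pikepdf rather than ocrmypdf itself. A sketch using the standard library's `importlib.metadata`, assuming the installed distributions are named `ocrmypdf` and `pikepdf` as on PyPI; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("ocrmypdf", "pikepdf"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this with the same interpreter as the script avoids confusing it with a different ocrmypdf installation used on the command line.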

jbarlow83 (Maintainer) · 2025-09-16T17:23:28Z

Most likely you have the JBIG2 encoder installed (jbig2 --version) but not the decoder (jbig2dec --version).
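The `--version` checks above can also be run from Python, so they use the same environment (for example, the same `PATH`) as the script. A minimal sketch; the helper names are hypothetical, and it assumes both tools are installed as executables named `jbig2` and `jbig2dec` on `PATH`.

```python
import shutil
import subprocess
from typing import Dict, Optional

# jbig2 is the JBIG2 encoder; jbig2dec is the decoder mentioned above.
TOOLS = ("jbig2", "jbig2dec")


def tool_version(name: str) -> Optional[str]:
    """Return the first line of `<name> --version`, or None if not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        proc = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return "(found on PATH, but --version failed)"
    # Some tools print their version to stderr rather than stdout
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else "(found, no version output)"


def check_jbig2_tools() -> Dict[str, Optional[str]]:
    return {name: tool_version(name) for name in TOOLS}


if __name__ == "__main__":
    for name, found in check_jbig2_tools().items():
        print(f"{name}: {found or 'NOT FOUND on PATH'}")
```

If `jbig2dec` is reported as missing, installing it (on Debian and Ubuntu the package is also named `jbig2dec`) is likely to address the pikepdf warning.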
