Replies: 1 comment
Most likely you have the JBIG2 encoder installed (jbig2 --version) but not the decoder (jbig2dec --version).
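To check this from inside the same Python environment that runs the script, something like the following may help. It is only a minimal sketch: the helper names `tool_status` and `tool_version` are made up for illustration, and it assumes that the encoder and decoder are looked up as the executables `jbig2` and `jbig2dec` on `PATH`, as the two commands above suggest.

```python
import shutil
import subprocess


def tool_status(names=("jbig2", "jbig2dec")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def tool_version(name):
    """Return the first line printed by `<name> --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    for name, path in tool_status().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `jbig2dec` is reported as not found here while it works in your shell, the script is probably running with a different `PATH` than the command line.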
This is how I call ocrmypdf in my python-script:
    import ocrmypdf

    ocrmypdf.configure_logging(verbosity=2, progress_bar_friendly=False)
    res = ocrmypdf.ocr(
        filename,
        output_file=tmpfile + '.pdf',
        language='deu+eng',
        title='OCR FAX von ' + xxx,
        sidecar=tmpfile + '.txt',
        deskew=True,
        output_type='pdf',
        clean=True,
        rotate_pages=False,
        author='autolink',
        subject='automatische Befundverlinkung',
    )

This produces a lot of output:
    OCR ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 50% 2/4 0:00:04
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 290, 2255)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2303)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 288, 2349)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2445)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(0.999982, -0.00599989, 0.00599989, 0.999982, 289, 2493)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 289, 2683)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 290, 2871)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 289, 3313)
    DEBUG ocrmypdf.hocrtransform._hocr - 3 deu
    DEBUG ocrmypdf.hocrtransform._hocr - 3 pikepdf.Matrix(1, 0, 0, 1, 288, 3354)
    DEBUG ocrmypdf._pipeline - 4 Rasterize with pngmono, rotation 0
    DEBUG ocrmypdf._graft - 3 Emplacement update
    DEBUG ocrmypdf._graft - 3 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
    DEBUG ocrmypdf._graft - 3 Grafting
    DEBUG ocrmypdf._graft - 3 Grafting with ctm pikepdf.Matrix(0.9998, 0, 0, 1, 0, 0)
    DEBUG ocrmypdf._graft - 3 Page rotation: (content, auto) -> page = (0, 0) -> 0
    DEBUG ocrmypdf.subprocess - 4 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=pngmono', '-dFirstPage=4', '-dLastPage=4', '-r300.000000x300.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.bu66gd9y/origin.pdf']
    OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 75% 3/4 0:00:05
    DEBUG ocrmypdf.subprocess - 4 Running: ['tesseract', '-l', 'deu+eng', '--psm', '2', '/tmp/ocrmypdf.io.bu66gd9y/000004_rasterize.png', 'stdout']
    OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 75% 3/4 0:00:05

Anyway, it ends up with a SubprocessError ...
Running ocrmypdf from the command line on the same file seems to work fine; the output is generated.
But there are complaints about missing decoders (JBIG2 is installed):
    osixPath('/tmp/ocrmypdf.io.hjou8ebz/images/00000031.tif')]
    JBIG2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 4/4 0:00:00
    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.hjou8ebz/optimize.opt.pdf, /tmp/ocrmypdf.io.hjou8ebz/optimize.pdf)  [helpers.py:179]
    DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']  [__init__.py:133]
    DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']  [__init__.py:133]
    INFO ocrmypdf._pipeline - Image optimization ratio: 1.24 savings: 19.4%  [_pipeline.py:989]
    INFO ocrmypdf._pipeline - Total file size ratio: 2.62 savings: 61.8%  [_pipeline.py:992]
    DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.hjou8ebz/optimize.pdf -> /tmp/zzz.pdf  [_pipeline.py:1064]
    INFO ocrmypdf._pipelines._common - Output file is a PDF/A-2B (as expected)  [_common.py:441]
    WARNING py.warnings - /usr/local/lib/python3.11/dist-packages/pikepdf/_methods.py:264: UserWarning: pikepdf is missing some specialized decoders (probably JBIG2) so not all stream contents can be tested.  [warnings.py:109]
      self._decode_all_streams_and_discard()

What can I do? ocrmypdf is version 14.something