[Bug]: Crash on JBIG2 compression #1607

@vsukhoml

Describe the bug

I was running:

ocrmypdf --force-ocr gehrke98algebraic.pdf gehrke98algebraic_clean.pdf

and got the following error with both the Ubuntu 24.04 package and the GitHub source:

An exception occurred while executing the pipeline                                                                                                       _sync.py:473
Traceback (most recent call last):                                                                                                                                   
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 409, in run_pipeline                                                                                 
    optimize_messages = exec_concurrent(context, executor)                                                                                                           
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                           
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 315, in exec_concurrent                                                                              
    pdf, messages = post_process(pdf, context, executor)                                                                                                             
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 247, in post_process                                                                                 
    return optimize_pdf(pdf_out, context, executor)                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/_pipeline.py", line 1009, in optimize_pdf                                                                            
    output_pdf, messages = context.plugin_manager.hook.optimize_pdf(                                                                                                 
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                 
  File "/usr/lib/python3/dist-packages/pluggy/_hooks.py", line 501, in __call__                                                                                      
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)                                                                                    
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                    
  File "/usr/lib/python3/dist-packages/pluggy/_manager.py", line 119, in _hookexec                                                                                   
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                             
  File "/usr/lib/python3/dist-packages/pluggy/_callers.py", line 138, in _multicall                                                                                  
    raise exception.with_traceback(exception.__traceback__)                                                                                                          
  File "/usr/lib/python3/dist-packages/pluggy/_callers.py", line 102, in _multicall                                                                                  
    res = hook_impl.function(*args)                                                                                                                                  
          ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/optimize.py", line 145, in optimize_pdf                                                              
    result_path = optimize(input_pdf, output_pdf, context, save_settings, executor)                                                                                  
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 695, in optimize                                                                                  
    convert_to_jbig2(pdf, jbig2_groups, root, options, executor)                                                                                                     
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 429, in convert_to_jbig2                                                                          
    _produce_jbig2_images(jbig2_groups, root, options, executor)                                                                                                     
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 394, in _produce_jbig2_images                                                                     
    executor(                                                                                                                                                        
  File "/usr/lib/python3/dist-packages/ocrmypdf/_concurrent.py", line 86, in __call__                                                                                
    self._execute(                                                                                                                                                   
  File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 138, in _execute                                                               
    result = future.result()                                                                                                                                         
             ^^^^^^^^^^^^^^^                                                                                                                                         
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result                                                                                        
    return self.__get_result()                                                                                                                                       
           ^^^^^^^^^^^^^^^^^^^                                                                                                                                       
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result                                                                                  
    raise self._exception                                                                                                                                            
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run                                                                                           
    result = self.fn(*self.args, **self.kwargs)                                                                                                                      
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                      
  File "/usr/lib/python3/dist-packages/ocrmypdf/_exec/jbig2enc.py", line 61, in convert_single_mp                                                                    
    return convert_single(                                                                                                                                           
           ^^^^^^^^^^^^^^^                                                                                                                                           
  File "/usr/lib/python3/dist-packages/ocrmypdf/_exec/jbig2enc.py", line 56, in convert_single                                                                       
    proc.check_returncode()                                                                                                                                          
  File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode                                                                                            
    raise CalledProcessError(self.returncode, self.args, self.stdout,                                                                                                
subprocess.CalledProcessError: Command '['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.ad8hklkw/images/00000083.tif')]' returned non-zero              
exit status 1. 

It works when -O0 is added, but that produces a much larger file.

Steps to reproduce

1. Run ocrmypdf -v1 --force-ocr gehrke98algebraic.pdf gehrke98algebraic_clean.pdf
2. Observe crash
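
To separate the jbig2 failure from OCRmyPDF itself, the failing command can be run directly on one of the intermediate TIFFs. A minimal Python sketch, assuming the temporary files were kept (for example with ocrmypdf's --keep-temporary-files option) and that jbig2 is on PATH. The helper names run_captured and jbig2_command are hypothetical and not part of OCRmyPDF:

```python
import subprocess


def run_captured(cmd):
    """Run cmd and return (exit status, stderr text) without raising on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def jbig2_command(tif_path, threshold="0.85"):
    """Build the same argument list that appears in the log output."""
    return ["jbig2", "--pdf", "-t", threshold, str(tif_path)]


# Hypothetical usage, with the path taken from a run that kept its temp files:
#   status, err = run_captured(jbig2_command("/path/to/images/00000079.tif"))
#   print(status)
#   print(err)
```

Unlike check_returncode() in the traceback, this does not raise on a non-zero exit status, so the stderr text from jbig2 can be inspected for a single image.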

Files

gehrke98algebraic.pdf

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

15.2.0+dfsg1

Relevant log output

xref 79: treating as an optimization candidate                                                                                                        optimize.py:274
xref 81: treating as an optimization candidate                                                                                                        optimize.py:274
xref 83: treating as an optimization candidate                                                                                                        optimize.py:274
xref 85: treating as an optimization candidate                                                                                                        optimize.py:274
xref 87: treating as an optimization candidate                                                                                                        optimize.py:274
xref 89: treating as an optimization candidate                                                                                                        optimize.py:274
xref 91: treating as an optimization candidate                                                                                                        optimize.py:274
xref 93: treating as an optimization candidate                                                                                                        optimize.py:274
xref 95: treating as an optimization candidate                                                                                                        optimize.py:274
xref 97: treating as an optimization candidate                                                                                                        optimize.py:274
xref 99: treating as an optimization candidate                                                                                                        optimize.py:274
xref 101: treating as an optimization candidate                                                                                                       optimize.py:274
xref 103: treating as an optimization candidate                                                                                                       optimize.py:274
xref 105: treating as an optimization candidate                                                                                                       optimize.py:274
xref 107: treating as an optimization candidate                                                                                                       optimize.py:274
xref 109: treating as an optimization candidate                                                                                                       optimize.py:274
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Running: ['jbig2', '--version']                                                                                                                       __init__.py:134
Optimizable images: JBIG2 groups: 14                                                                                                                  optimize.py:355
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000079.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000081.tif')]                                                 __init__.py:134
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000083.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000091.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000105.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000099.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000097.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000089.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000087.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000103.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000093.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000107.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000109.tif')]                                                 __init__.py:134
Running: ['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000095.tif')]                                                 __init__.py:134
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
stderr = Error in findTiffCompression: function not present                                                                                            __init__.py:76
Error in tiffGetCount: function not present                                                                                                                          
                                                                                                                                                                     
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0%  0/14 -:--:--
An exception occurred while executing the pipeline                                                                                                       _sync.py:473
Traceback (most recent call last):                                                                                                                                   
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 409, in run_pipeline                                                                                 
    optimize_messages = exec_concurrent(context, executor)                                                                                                           
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                           
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 315, in exec_concurrent                                                                              
    pdf, messages = post_process(pdf, context, executor)                                                                                                             
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 247, in post_process                                                                                 
    return optimize_pdf(pdf_out, context, executor)                                                                                                                  
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/_pipeline.py", line 1009, in optimize_pdf                                                                            
    output_pdf, messages = context.plugin_manager.hook.optimize_pdf(                                                                                                 
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                 
  File "/usr/lib/python3/dist-packages/pluggy/_hooks.py", line 501, in __call__                                                                                      
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)                                                                                    
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                    
  File "/usr/lib/python3/dist-packages/pluggy/_manager.py", line 119, in _hookexec                                                                                   
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                             
  File "/usr/lib/python3/dist-packages/pluggy/_callers.py", line 138, in _multicall                                                                                  
    raise exception.with_traceback(exception.__traceback__)                                                                                                          
  File "/usr/lib/python3/dist-packages/pluggy/_callers.py", line 102, in _multicall                                                                                  
    res = hook_impl.function(*args)                                                                                                                                  
          ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/optimize.py", line 145, in optimize_pdf                                                              
    result_path = optimize(input_pdf, output_pdf, context, save_settings, executor)                                                                                  
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                  
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 695, in optimize                                                                                  
    convert_to_jbig2(pdf, jbig2_groups, root, options, executor)                                                                                                     
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 429, in convert_to_jbig2                                                                          
    _produce_jbig2_images(jbig2_groups, root, options, executor)                                                                                                     
  File "/usr/lib/python3/dist-packages/ocrmypdf/optimize.py", line 394, in _produce_jbig2_images                                                                     
    executor(                                                                                                                                                        
  File "/usr/lib/python3/dist-packages/ocrmypdf/_concurrent.py", line 86, in __call__                                                                                
    self._execute(                                                                                                                                                   
  File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 138, in _execute                                                               
    result = future.result()                                                                                                                                         
             ^^^^^^^^^^^^^^^                                                                                                                                         
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result                                                                                        
    return self.__get_result()                                                                                                                                       
           ^^^^^^^^^^^^^^^^^^^                                                                                                                                       
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result                                                                                  
    raise self._exception                                                                                                                                            
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run                                                                                           
    result = self.fn(*self.args, **self.kwargs)                                                                                                                      
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                      
  File "/usr/lib/python3/dist-packages/ocrmypdf/_exec/jbig2enc.py", line 61, in convert_single_mp                                                                    
    return convert_single(                                                                                                                                           
           ^^^^^^^^^^^^^^^                                                                                                                                           
  File "/usr/lib/python3/dist-packages/ocrmypdf/_exec/jbig2enc.py", line 56, in convert_single                                                                       
    proc.check_returncode()                                                                                                                                          
  File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode                                                                                            
    raise CalledProcessError(self.returncode, self.args, self.stdout,                                                                                                
subprocess.CalledProcessError: Command '['jbig2', '--pdf', '-t', '0.85', PosixPath('/tmp/ocrmypdf.io.cxyi2r43/images/00000079.tif')]' returned non-zero              
exit status 1.
