[Bug]: ocrmypdf fails for a specific pdf file #1603

@dryBoneMarrow

Describe the bug

Running ocrmypdf on a specific PDF file raises an exception at 0% of the "Recompressing JPEGs" step.

An exception occurred while executing the pipeline
[Traceback, posted below ...]
OSError: image file is truncated (1 bytes not processed)

Steps to reproduce

1. Run `ocrmypdf input.pdf output.pdf`
2. Notice that it fails at the Recompressing JPEGs stage

Files

The file can be found here: https://annas-archive.org/md5/aee9796ac090fdc8a93fc654f32020f3

How did you download and install the software?

Linux package manager (apt, dnf, etc.)

OCRmyPDF version

16.12.0

Relevant log output

[...]
Optimizable images: JPEGs: 774 PNGs: 0                                                     optimize.py:371
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0%   0/774 -:--:--
An exception occurred while executing the pipeline                                          _common.py:296
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 145, in exec_concurrent
    pdf, messages = postprocess(pdf, context, executor)
                    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line 460, in postprocess
    return optimize_pdf(pdf_out, context, executor)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_pipeline.py", line 992, in optimize_pdf
    output_pdf, messages = context.plugin_manager.hook.optimize_pdf(
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_pdf=input_file,
        ^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
        linearize=should_linearize(input_file, context),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/pluggy/_hooks.py", line 512, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 167, in _multicall
    raise exception
  File "/usr/lib/python3.13/site-packages/pluggy/_callers.py", line 121, in _multicall
    res = hook_impl.function(*args)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/builtin_plugins/optimize.py", line 145, in optimize_pdf
    result_path = optimize(input_pdf, output_pdf, context, save_settings, executor)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/optimize.py", line 727, in optimize
    transcode_jpegs(pdf, jpegs, root, options, executor)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/ocrmypdf/optimize.py", line 512, in transcode_jpegs
    executor(
    ~~~~~~~~^
        use_threads=True,  # Processes are significantly slower at this task
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        task_finished=finish_jpeg,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 162, in _execute
    result = future.result()
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.13/site-packages/ocrmypdf/optimize.py", line 484, in _optimize_jpeg
    im.save(opt_jpg, optimize=True, quality=jpeg_quality)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/PIL/Image.py", line 2539, in save
    self.load()
    ~~~~~~~~~^^
  File "/usr/lib/python3.13/site-packages/PIL/ImageFile.py", line 391, in load
    raise OSError(msg)
OSError: image file is truncated (1 bytes not processed)
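
The final `OSError` comes from Pillow and suggests that one of the embedded JPEG streams ends before the decoder has finished reading it. As a rough triage aid, the sketch below checks whether a raw JPEG byte stream ends with the end-of-image (EOI) marker. It is a heuristic under stated assumptions, not the check that Pillow or OCRmyPDF performs: the function name `jpeg_looks_truncated` is hypothetical, the input is assumed to be the raw bytes of a single extracted JPEG, and a stream can pass this check and still fail to decode.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def jpeg_looks_truncated(data: bytes) -> bool:
    """Heuristic: return True if a JPEG stream does not end with the EOI marker.

    Hypothetical helper for triage only. A missing EOI marker is a common
    sign of truncation, but its presence does not prove the stream decodes.
    """
    if not data.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    # Some writers append padding (NULs or whitespace) after EOI; ignore it.
    trimmed = data.rstrip(b"\x00\r\n\t ")
    return not trimmed.endswith(EOI)


if __name__ == "__main__":
    # Synthetic marker-only stream, not a decodable image.
    complete = SOI + b"\x00\x01\x02" + EOI
    print(jpeg_looks_truncated(complete))       # complete stream
    print(jpeg_looks_truncated(complete[:-1]))  # last byte cut off
```

Running a check like this over the JPEG streams extracted from the input file could help narrow down which image triggers the failure; the extraction step itself is not shown here.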
