Describe the bug
We rely on Adobe Acrobat to OCR CJK texts because of Tesseract’s spacing issues (tesseract-ocr/tesseract#2702).
Currently, however, the PDF/A mode of OCRmyPDF strips the OCR layer from such an Acrobat-produced PDF.
Steps to reproduce
1. Run `ocrmypdf --skip-text input.pdf output.pdf`
2. `pdffonts input.pdf`
3. `pdffonts output.pdf`
The results of steps 2 and 3 differ with the default PDF/A output; they are the same if `--output-type pdf` is used.
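
To make the comparison in steps 2 and 3 easier to repeat, here is a minimal sketch of a helper that diffs the font lists. It assumes `pdffonts` (from Poppler) is on `PATH` and that its output starts with two header lines followed by one font per line with the name in the first column; `font_names` and `compare_fonts` are hypothetical names, not part of OCRmyPDF or Poppler.

```sh
# Print the sorted, unique font names from pdffonts output read on stdin.
# Assumption: the first two lines are the column titles and a dashed rule,
# and the font name is the first whitespace-separated field on each row.
font_names() {
    tail -n +3 | awk 'NF {print $1}' | sort -u
}

# Report whether two PDFs list the same set of font names.
compare_fonts() {
    before=$(pdffonts "$1" | font_names)
    after=$(pdffonts "$2" | font_names)
    if [ "$before" = "$after" ]; then
        echo "same fonts"
    else
        echo "fonts differ"
        echo "--- $1"
        echo "$before"
        echo "--- $2"
        echo "$after"
    fi
}
```

After running step 1, `compare_fonts input.pdf output.pdf` prints either `same fonts` or both font lists. Font names containing spaces would be truncated by this sketch.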
Files
No response
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
16.10.1
Relevant log output