Yes, there are APIs in ocrmypdf._api that provide "PDF to hOCR" and "hOCR to OCR PDF" as separate steps. An open source PDF app asked for this feature so they could use it to implement intermediate hOCR editing. I don't know if that ever got implemented, which is why it remains a private API: I wanted to make sure the intended consumer could use it in its current form. For the same reason, there's no command line interface; you have to script it. So if you obtain hOCR from somewhere, you can get it rendered and applied.
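To illustrate the intermediate editing step that would sit between those two calls, here is a minimal sketch that corrects misrecognized words in an hOCR fragment using only the Python standard library. It does not call the private OCRmyPDF functions, since their names and signatures are not given in this thread. The sample fragment and the helper names `correct_words` and `words` are made up for illustration; real hOCR output is a full XHTML page.

```python
import xml.etree.ElementTree as ET

# Made-up hOCR fragment for illustration; real OCR output is a full XHTML page.
SAMPLE_HOCR = """<div class="ocr_page" title="bbox 0 0 600 800">
  <span class="ocr_line" title="bbox 50 100 400 140">
    <span class="ocrx_word" title="bbox 50 100 150 140; x_wconf 62">He11o</span>
    <span class="ocrx_word" title="bbox 170 100 300 140; x_wconf 95">world</span>
  </span>
</div>"""


def _is_word(elem):
    """True if the element carries the hOCR class for a recognized word."""
    return "ocrx_word" in elem.get("class", "").split()


def correct_words(hocr_text, corrections):
    """Return hOCR with the text of matching ocrx_word elements replaced.

    Hypothetical helper: bounding boxes and confidences in the title
    attribute are left untouched, only the word text changes.
    """
    root = ET.fromstring(hocr_text)
    for elem in root.iter():
        if _is_word(elem) and elem.text in corrections:
            elem.text = corrections[elem.text]
    return ET.tostring(root, encoding="unicode")


def words(hocr_text):
    """List the text of every ocrx_word element, in document order."""
    root = ET.fromstring(hocr_text)
    return [elem.text for elem in root.iter() if _is_word(elem)]


edited = correct_words(SAMPLE_HOCR, {"He11o": "Hello"})
print(words(edited))  # ['Hello', 'world']
```

The edited hOCR would then be written back to wherever the "hOCR to OCR PDF" step expects to find it. For files that declare the XHTML namespace, calling `ET.register_namespace("", "http://www.w3.org/1999/xhtml")` before serializing keeps the tags unprefixed.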

There are plugins like OCRmyPDF-PaddleOCR if you just want to use a plugin.

OCRmyPDF is pretty tightly coupled to Tesseract even if you bypass it for OCR - it won…

Answer selected by bluebox-steven