
[Feature]: Integration with other backends via hOCR (naive implementation of an EasyOCR backend inside) #1250

@coffepowered

Description


Describe the proposed feature

Hi, I see there are a few issues on the board proposing integration of new OCR backends.

I wondered how difficult a naive integration would be. It turns out to be doable: here is the result of a quick-and-dirty plugin I wrote in a couple of hours. I converted a sample PDF with no machine-readable text using OCRmyPDF with an EasyOCR backend:

(image: result of the conversion)

Essentially, I generated hOCR output from EasyOCR's result object. However, I am not sure whether this is a suitable approach, or whether it has fundamental limitations that would prevent this kind of integration from succeeding.

I would expect any OCR engine to provide at least bounding boxes and text, which can then be expressed in hOCR format.
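A minimal sketch of that idea, assuming EasyOCR's default `readtext()` output shape: a list of `(points, text, confidence)` tuples, with four `[x, y]` corner points per detection. The function name `easyocr_to_hocr` and its parameters are hypothetical (not an OCRmyPDF or EasyOCR API), and the output is a bare-bones hOCR page with one `ocr_line` holding one `ocrx_word` per detection, rather than the deeper `ocr_carea`/`ocr_par` nesting that Tesseract emits. How the plugin hands this to OCRmyPDF is not shown.

```python
from html import escape


# Hypothetical helper, not part of OCRmyPDF or EasyOCR.
def easyocr_to_hocr(results, page_width, page_height):
    """Build a minimal hOCR page from EasyOCR-style detections.

    Assumes `results` has the shape of easyocr.Reader.readtext() output
    with detail=1: a list of (points, text, confidence) tuples, where
    `points` holds the four [x, y] corners of a detected text region.
    """
    spans = []
    for idx, (points, text, confidence) in enumerate(results, start=1):
        # hOCR bboxes are axis-aligned "x0 y0 x1 y1", so collapse the
        # quadrilateral to its enclosing rectangle (rotation is lost).
        xs = [int(round(p[0])) for p in points]
        ys = [int(round(p[1])) for p in points]
        bbox = f"{min(xs)} {min(ys)} {max(xs)} {max(ys)}"
        # x_wconf is a 0-100 score; EasyOCR reports confidence as 0-1.
        wconf = int(round(confidence * 100))
        spans.append(
            f'   <span class="ocr_line" id="line_1_{idx}" title="bbox {bbox}">'
            f'<span class="ocrx_word" id="word_1_{idx}" '
            f'title="bbox {bbox}; x_wconf {wconf}">{escape(text)}</span></span>'
        )
    body = "\n".join(spans)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta name="ocr-system" content="easyocr" />
  <meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word" />
 </head>
 <body>
  <div class="ocr_page" id="page_1" title="bbox 0 0 {page_width} {page_height}">
{body}
  </div>
 </body>
</html>
"""


if __name__ == "__main__":
    # Sample data in the assumed readtext() shape (made up for illustration).
    sample = [([[10, 20], [110, 20], [110, 45], [10, 45]], "Hello world", 0.93)]
    print(easyocr_to_hocr(sample, page_width=200, page_height=100))
```

With EasyOCR installed, `results` would typically come from something like `easyocr.Reader(["en"]).readtext("page.png")`, with the page size taken from the rendered image. Collapsing each quadrilateral to its enclosing rectangle is the simplest choice, but it discards rotation, and a single EasyOCR detection may contain several words, so word-level boxes are only approximated here.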

Are there deeper or semantic limitations I am unaware of that make reconstructing hOCR output difficult?
