
[Feature]: Removal of bad text layer while preserving vector graphics as-is #1608

@vsukhoml

Describe the proposed feature

I have a bunch of old PDF files in my library that I want to make searchable with OCR. One example is gehrke98algebraic.pdf.

The challenge is that while --force-ocr does work, it rasterizes the file, which increases its size. --redo-ocr produces a file that combines the old and new text layers, which is not very usable either (although the file does become searchable).

It would be nice to have a mode in which the original graphics, both raster and vector, are preserved as-is, while the text layer is fully replaced with a new one.