Open
Summary
PDFKeeper’s current OCR workflow converts image‑based PDF pages to TIFF before running OCR. This extra conversion step adds processing time and uses more CPU and memory than modern AI‑based OCR engines, which can operate directly on PDF files.
Proposed Solution
Adopt an AI‑based OCR engine that can:
- Process image‑based PDF pages directly, without an intermediate TIFF conversion
- Support multiple languages
- Provide higher accuracy on low‑quality scans
- Offer a clean API suitable for integration into PDFKeeper’s existing architecture
Benefits
- Faster OCR processing
- Higher accuracy, especially for complex or low‑quality documents
- Reduced CPU and memory usage