📰 Newspaper OCR Enhancement & Text-Layering Tool

This project improves the readability and text extractability of scanned newspaper PDFs. It first applies color-preserving image preprocessing (contrast enhancement and denoising), then adds an invisible OCR-based text layer so that users can search and copy text in the resulting PDF. The text layer itself does not change how the pages look.

📌 Features

✅ High-resolution image conversion (300 DPI)
✅ Color-preserving contrast enhancement & denoising
✅ Image reassembly into a PDF while maintaining full color
✅ Text-layer embedding using ocrmypdf
✅ Searchable and selectable final PDF output
⚠️ Note: Download the output PDF files and open them locally to try text extraction.

📅 Requirements

Python 3.x

Python packages:

ocrmypdf
pdf2image
opencv-python
numpy
Pillow

Poppler utilities:

macOS: brew install poppler
Ubuntu: sudo apt-get install poppler-utils
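
A quick way to confirm that the requirements above are in place is to check for them from Python before running the tool. The snippet below is a minimal sketch using only the standard library; the helper names `missing_modules` and `missing_tools` are illustrative and not part of this project. It assumes the usual import names (`cv2` for opencv-python, `PIL` for Pillow) and looks for `pdftoppm`, which ships with Poppler. It also looks for `tesseract`, the OCR engine that OCRmyPDF relies on, even though it is not listed above.

```python
import importlib.util
import shutil

# Import names differ from some pip names: opencv-python -> cv2, Pillow -> PIL.
PYTHON_MODULES = ["ocrmypdf", "pdf2image", "cv2", "numpy", "PIL"]
# pdftoppm ships with Poppler; tesseract is the OCR engine that OCRmyPDF drives.
EXTERNAL_TOOLS = ["pdftoppm", "tesseract"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables that are not on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing external tools:", missing_tools(EXTERNAL_TOOLS) or "none")
```

Anything reported as missing would need to be installed (Python modules with pip, external tools with the system package manager) before running main.py.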

🔍 Main Function Explanations

convert_from_path() : Converts PDF pages to high-resolution color images
cv2.convertScaleAbs() : Adjusts contrast (alpha, a scale factor) and brightness (beta, an offset), saturating the result to 8-bit values
cv2.fastNlMeansDenoisingColored() : Denoises while preserving color
Image.fromarray() : Converts NumPy array back to PIL image for saving
ocrmypdf.ocr() : Adds an invisible, selectable OCR text layer to the PDF
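
The functions listed above can be chained roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual main.py: the names `scale_abs`, `enhance_page`, and `process_pdf` are hypothetical, and the `alpha`, `beta`, and denoising values are illustrative rather than tuned. `scale_abs` is a pure-Python model of what `cv2.convertScaleAbs()` computes for a single channel value, included only to make the contrast step concrete.

```python
from pathlib import Path


def scale_abs(value: float, alpha: float, beta: float) -> int:
    """Model of cv2.convertScaleAbs for one channel value:
    scale by alpha, add beta, take the absolute value, saturate to 0..255."""
    return min(255, round(abs(alpha * value + beta)))


def enhance_page(page, alpha: float = 1.3, beta: float = 10):
    """Enhance one PIL page image. alpha and beta are illustrative defaults."""
    # Imported lazily so scale_abs can be used without OpenCV installed.
    import cv2
    import numpy as np
    from PIL import Image

    # PIL uses RGB channel order; OpenCV's color functions expect BGR.
    bgr = cv2.cvtColor(np.array(page.convert("RGB")), cv2.COLOR_RGB2BGR)
    bgr = cv2.convertScaleAbs(bgr, alpha=alpha, beta=beta)
    # Arguments: src, dst, h, hColor, templateWindowSize, searchWindowSize.
    bgr = cv2.fastNlMeansDenoisingColored(bgr, None, 5, 5, 7, 21)
    return Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))


def process_pdf(input_pdf: Path, output_pdf: Path, dpi: int = 300) -> None:
    """Rasterize, enhance, reassemble, then add the OCR text layer."""
    import ocrmypdf
    from pdf2image import convert_from_path

    pages = [enhance_page(p) for p in convert_from_path(str(input_pdf), dpi=dpi)]
    # Hypothetical name for the intermediate, image-only PDF.
    enhanced_pdf = output_pdf.with_name(output_pdf.stem + "_enhanced.pdf")
    pages[0].save(enhanced_pdf, save_all=True,
                  append_images=pages[1:], resolution=dpi)
    ocrmypdf.ocr(enhanced_pdf, output_pdf)
```

For example, with `alpha=1.5` and `beta=10`, a channel value of 100 becomes 160, while 200 would exceed the 8-bit range and is clipped to 255. When calling `process_pdf` from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since OCRmyPDF may use multiple processes.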

🔄 How to Run

Place the original scanned newspaper PDF in the ./data folder.
Update the file name in the script (input_pdf variable).
Run the Python script:
```
python main.py
```
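
As an illustration of the second step, the `input_pdf` variable might be derived from a file name as sketched below. This is an assumption about how the script could be organized, not a copy of main.py: `resolve_paths`, the `_ocr` suffix, the `Output` destination folder, and the file name `example_newspaper.pdf` are all hypothetical.

```python
from pathlib import Path

DATA_DIR = Path("data")      # folder holding the original scans
OUTPUT_DIR = Path("Output")  # assumed destination folder for results


def resolve_paths(file_name: str, suffix: str = "_ocr"):
    """Return (input_pdf, output_pdf) for a file placed in the data folder."""
    input_pdf = DATA_DIR / file_name
    output_pdf = OUTPUT_DIR / f"{input_pdf.stem}{suffix}.pdf"
    return input_pdf, output_pdf


if __name__ == "__main__":
    # Hypothetical file name; replace it with the scan placed in ./data.
    input_pdf, output_pdf = resolve_paths("example_newspaper.pdf")
    print(input_pdf, "->", output_pdf)
```

Keeping the folder names in one place means only the file name has to change between runs.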
