Scene Text Recognition

Overview

Scene Text Recognition is a two-stage pipeline for detecting and reading text in scene images. It first uses a YOLOv11 model to detect text regions, then passes each detected region to a CRNN model that recognizes the text. The two-stage design is intended to cope with common scene-text challenges such as diverse fonts, complex backgrounds, and varied text orientations.
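The two-stage flow can be sketched as follows. This is a minimal, dependency-free illustration, not the project's code. `detect` and `recognize` are hypothetical stand-ins for the YOLOv11 detector and the CRNN recognizer. Boxes are assumed to be `(x_min, y_min, x_max, y_max)` pixel coordinates, and an image is modeled as a list of pixel rows.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]
# Images are plain lists of pixel rows so the sketch needs no third-party packages.
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Return the region inside `box`, clamped to the image bounds."""
    x_min, y_min, x_max, y_max = box
    height = len(image)
    width = len(image[0]) if image else 0
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],   # hypothetical stage 1 (text detector)
    recognize: Callable[[Image], str],          # hypothetical stage 2 (text recognizer)
) -> List[Tuple[Box, str]]:
    """Detect text boxes, crop each one, and recognize the text inside it."""
    results: List[Tuple[Box, str]] = []
    for box in detect(image):
        region = crop(image, box)
        if not region or not region[0]:
            continue  # box lies outside the image, nothing to read
        results.append((box, recognize(region)))
    return results


if __name__ == "__main__":
    blank_image = [[0] * 10 for _ in range(5)]  # 10 px wide, 5 px tall

    def fake_detect(img: Image) -> List[Box]:
        return [(1, 1, 4, 3), (20, 20, 30, 30)]  # second box is off-image

    def fake_recognize(region: Image) -> str:
        return f"{len(region[0])}x{len(region)}"  # report crop size instead of text

    print(run_pipeline(blank_image, fake_detect, fake_recognize))
    # [((1, 1, 4, 3), '3x2')]
```

Keeping both stages as injected callables makes clear that the recognizer only ever sees cropped regions. Its accuracy therefore depends on the detector producing tight boxes.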

Download the dataset here.

Pipeline

The overall processing pipeline is illustrated below:

Results

Below are several examples demonstrating the performance of the system:

Deploy on Hugging Face

Try the live demo on Hugging Face Spaces: Scene Text Recognition

How to Run

Navigate to the project directory:
```
cd path/to/Scene_Text_Recognization
```
Install the requirements:
```
pip install -r src/requirements.txt
```

Run the prediction script:

```
python src/predict.py --image_path=path/to/your/image --save_path=path/to/saved/directory
```

Reproduce

Dataset Structure

The dataset is organized as follows:

```
Dataset
├── apanar_06.08.2002
│   ├── image.jpg
│   └── ...
├── lfsosa_12.08.2002
│   ├── image.jpg
│   └── ...
├── ryoungt_03.09.2002
│   ├── image.jpg
│   └── ...
├── ryoungt_05.08.2002
│   ├── image.jpg
│   └── ...
├── locations.xml
├── segmentation.xml
└── words.xml
```
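The README does not document the annotation schema. The layout above (per-capture folders plus `locations.xml`, `segmentation.xml`, and `words.xml`) resembles the ICDAR 2003 Robust Reading data, so the sketch below assumes that schema. Under this assumption, each `<image>` element holds an `<imageName>` and one `<taggedRectangle>` per word, with `x`, `y`, `width`, and `height` attributes and a `<tag>` child containing the transcription. Check the actual file before relying on these names. The sample file name is invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

# (x, y, width, height, text) for one word; attribute names are assumed, not documented.
Word = Tuple[float, float, float, float, str]


def parse_words_xml(xml_data: Union[str, bytes]) -> Dict[str, List[Word]]:
    """Map each image path to its annotated word boxes (assumed ICDAR-2003-style schema)."""
    root = ET.fromstring(xml_data)
    annotations: Dict[str, List[Word]] = {}
    for image in root.iter("image"):
        name = (image.findtext("imageName") or "").strip()
        words: List[Word] = []
        for rect in image.iter("taggedRectangle"):
            words.append((
                float(rect.attrib["x"]),
                float(rect.attrib["y"]),
                float(rect.attrib["width"]),
                float(rect.attrib["height"]),
                (rect.findtext("tag") or "").strip(),
            ))
        annotations[name] = words
    return annotations


# Illustrative input in the assumed format (file name is made up).
SAMPLE = """<tagset>
  <image>
    <imageName>apanar_06.08.2002/example.jpg</imageName>
    <taggedRectangles>
      <taggedRectangle x="10" y="20" width="30" height="12"><tag>EXIT</tag></taggedRectangle>
    </taggedRectangles>
  </image>
</tagset>"""

if __name__ == "__main__":
    print(parse_words_xml(SAMPLE))
```

For the real file, reading it as bytes (for example `Path("Dataset/words.xml").read_bytes()`) and passing the result to `parse_words_xml` lets the XML parser honour any encoding declaration in the file.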

Preparing the Datasets

For the YOLO dataset

Run the following command to prepare the dataset for text localization with YOLO:
```
python src/Text_Localization/prepare_dataset.py
```

After execution, a new YOLO dataset will be created in the Dataset/yolo_data folder with the following structure:

```
Dataset
├── ...
├── yolo_data
│   ├── test
│   │   ├── images
│   │   └── labels
│   ├── train
│   │   ├── images
│   │   └── labels
│   ├── val
│   │   ├── images
│   │   └── labels
│   └── data.yaml
├── ...
```
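Files in `labels/` normally follow the standard YOLO label convention used by Ultralytics. Each file has one line per object in the form `class x_center y_center width height`, with coordinates normalized by the image width and height. The sketch below converts a top-left pixel box into such a line. The single class id `0` for text is an assumption about how `prepare_dataset.py` labels regions.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Tuple

# (x, y, width, height) in pixels, with (x, y) the top-left corner.
PixelBox = Tuple[float, float, float, float]


def to_yolo_line(box: PixelBox, img_w: float, img_h: float, class_id: int = 0) -> str:
    """Format one box as 'class x_center y_center width height', normalized to [0, 1]."""
    x, y, w, h = box
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def write_label_file(path: str, boxes: Iterable[PixelBox], img_w: float, img_h: float) -> None:
    """Write one YOLO label line per box (class 0 = text, assumed)."""
    with open(path, "w", encoding="utf-8") as f:
        for box in boxes:
            f.write(to_yolo_line(box, img_w, img_h) + "\n")


if __name__ == "__main__":
    print(to_yolo_line((10, 20, 30, 12), img_w=100, img_h=50))
    # 0 0.250000 0.520000 0.300000 0.240000
    with tempfile.TemporaryDirectory() as tmp:
        label_path = os.path.join(tmp, "image1.txt")
        write_label_file(label_path, [(10, 20, 30, 12)], 100, 50)
        print(Path(label_path).read_text(encoding="utf-8"), end="")
```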

For the CRNN dataset

Run the following command to prepare the dataset for text recognition with CRNN:
```
python src/Text_Recognization/prepare_dataset.py
```

After execution, a new CRNN dataset will be created in the Dataset/ocr_dataset folder with the structure:

```
Dataset
├── ...
├── ocr_dataset
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── ...
│   └── labels.txt
├── ...
```
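The README does not describe the format of `labels.txt`. The sketch below assumes one sample per line in the form `<image_name><TAB><text>`. It shows how such a file could be turned into a character vocabulary for a CTC-trained CRNN, with index 0 reserved for the CTC blank. Both the separator and the blank index are assumptions; adjust them to match what `prepare_dataset.py` actually writes.

```python
from typing import Dict, Iterable, List, Tuple


def read_labels(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, str]]:
    """Parse '<image_name><sep><text>' lines, skipping blank ones (assumed format)."""
    samples: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        image_name, text = line.split(sep, 1)
        samples.append((image_name, text))
    return samples


def build_vocab(samples: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Assign each distinct character an index starting at 1; 0 stays free for the CTC blank."""
    chars = sorted({ch for _, text in samples for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a transcription into the integer targets a CTC loss expects."""
    return [vocab[ch] for ch in text]


if __name__ == "__main__":
    lines = ["image1.jpg\tEXIT\n", "image2.jpg\tTAXI\n"]
    samples = read_labels(lines)
    vocab = build_vocab(samples)
    print(vocab)                   # {'A': 1, 'E': 2, 'I': 3, 'T': 4, 'X': 5}
    print(encode("EXIT", vocab))   # [2, 5, 3, 4]
```

Sorting the characters keeps the index assignment deterministic, so the same `labels.txt` always produces the same vocabulary. Training and inference need that consistency to agree on character indices.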

Training the Text Localization Model

Execute the following command to train the text localization model:

```
python src/Text_Localization/text_localization.py
```
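Before launching training, it can help to confirm that the split folders produced by the dataset preparation step line up. The helper below is a hypothetical sanity check, not part of the repository. It lists images under `Dataset/yolo_data/<split>/images` that have no `labels/<stem>.txt` counterpart. YOLO trainers such as Ultralytics generally treat an image without a label file as containing no objects, so a missing label silently becomes a negative example. The accepted image extensions are an assumption.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled_images(yolo_root: str) -> Dict[str, List[str]]:
    """Return {split: [image file names]} for images lacking a matching label file."""
    missing: Dict[str, List[str]] = {}
    for split in ("train", "val", "test"):
        image_dir = Path(yolo_root) / split / "images"
        label_dir = Path(yolo_root) / split / "labels"
        if not image_dir.is_dir():
            continue  # split not present, nothing to check
        unlabeled = sorted(
            p.name
            for p in image_dir.iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
            and not (label_dir / f"{p.stem}.txt").is_file()
        )
        if unlabeled:
            missing[split] = unlabeled
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "train" / "images").mkdir(parents=True)
        (Path(tmp) / "train" / "labels").mkdir(parents=True)
        for name in ("a.jpg", "b.jpg"):
            (Path(tmp) / "train" / "images" / name).touch()
        (Path(tmp) / "train" / "labels" / "a.txt").touch()
        print(find_unlabeled_images(tmp))  # {'train': ['b.jpg']}
```

On the real data this would be called as `find_unlabeled_images("Dataset/yolo_data")`. An empty dict means every image has a label file.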

Training the Text Recognition Model

Train the CRNN model using the following command:

```
python src/Text_Localization/trainer.py --root_path=path/to/root/directory --save_path=path/to/save/weight
```
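CRNN recognizers are usually trained with a CTC loss, and their per-frame outputs are turned back into text by CTC decoding. The README does not say whether this project follows that design. Assuming it does, the simplest decoder is greedy (best-path) decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank. The sketch uses plain lists of scores and assumes blank index 0. `idx_to_char` is a hypothetical inverse of the training vocabulary.

```python
from typing import Dict, List, Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    idx_to_char: Dict[int, str],
    blank: int = 0,
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    chars: List[str] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        idx = argmax(scores)
        # Emit a character only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank:
            chars.append(idx_to_char[idx])
        previous = idx
    return "".join(chars)


if __name__ == "__main__":
    idx_to_char = {1: "A", 2: "B"}
    path = [1, 1, 0, 1, 2, 2, 0]  # frame-wise argmax indices to simulate
    one_hot = [[1.0 if i == k else 0.0 for i in range(3)] for k in path]
    print(ctc_greedy_decode(one_hot, idx_to_char))  # AAB
```

The example shows why the blank matters. The first two `A` frames collapse into one character, and the blank between them and the next `A` frame keeps that repeated letter as a separate character.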

Checkpoints

After training both models, ensure that the checkpoint files are set up as follows:

```
SCENE_TEXT_RECOGNIZATION
├── checkpoints
│   ├── yolov11.pt
│   └── ocr_extend_vocab_size.pt
├── ...
```
Alternatively, you can download the trained checkpoints here.

Inference

To run inference on an input image, execute:

```
python src/predict.py --image_path=path/to/your/image --save_path=path/to/saved/directory
```
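To process a whole folder with the same command, one option is to call `src/predict.py` once per image. The sketch below uses only the `--image_path` and `--save_path` flags documented in this README. The image-extension filter is an assumption, and the script must be run from the project root so the relative path `src/predict.py` resolves.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_commands(image_dir: str, save_path: str) -> List[List[str]]:
    """One predict.py invocation per image, using the documented --image_path/--save_path flags."""
    images = sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    return [
        [sys.executable, "src/predict.py", f"--image_path={p}", f"--save_path={save_path}"]
        for p in images
    ]


def run_all(image_dir: str, save_path: str) -> None:
    """Run the commands sequentially; check=True stops at the first failure."""
    for cmd in build_commands(image_dir, save_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.png", "notes.txt"):
            (Path(tmp) / name).touch()
        for cmd in build_commands(tmp, "outputs"):
            print(cmd[2:])
```

Launching a fresh process per image keeps the sketch independent of `predict.py`'s internals. The cost is that both models are reloaded for every image. For large folders, loading the models once inside a single process would be faster.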

Deploy on Gradio

Deploy your model via Gradio to share it with others:

```
python src/deploy.py
```

This command will launch an interactive Gradio interface for the Scene Text Recognition system.
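The README does not show what `src/deploy.py` contains. The code below is a rough sketch of how such an interface is commonly wired up with Gradio's `Interface` API: it wraps a recognition function in a handler that returns text. `recognize_file` is a hypothetical callable that returns `(box, text)` pairs for an image path. Gradio is imported only inside `launch()`, so the handler can be exercised without Gradio installed.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]
# Hypothetical signature: image path -> list of (box, recognized text).
Recognizer = Callable[[str], List[Tuple[Box, str]]]


def make_handler(recognize_file: Recognizer) -> Callable[[str], str]:
    """Adapt a recognizer to a function Gradio can call with an uploaded file path."""
    def handler(image_path: str) -> str:
        if not image_path:
            return "No image provided."
        results = recognize_file(image_path)
        # One recognized word per line; fall back to a message when nothing was read.
        return "\n".join(text for _box, text in results) or "No text found."
    return handler


def launch(recognize_file: Recognizer) -> None:
    """Start a local Gradio app (requires `pip install gradio`)."""
    import gradio as gr  # lazy import: only needed when actually launching

    demo = gr.Interface(
        fn=make_handler(recognize_file),
        inputs=gr.Image(type="filepath"),  # Gradio passes the uploaded image as a file path
        outputs=gr.Textbox(label="Recognized text"),
        title="Scene Text Recognition",
    )
    demo.launch()
```

Passing the image as a file path matches the path-based interface of the prediction script. The same recognition code can therefore back both the command line and the web demo.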
