Scene Text Recognition is a two-stage pipeline for detecting and reading text in scene images. The system first uses a YOLOv11 model to detect text regions, then applies a CRNN model to recognize the text in each detected region. This design targets the main difficulties of scene text: diverse fonts, complex backgrounds, and varied text orientations.
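The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the image is a toy nested list, and `fake_detector` and `fake_recognizer` are hypothetical stand-ins for the YOLOv11 and CRNN models.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # toy image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by `box` out of `image`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def read_scene_text(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: find text boxes. Stage 2: read each cropped region."""
    results = []
    for box in detect(image):
        results.append((box, recognize(crop(image, box))))
    return results


# Hypothetical stand-ins for the real models, for illustration only.
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def fake_recognizer(region: Image) -> str:
    return f"{len(region[0])}x{len(region)} crop"


toy_image = [[0, 1, 2], [3, 4, 5]]
print(read_scene_text(toy_image, fake_detector, fake_recognizer))
```

Passing the detector and recognizer as callables keeps the two stages independent, so either model can be swapped or tested in isolation.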
Download the dataset here.
The overall processing pipeline is illustrated below:
Below are several examples demonstrating the performance of the system:
Try the live demo on Hugging Face Spaces: Scene Text Recognition
- Navigate to the project directory:
  cd path/to/Scene_Text_Recognization
- Install the requirements:
  pip install -r src/requirements.txt
- Run the prediction script:
  python src/predict.py --image_path=path/to/your/image --save_path=path/to/saved/directory
The dataset is organized as follows:
Dataset
├── apanar_06.08.2002
│   ├── image.jpg
│   └── ...
├── lfsosa_12.08.2002
│   ├── image.jpg
│   └── ...
├── ryoungt_03.09.2002
│   ├── image.jpg
│   └── ...
├── ryoungt_05.08.2002
│   ├── image.jpg
│   └── ...
├── locations.xml
├── segmentation.xml
└── words.xml
- For the YOLO dataset, run the following command to prepare the data for text localization:
  python src/Text_Localization/prepare_dataset.py
- After execution, a new YOLO dataset will be created in the Dataset/yolo_data folder with the following structure:
  Dataset
  ├── ...
  ├── yolo_data
  │   ├── test
  │   │   ├── images
  │   │   └── labels
  │   ├── train
  │   │   ├── images
  │   │   └── labels
  │   ├── val
  │   │   ├── images
  │   │   └── labels
  │   └── data.yaml
  ├── ...
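For orientation, each file under a `labels` folder in a YOLO detection dataset holds one line per box in the form `class x_center y_center width height`, with coordinates normalized to the image size. The sketch below shows that conversion under two assumptions that the source does not confirm: the original annotations give a box as top-left corner plus width and height in pixels, and there is a single "text" class with id 0. The function name `to_yolo_line` is hypothetical.

```python
def to_yolo_line(x, y, w, h, img_w, img_h, class_id=0):
    """Convert a pixel box (top-left x, y, width, height) to a YOLO label line.

    Assumes a single class (id 0) unless another class_id is given.
    """
    x_center = (x + w / 2) / img_w   # box centre, normalized to [0, 1]
    y_center = (y + h / 2) / img_h
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{w / img_w:.6f} {h / img_h:.6f}"
    )


# A 200x100 box whose top-left corner is at (100, 50) in an 800x400 image.
print(to_yolo_line(100, 50, 200, 100, img_w=800, img_h=400))
# -> 0 0.250000 0.250000 0.250000 0.250000
```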
- For the CRNN dataset, run the following command to prepare the data for text recognition:
  python src/Text_Recognization/prepare_dataset.py
- After execution, a new CRNN dataset will be created in the Dataset/ocr_dataset folder with the following structure:
  Dataset
  ├── ...
  ├── ocr_dataset
  │   ├── image1.jpg
  │   ├── image2.jpg
  │   ├── ...
  │   └── labels.txt
  ├── ...
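The transcriptions in `labels.txt` have to be turned into integer sequences before a CRNN can train on them. The sketch below shows one common way to do that, assuming the model is trained with CTC loss (usual for CRNN, but not stated in this README) and that index 0 is reserved for the CTC blank. The layout of `labels.txt` is not documented here, so the example starts from a plain list of strings; all function names are hypothetical.

```python
def build_vocab(labels):
    """Map each distinct character to an index; 0 is kept for the CTC blank."""
    chars = sorted(set("".join(labels)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, char_to_idx):
    """Turn a transcription into a list of character indices."""
    return [char_to_idx[ch] for ch in text]


def ctc_greedy_decode(indices, idx_to_char, blank=0):
    """Collapse consecutive repeats, then drop blanks."""
    out = []
    prev = blank
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(idx_to_char[idx])
        prev = idx
    return "".join(out)


labels = ["hello", "help"]                      # toy transcriptions
char_to_idx = build_vocab(labels)               # e=1, h=2, l=3, o=4, p=5
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

print(encode("help", char_to_idx))              # [2, 1, 3, 5]
# Per-frame predictions: repeats collapse, and the blank separates the two l's.
print(ctc_greedy_decode([2, 2, 0, 1, 3, 0, 3, 4, 4], idx_to_char))  # hello
```

The blank between the two `l` frames is what lets the decoder keep a doubled letter; without it, the repeated index would collapse to a single character.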
Execute the following command to train the text localization model:
python src/Text_Localization/text_localization.py
Train the CRNN model using the following command:
python src/Text_Localization/trainer.py --root_path=path/to/root/directory --save_path=path/to/save/weight
After training both models, ensure that the checkpoint files are set up as follows:
SCENE_TEXT_RECOGNIZATION
├── checkpoints
│   ├── yolov11.pt
│   └── ocr_extend_vocab_size.pt
├── ...
Alternatively, download the checkpoints here.
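Before running inference, it can help to confirm that both checkpoint files are where the layout above expects them. The helper below is a small sketch of such a check: the file names come from the layout above, while the function name `missing_checkpoints` and the choice of the current directory as project root are assumptions.

```python
from pathlib import Path

# File names taken from the checkpoint layout in this README.
EXPECTED = ("yolov11.pt", "ocr_extend_vocab_size.pt")


def missing_checkpoints(project_root):
    """Return the expected checkpoint files not found under <root>/checkpoints."""
    ckpt_dir = Path(project_root) / "checkpoints"
    return [name for name in EXPECTED if not (ckpt_dir / name).is_file()]


# Assumes the script is run from the project root.
missing = missing_checkpoints(".")
print("All checkpoints found." if not missing else f"Missing: {missing}")
```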
To run inference on an input image, execute:
python src/predict.py --image_path=path/to/your/image --save_path=path/to/saved/directory
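To process a whole folder, the same command can be repeated once per image. The sketch below only builds the command lines, using the two flags shown above. The helper name `build_predict_commands` and the list of image extensions are assumptions, and it is not known whether `predict.py` itself accepts a directory.

```python
import sys
from pathlib import Path


def build_predict_commands(image_dir, save_dir,
                           extensions=(".jpg", ".jpeg", ".png")):
    """Build one predict.py command per image file found in image_dir."""
    commands = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in extensions:
            commands.append([
                sys.executable, "src/predict.py",
                f"--image_path={path}",
                f"--save_path={save_dir}",
            ])
    return commands


# To actually run them from the project root:
#   import subprocess
#   for cmd in build_predict_commands("path/to/images", "path/to/saved/directory"):
#       subprocess.run(cmd, check=True)
```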
Deploy your model via Gradio to share it with others:
python src/deploy.py
This command will launch an interactive Gradio interface for the Scene Text Recognition system.








