Page Trace AI

This repository hosts the models used to predict the transformation steps needed to extract pages from scanned books and other printed media. It is split into src (the main codebase for the ML models) and trace_app (a script that demos the models using the Page Trace API). Two models produce the page predictions:

YOCO model: You Only Crop Once

YOCO is a fine-tuned YOLO network based on YOLO11s (see: https://docs.ultralytics.com/models/yolo11/). It predicts the number of pages in a document together with their locations.

RotateNET

RotateNET is a ResNet-based model used to predict the rotation angle of each page.

Dataset creation

Input

The dataset is created based on ScanTailor metadata files. Your folders should follow this structure:

scan-id/
├─ rawdata/
│  ├─ 1/
│  │  └─ <*.tif images>
│  ├─ 2/
│  └─ ...
└─ scanTailor/
    ├─ 1.scanTailor
    ├─ 2.scanTailor
    └─ ...
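
Before running the steps below, it can help to confirm that a scan folder matches this layout. The following is a minimal sketch, not part of the repository: the function name `find_layout_problems` is hypothetical, and it assumes that every numbered folder under rawdata/ has a matching `<number>.scanTailor` file, which is inferred from the tree above rather than stated by the project.

```python
from pathlib import Path


def find_layout_problems(scan_dir):
    """Return a list of human-readable problems found in one scan-id folder.

    Hypothetical helper: it only checks the folder layout shown above and
    does not open any image or .scanTailor file.
    """
    scan_dir = Path(scan_dir)
    rawdata = scan_dir / "rawdata"
    scantailor = scan_dir / "scanTailor"

    problems = []
    if not rawdata.is_dir():
        problems.append("missing rawdata/ folder")
    if not scantailor.is_dir():
        problems.append("missing scanTailor/ folder")
    if problems:
        # Nothing else can be checked without both top-level folders.
        return problems

    for sub in sorted(p for p in rawdata.iterdir() if p.is_dir()):
        # Each numbered folder is expected to hold the raw .tif scans.
        if not any(sub.glob("*.tif")):
            problems.append(f"rawdata/{sub.name}/ contains no .tif images")
        # Assumption: folder "1" pairs with "1.scanTailor", and so on.
        if not (scantailor / f"{sub.name}.scanTailor").is_file():
            problems.append(f"scanTailor/{sub.name}.scanTailor is missing")
    return problems
```

Calling `find_layout_problems("scan-id")` returns an empty list when the folder matches the expected structure.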

Steps

Compress the input data
- Run:
```
src/create_dataset/compress_input_images.py
```
- The script compresses the input structure described above from TIFFs into JPEGs and saves it in the format used by the other scripts.
Extract ScanTailor metadata
- Run:
```
src/create_dataset/extract_scantailor_data.py
```
- This script extracts crop coordinates and other metadata from the .scanTailor files and saves them as metadata.json files, each stored in its respective folder.
- It also cleans mistakes in the training data and assigns objects to classes: page, back title cover, and unified doublepage.
Create dataset structure
- Run:
```
src/create_dataset/create_yolo_dataset.py
```
- It consumes the metadata.json files produced by the previous step and arranges the files into the structure expected by Ultralytics YOLO (train / val / test). See: https://docs.ultralytics.com/datasets/detect/
- Images are padded by 10% on the left and right. This ensures rotation augmentation can be applied without pushing page edges out of frame.
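
Padding an image horizontally changes its width, so any normalized YOLO box (x_center, y_center, width, height) has to be remapped to stay aligned with the page. The sketch below shows that arithmetic only. It is not taken from create_yolo_dataset.py: the function name `pad_yolo_box` is hypothetical, and it assumes the 10% is added on each side as a fraction of the original width.

```python
def pad_yolo_box(box, pad_fraction=0.10):
    """Remap one normalized YOLO box after padding the image left and right.

    box is (x_center, y_center, width, height), each relative to the
    unpadded image. pad_fraction is the padding added on EACH side as a
    fraction of the original width (an assumption, see the text above).
    """
    x_center, y_center, width, height = box
    # New image width divided by old image width.
    scale = 1.0 + 2.0 * pad_fraction
    return (
        (x_center + pad_fraction) / scale,  # shifted right, then rescaled
        y_center,                           # vertical axis is untouched
        width / scale,                      # box is narrower relative to the wider image
        height,
    )
```

A box centered in the original image stays centered after padding, while its relative width shrinks by a factor of 1.2.
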
Assign classes and clean up
- Run:
```
src/create_dataset/assign_classes_and_cleanup.py
```
- This script cleans mistakes in the training data and assigns objects to classes: page, back title cover, and unified doublepage.
(Optional) Rebalance splits
- Run:
```
src/create_dataset/balance_train_val_test.py
```
- Use this to manually rebalance or refine the train/validation/test splits.
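
To judge whether rebalancing is needed, it is useful to look at the current share of images in each split first. This is a minimal sketch under stated assumptions, not the logic of balance_train_val_test.py: the function name `split_fractions` is hypothetical, and it assumes the dataset uses the images/train, images/val, images/test layout described in the Ultralytics detection dataset documentation.

```python
from pathlib import Path


def split_fractions(dataset_dir, splits=("train", "val", "test")):
    """Return the fraction of image files found in each split folder.

    Assumes images live in <dataset_dir>/images/<split>/ (hypothetical
    layout for this repository). Missing split folders count as empty.
    """
    dataset_dir = Path(dataset_dir)
    counts = {}
    for split in splits:
        split_dir = dataset_dir / "images" / split
        if split_dir.is_dir():
            counts[split] = sum(1 for p in split_dir.iterdir() if p.is_file())
        else:
            counts[split] = 0

    total = sum(counts.values())
    # Avoid dividing by zero for an empty dataset.
    return {s: (counts[s] / total if total else 0.0) for s in splits}
```

If the reported fractions are far from the intended ratio, that is a signal to run the optional rebalancing step.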

Output

You can use the output structure as input for fine-tuning both the rotation and crop models.

Training

Scripts for model fine-tuning are stored in src.training.crop_train and src.training.rotate_train. Both models use the same dataset. Training reports are periodically saved to CometML; set the COMET_ML_API_KEY environment variable to enable this.
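
A quick way to confirm that reporting will be active before starting a long training run is to check for the variable up front. This is a minimal sketch and not code from the training scripts; the function name `comet_reporting_enabled` is hypothetical, and only the variable name COMET_ML_API_KEY comes from this README.

```python
import os


def comet_reporting_enabled(environ=None):
    """Return True if COMET_ML_API_KEY is set to a non-blank value.

    environ defaults to the process environment; passing a dict makes the
    check easy to try out without touching real environment variables.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("COMET_ML_API_KEY", "").strip())
```

For example, `comet_reporting_enabled()` can be called at the top of a launch script to print a warning when the key is missing.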
