The Fiscal Document Sorter is a Python-based tool that automatically classifies and organizes Brazilian fiscal documents from image files using Tesseract OCR. It is designed for triage: quickly sorting large batches of files such as receipts, payment proofs, and invoices.
Documents are scanned for key fiscal terms and routed to appropriate folders, helping streamline manual organization tasks.
Note: Classification is keyword-based and may capture unrelated items (e.g., IDs). Manual verification is recommended.
- OCR text extraction using Tesseract (Portuguese language)
- Smart sorting:
  - Detects terms like "comprovante", "recibo", "pagamento", "transferência" and sends them to the receipts/ folder
  - Other fiscal content is sent to the others/ folder
- Recognizes common Brazilian date formats (e.g., dd/mm/yyyy, dd/mm)
- Batch processing with parallel execution via ThreadPoolExecutor
- Real-time progress using tqdm
- Simple folder selection using a graphical interface (tkinter)
- Files are moved, not copied
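The keyword routing and date recognition listed above can be illustrated with a minimal sketch. This is not the tool's actual code: the names `classify`, `find_dates`, and `normalize` are hypothetical, the accent-insensitive matching is an assumption added to tolerate OCR output that drops diacritics, and the sketch sends everything without a receipt keyword to `others`.

```python
import re
import unicodedata

# Keywords taken from the feature list; everything else here is an assumption.
RECEIPT_KEYWORDS = ("comprovante", "recibo", "pagamento", "transferência")

# Matches dd/mm/yyyy or dd/mm with a basic range check on day and month.
DATE_PATTERN = re.compile(
    r"\b(0?[1-9]|[12]\d|3[01])/(0?[1-9]|1[0-2])(?:/(\d{4}))?\b"
)


def normalize(text: str) -> str:
    """Lowercase the text and strip diacritics (e.g. 'ê' becomes 'e')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(text: str) -> str:
    """Return 'receipts' if any receipt keyword appears, otherwise 'others'."""
    normalized = normalize(text)
    if any(normalize(keyword) in normalized for keyword in RECEIPT_KEYWORDS):
        return "receipts"
    return "others"


def find_dates(text: str) -> list[str]:
    """Return every dd/mm/yyyy or dd/mm substring found in the text."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]
```

Because matching is substring-based, a document that merely mentions "pagamento" lands in `receipts`, which is why manual verification remains advisable.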
- Python: 3.10+
- Libraries:
  - opencv-python
  - pillow
  - pytesseract
  - tqdm
Install all dependencies with:
pip install opencv-python pillow pytesseract tqdm
Also ensure Tesseract OCR is installed and in your system PATH: https://github.com/tesseract-ocr/tesseract
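To confirm the Tesseract requirement before running the sorter, a small check like the following may help. It is a sketch, not part of the tool: `tesseract_status` is a hypothetical helper, and it assumes the binary is named `tesseract` and that Portuguese OCR needs the `por` language data reported by `tesseract --list-langs`.

```python
import shutil
import subprocess


def tesseract_status(binary: str = "tesseract") -> dict:
    """Report whether Tesseract is on PATH and whether 'por' data is listed."""
    path = shutil.which(binary)
    if path is None:
        return {"installed": False, "path": None, "has_portuguese": False}

    # Some Tesseract versions print the language list to stderr, so read both.
    result = subprocess.run(
        [path, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    languages = (result.stdout + result.stderr).split()
    return {"installed": True, "path": path, "has_portuguese": "por" in languages}


if __name__ == "__main__":
    print(tesseract_status())
```

If `has_portuguese` is false, the Portuguese language data likely needs to be installed separately through your platform's Tesseract packages.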
- Run the script:
python fiscal_sorter.py
- Select:
  - Source folder (containing image files)
  - Destination folder (where organized files will be saved)
- Files will be scanned and sorted automatically.
The script uses Tesseract OCR to extract text from each image and searches for fiscal-related keywords in Brazilian Portuguese. Based on the content, it moves the file to either:
- receipts/: for payment-related documents
- others/: for general fiscal content
A progress bar displays the processing status. Files are handled in parallel for improved performance.
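The flow described above (OCR, classify, move, in parallel) might look roughly like the sketch below. It is an illustration under assumptions rather than the script's real implementation: `ocr_image` and `sort_files` are hypothetical names, the classifier is passed in as a callable, and the worker count of 4 is arbitrary. The only pytesseract call used is `image_to_string` with `lang="por"`.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def ocr_image(path: Path) -> str:
    """Extract Portuguese text from one image with Tesseract."""
    # Imported lazily so the rest of the sketch works without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang="por")


def sort_files(
    files: list[Path],
    destination: Path,
    classify: Callable[[str], str],
    extract_text: Callable[[Path], str] = ocr_image,
    max_workers: int = 4,
) -> dict[str, str]:
    """Move each file into destination/<label>/ and return {file name: label}."""

    def process(path: Path) -> tuple[str, str]:
        label = classify(extract_text(path))
        target_dir = destination / label
        target_dir.mkdir(parents=True, exist_ok=True)
        # Files are moved, not copied; a same-named file in the target
        # folder is not handled specially in this sketch.
        shutil.move(str(path), str(target_dir / path.name))
        return path.name, label

    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process, path) for path in files]
        # Wrapping as_completed(futures) in tqdm(..., total=len(futures))
        # would add the progress bar.
        for future in as_completed(futures):
            name, label = future.result()
            results[name] = label
    return results
```

Threads suit this workload because pytesseract runs Tesseract as an external process, so the workers spend most of their time waiting on it rather than competing for the interpreter.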
- .jpg
- .jpeg
- .png
- .bmp
- .tiff
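Filtering a source folder down to the extensions listed above could be done as in this sketch. The helper names `is_supported` and `list_images` are hypothetical, and the case-insensitive comparison (so that `.JPG` is accepted) is an assumption about how the tool behaves.

```python
from pathlib import Path

# Extensions copied from the supported-formats list.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}


def is_supported(path: Path) -> bool:
    """Return True if the file extension is one of the supported image types."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def list_images(folder: Path) -> list[Path]:
    """Return the supported image files directly inside folder, sorted by name."""
    return sorted(p for p in folder.iterdir() if p.is_file() and is_supported(p))
```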
destination_folder/
├── receipts/
│ ├── comprovante1.jpg
│ └── recibo_loja.png
└── others/
    ├── nota_fiscal_eletronica.jpg
    └── invoice_loja.png
- Currently optimized for Brazilian fiscal document types
- May classify unrelated documents due to keyword-only logic
- More robust classification logic
- PDF support and integration with pdf_anomaly_detector
- Unified interface for broader document triage
MIT License – use freely with attribution.