HunyuanOCR Snipping Tool

A Mathpix-like snipping tool powered by HunyuanOCR-1B for extracting text, mathematical formulas, and tables from screenshots.

✨ Features

📸 Screen Region Capture: Select any region of your screen and run OCR on it
📐 LaTeX Output: Automatic conversion of mathematical formulas to LaTeX
📊 Table Extraction: Extract tables in HTML format
📄 Document Parsing: Full document extraction in Markdown format
⌨️ Global Hotkeys: Quick access with customizable keyboard shortcuts
🎨 LaTeX Preview: Render and preview LaTeX formulas
💾 Multiple Export Options: Copy to clipboard or save to file
🚀 GPU Acceleration: CUDA support for fast processing
⚡ Optional vLLM Backend: Even faster inference with vLLM

🎯 Key Capabilities

Feature	Details
Model	HunyuanOCR-1B (~2GB)
Output Formats	LaTeX (math), HTML (tables), Markdown (docs)
Supported Languages	English, Chinese, Vietnamese, and more
Hotkeys	Ctrl+Shift+S (document), Ctrl+Shift+M (math), Ctrl+Shift+T (table), Ctrl+Shift+P (plain text)

📋 Requirements

Python 3.11+
CUDA-capable GPU (recommended)
~4GB VRAM for the model
~2GB disk space for model weights
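
The requirements above can be checked before installing. The following is a minimal sketch, not part of the project: the function names are hypothetical, and the thresholds are simply the approximate figures listed above. The GPU query uses `torch.cuda`, so it reports no GPU until PyTorch is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)   # "Python 3.11+"
MIN_VRAM_GB = 4.0      # "~4GB VRAM for the model"
MIN_DISK_GB = 2.0      # "~2GB disk space for model weights"


def check_requirements(python_version, vram_gb, free_disk_gb):
    """Return a list of human-readable problems; an empty list means all checks passed.

    vram_gb may be None when no CUDA GPU was detected. The GPU is only
    recommended, so that case produces a warning string, not an exception.
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    if vram_gb is None:
        problems.append("no CUDA GPU detected (CPU mode will be slow)")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"about {MIN_VRAM_GB:.0f} GB VRAM needed, found {vram_gb:.1f} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"about {MIN_DISK_GB:.0f} GB free disk needed, found {free_disk_gb:.1f} GB")
    return problems


def detect_vram_gb():
    """Total memory of CUDA device 0 in GB, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3
    for problem in check_requirements(sys.version_info, detect_vram_gb(), free_gb):
        print("WARNING:", problem)
```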

🚀 Installation

Quick Installation (Recommended)

# Clone the repository
git clone <repository-url>
cd math-snip-tool

# Run the installation script
chmod +x install.sh
./install.sh

The installation script will:

Create a conda environment (math-snip-tool)
Install PyTorch with CUDA support
Install the pinned transformers commit required for HunyuanOCR
Install all dependencies
Download the HunyuanOCR model

Manual Installation

# 1. Create conda environment
conda create -n math-snip-tool python=3.11
conda activate math-snip-tool

# 2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install the pinned transformers commit (REQUIRED!)
pip install git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4

# 4. Install other dependencies
pip install PyQt6 pillow mss pynput pyperclip accelerate matplotlib colorlog sentencepiece

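# 5. Pre-fetch the processor files (AutoProcessor does not fetch the ~2GB model weights; those are cached when the model is first loaded)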
python -c "from transformers import AutoProcessor; AutoProcessor.from_pretrained('tencent/HunyuanOCR', trust_remote_code=True)"
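
After the manual steps, it can help to confirm that every package is importable from the active environment. This is a small sketch under assumptions: the helper name is hypothetical, and the list holds the import names of the packages from steps 2 to 4 (the `pillow` distribution is imported as `PIL`).

```python
import importlib.util

# Import names for the packages installed in steps 2-4.
REQUIRED_MODULES = [
    "torch", "torchvision", "transformers", "PyQt6", "PIL", "mss",
    "pynput", "pyperclip", "accelerate", "matplotlib", "colorlog",
    "sentencepiece",
]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that the modules can be found; it does not verify that the installed transformers build is the pinned commit.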

Optional: vLLM for Faster Inference

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

🎮 Usage

Starting the Application

# Activate the environment
conda activate math-snip-tool

# Run the application
python src/main.py

# Or use the run script
chmod +x run.sh
./run.sh

The application starts in the system tray; click its tray icon to open the menu.

Hotkeys

Shortcut	Action	Description
`Ctrl+Shift+S`	Capture Document	Full document with Markdown + LaTeX + HTML tables
`Ctrl+Shift+M`	Capture Math	Math formula extraction (LaTeX output)
`Ctrl+Shift+T`	Capture Table	Table extraction (HTML output)
`Ctrl+Shift+P`	Capture Plain Text	Plain text extraction only
`Esc`	Cancel	Cancel the current snipping operation
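
As an illustration of how the four capture shortcuts could be wired up, here is a sketch built on pynput's `keyboard.GlobalHotKeys`. It is not the project's `hotkey_manager.py`: the mode names and both function names are assumptions. `Esc` is left out because it cancels an active selection rather than acting as a global shortcut.

```python
# Shortcuts from the table above, written in pynput's hotkey syntax.
# The mode names on the right are illustrative only.
HOTKEY_MODES = {
    "<ctrl>+<shift>+s": "document",
    "<ctrl>+<shift>+m": "math",
    "<ctrl>+<shift>+t": "table",
    "<ctrl>+<shift>+p": "plain_text",
}


def build_bindings(on_capture):
    """Map each hotkey string to a zero-argument callback for its mode."""
    # Binding mode as a default argument gives every lambda its own value;
    # a plain closure would see only the last mode in the loop.
    return {
        combo: (lambda mode=mode: on_capture(mode))
        for combo, mode in HOTKEY_MODES.items()
    }


def listen(on_capture):
    """Block while listening for the hotkeys (needs pynput and a desktop session)."""
    # Imported lazily so build_bindings stays usable on headless machines.
    from pynput import keyboard

    with keyboard.GlobalHotKeys(build_bindings(on_capture)) as listener:
        listener.join()
```

Calling `listen(print)` from a terminal would print the mode name each time one of the shortcuts is pressed.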

Using the Application

Trigger Capture: Press one of the hotkeys or click the system tray icon
Select Region: Click and drag to select the area you want to capture
Wait for Processing: The OCR will process the image (may take a few seconds)
View Results: A dialog will appear with the results
Copy or Save: Use the buttons to copy to clipboard or save to file
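
The "Select Region" step can be sketched with mss and Pillow, both of which appear in the dependency list. This is an assumption about how such a capture might look, not the project's `screen_capture.py`; the function names are hypothetical.

```python
def drag_to_region(press, release):
    """Convert press and release points (x, y) into an mss-style region dict.

    The user may drag in any direction, so the top-left corner is the
    per-axis minimum of the two points.
    """
    (x0, y0), (x1, y1) = press, release
    width, height = abs(x1 - x0), abs(y1 - y0)
    if width == 0 or height == 0:
        raise ValueError("selection has zero area")
    return {"left": min(x0, x1), "top": min(y0, y1), "width": width, "height": height}


def grab_region(region):
    """Capture the region and return it as a PIL image (needs mss and pillow)."""
    import mss
    from PIL import Image

    with mss.mss() as sct:
        shot = sct.grab(region)
    # mss delivers BGRA pixels; convert them to RGB for later processing.
    return Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
```

For example, `grab_region(drag_to_region((300, 250), (100, 50)))` would capture the 200 by 200 pixel area whose top-left corner is at (100, 50).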

🏗️ Project Structure

math-snip-tool/
├── src/
│   ├── main.py                    # Entry point
│   ├── ui/
│   │   ├── app.py                 # Main application & system tray
│   │   ├── snipping_widget.py     # Screen capture overlay
│   │   ├── result_dialog.py       # OCR result display
│   │   ├── settings_dialog.py     # Settings configuration
│   │   └── latex_preview.py       # LaTeX rendering widget
│   ├── core/
│   │   ├── hotkey_manager.py      # Global hotkey handling
│   │   ├── screen_capture.py      # Screen capture utilities
│   │   └── clipboard.py           # Clipboard operations
│   ├── ocr/
│   │   ├── hunyuan_ocr.py         # HunyuanOCR with Transformers
│   │   ├── hunyuan_vllm.py        # HunyuanOCR with vLLM
│   │   └── prompts.py             # Prompt templates
│   └── utils/
│       ├── config.py              # Configuration management
│       └── text_cleaner.py        # Text processing utilities
├── resources/
│   ├── icons/                     # Application icons
│   └── styles/
│       └── dark_theme.qss         # Dark theme stylesheet
├── requirements.txt               # Python dependencies
├── install.sh                     # Installation script
├── run.sh                         # Quick start script
└── README.md                      # This file
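
To give a sense of what a LaTeX rendering widget such as `latex_preview.py` has to do, the sketch below renders a formula to PNG bytes with matplotlib's built-in mathtext. It assumes matplotlib is the renderer because it is in the dependency list; the actual widget may work differently, and both function names are hypothetical. Mathtext supports only a subset of LaTeX, so some OCR output may not render.

```python
import io

# Delimiter pairs that OCR output may wrap around a formula.
_DELIMITERS = [("$$", "$$"), ("\\[", "\\]"), ("\\(", "\\)"), ("$", "$")]


def as_mathtext(latex):
    """Strip one layer of math delimiters and wrap the body in single dollars."""
    body = latex.strip()
    for opener, closer in _DELIMITERS:
        long_enough = len(body) >= len(opener) + len(closer)
        if long_enough and body.startswith(opener) and body.endswith(closer):
            body = body[len(opener):len(body) - len(closer)].strip()
            break
    return f"${body}$"


def render_png(latex, dpi=200, fontsize=14):
    """Render a formula to PNG bytes using mathtext (needs matplotlib)."""
    from matplotlib.backends.backend_agg import FigureCanvasAgg
    from matplotlib.figure import Figure

    fig = Figure(figsize=(0.01, 0.01))
    FigureCanvasAgg(fig)  # off-screen canvas, so no GUI is required
    fig.text(0, 0, as_mathtext(latex), fontsize=fontsize)
    buffer = io.BytesIO()
    # bbox_inches="tight" grows the tiny figure to fit the rendered text.
    fig.savefig(buffer, format="png", dpi=dpi, bbox_inches="tight", pad_inches=0.1)
    return buffer.getvalue()
```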

⚙️ Configuration

Settings can be accessed via the system tray menu:

OCR Engine: Choose between transformers (default) or vllm (faster)
Device: Select cuda, cpu, or mps (for Mac)
Default Prompt: Choose the default OCR mode
Auto-copy: Automatically copy results to clipboard
Theme: Choose between dark, light, or system theme
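
The options above could be modeled as a small validated settings object. The sketch below is illustrative only: the field names, the JSON storage format, and every default except the `transformers` engine are assumptions, and the real `config.py` may differ.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# Allowed values as listed in the settings above; the prompt mode names are assumed.
ALLOWED = {
    "ocr_engine": {"transformers", "vllm"},
    "device": {"cuda", "cpu", "mps"},
    "default_prompt": {"document", "math", "table", "plain_text"},
    "theme": {"dark", "light", "system"},
}


@dataclass
class Settings:
    ocr_engine: str = "transformers"  # documented default
    device: str = "cuda"              # assumed default
    default_prompt: str = "document"  # assumed default
    auto_copy: bool = True            # assumed default
    theme: str = "dark"               # assumed default

    def __post_init__(self):
        # Reject unknown values early instead of failing later at load time.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r}; expected one of {sorted(allowed)}")


def load_settings(path):
    """Read settings from a JSON file, falling back to defaults if it is missing."""
    path = Path(path)
    if not path.exists():
        return Settings()
    return Settings(**json.loads(path.read_text(encoding="utf-8")))


def save_settings(settings, path):
    """Write settings to a JSON file."""
    Path(path).write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")
```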

🐛 Troubleshooting

Model Loading Issues

If you encounter issues loading the model:

# Make sure you have the pinned transformers commit
pip install --force-reinstall git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4

# Verify the processor can be loaded
python -c "from transformers import AutoProcessor; AutoProcessor.from_pretrained('tencent/HunyuanOCR', trust_remote_code=True)"

CUDA Out of Memory

If you get CUDA OOM errors:

Use a smaller image region
Switch to CPU mode in settings (slower, but uses no GPU memory)
Close other GPU-intensive applications
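
Related to the first suggestion: if a large region is unavoidable, downscaling the image before OCR is another way to send fewer pixels to the model. This is a swapped-in workaround, not a feature of the tool; the helper names and the 1600-pixel limit are arbitrary assumptions, and aggressive downscaling can hurt accuracy on small text.

```python
def fit_within(width, height, max_side=1600):
    """Return (new_width, new_height) with the longer side at most max_side.

    The aspect ratio is preserved, and sizes that already fit are unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def shrink(image, max_side=1600):
    """Downscale a PIL image before OCR; returns it unchanged if it already fits."""
    size = fit_within(image.width, image.height, max_side)
    return image if size == image.size else image.resize(size)
```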

Hotkeys Not Working

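On some Linux systems, global hotkeys only work with elevated permissions. Because sudo typically resets PATH, pass the environment's interpreter explicitly so the conda environment is still used:

sudo "$(which python)" src/main.py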

Permission Issues (macOS)

On macOS, you may need to grant:

Screen Recording permission
Accessibility permission

Go to: System Preferences → Security & Privacy → Privacy (on macOS 13 and later: System Settings → Privacy & Security)

📝 License

See LICENSE file for details.

🙏 Acknowledgments

HunyuanOCR by Tencent for the OCR model
PyQt6 for the GUI framework
vLLM for fast inference

🔮 Future Enhancements

Custom hotkey configuration
Multi-language UI support
Export to different formats (PDF, DOCX)
History of captured regions
OCR result editing
Batch processing of images
Cloud sync for captured results

📧 Contact

For issues and feature requests, please use the GitHub issue tracker.
