A Mathpix-like snipping tool powered by HunyuanOCR-1B for extracting text, mathematical formulas, and tables from screenshots.
- 📸 Screen Region Capture: Select any region of your screen to OCR
- 📐 LaTeX Output: Automatic conversion of mathematical formulas to LaTeX
- 📊 Table Extraction: Extract tables in HTML format
- 📄 Document Parsing: Full document extraction in Markdown format
- ⌨️ Global Hotkeys: Quick access with customizable keyboard shortcuts
- 🎨 LaTeX Preview: Render and preview LaTeX formulas
- 💾 Multiple Export Options: Copy to clipboard or save to file
- 🚀 GPU Acceleration: CUDA support for fast processing
- ⚡ Optional vLLM Backend: Even faster inference with vLLM
| Feature | Details |
|---|---|
| Model | HunyuanOCR-1B (~2GB) |
| Output Formats | LaTeX (math), HTML (tables), Markdown (docs) |
| Supported Languages | English, Chinese, Vietnamese, and more |
| Hotkeys | Ctrl+Shift+S/M/T/P for different modes |
- Python 3.11+
- CUDA-capable GPU (recommended)
- ~4GB VRAM for the model
- ~2GB disk space for model weights
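The requirements above can be checked before installing. The following is a minimal sketch, not part of the tool: `check_environment`, its report keys, and the default `cache_dir` are hypothetical names chosen for illustration, and the thresholds simply mirror the list above. PyTorch is imported lazily, so the check still runs before it is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)        # from the requirements list
MIN_FREE_DISK_GB = 2.0      # approximate size of the model weights


def check_environment(cache_dir: str = ".") -> dict:
    """Return a small report on Python version, disk space, and GPU.

    `cache_dir` is an assumption: point it at wherever the model
    weights will actually be cached on your system.
    """
    free_gb = shutil.disk_usage(cache_dir).free / 1e9
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "free_disk_gb": free_gb,
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
        "cuda_available": False,
        "vram_gb": None,
    }
    try:
        import torch  # optional: only present after installation
    except ImportError:
        return report
    if torch.cuda.is_available():
        report["cuda_available"] = True
        props = torch.cuda.get_device_properties(0)
        report["vram_gb"] = props.total_memory / 1e9
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `vram_gb` value below roughly 4 suggests the model may not fit on the GPU, in which case CPU mode is the fallback.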
# Clone the repository
git clone <repository-url>
cd math-snip-tool
# Run the installation script
chmod +x install.sh
./install.sh
The installation script will:
- Create a conda environment (`math-snip-tool`)
- Install PyTorch with CUDA support
- Install the pinned transformers commit required for HunyuanOCR
- Install all dependencies
- Download the HunyuanOCR model
# 1. Create conda environment
conda create -n math-snip-tool python=3.11
conda activate math-snip-tool
# 2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# 3. Install the pinned transformers commit (REQUIRED!)
pip install git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4
# 4. Install other dependencies
pip install PyQt6 pillow mss pynput pyperclip accelerate matplotlib colorlog sentencepiece
# 5. Download model (will cache ~2GB)
python -c "from transformers import AutoProcessor; AutoProcessor.from_pretrained('tencent/HunyuanOCR', trust_remote_code=True)"
# Optional: install vLLM for faster inference
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
# Activate the environment
conda activate math-snip-tool
# Run the application
python src/main.py
# Or use the run script
chmod +x run.sh
./run.sh
The application starts in the system tray. Use the tray icon to open the menu.
| Shortcut | Action | Description |
|---|---|---|
| `Ctrl+Shift+S` | Capture Document | Full document with Markdown + LaTeX + HTML tables |
| `Ctrl+Shift+M` | Capture Math | Math formula extraction (LaTeX output) |
| `Ctrl+Shift+T` | Capture Table | Table extraction (HTML output) |
| `Ctrl+Shift+P` | Capture Plain Text | Plain text extraction only |
| `Esc` | Cancel | Cancel the current snipping operation |
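The four capture shortcuts could be registered as global hotkeys with pynput, which is among the installed dependencies. This is a hedged sketch rather than the tool's actual `hotkey_manager.py`: `HOTKEY_MODES`, `to_pynput`, `start_listener`, and the mode names are hypothetical. `Esc` is left out because it only applies while the snipping overlay is open.

```python
# Shortcut -> capture mode, taken from the table above.
# The mode names are illustrative, not the tool's own identifiers.
HOTKEY_MODES = {
    "Ctrl+Shift+S": "document",
    "Ctrl+Shift+M": "math",
    "Ctrl+Shift+T": "table",
    "Ctrl+Shift+P": "plain_text",
}


def to_pynput(shortcut: str) -> str:
    """Convert 'Ctrl+Shift+S' to pynput's '<ctrl>+<shift>+s' syntax.

    Only handles modifier(s) followed by a single character key.
    """
    *modifiers, key = shortcut.split("+")
    return "+".join([f"<{m.lower()}>" for m in modifiers] + [key.lower()])


def start_listener(on_capture):
    """Start a background listener that calls on_capture(mode)."""
    from pynput import keyboard  # imported lazily; needs a desktop session

    bindings = {
        to_pynput(shortcut): (lambda mode=mode: on_capture(mode))
        for shortcut, mode in HOTKEY_MODES.items()
    }
    listener = keyboard.GlobalHotKeys(bindings)
    listener.start()
    return listener
```

Calling `start_listener(print)` and pressing one of the shortcuts would print the matching mode name. The `mode=mode` default argument binds each callback to its own mode instead of the last one in the loop.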
- Trigger Capture: Press one of the hotkeys or click the system tray icon
- Select Region: Click and drag to select the area you want to capture
- Wait for Processing: The OCR will process the image (may take a few seconds)
- View Results: A dialog will appear with the results
- Copy or Save: Use the buttons to copy to clipboard or save to file
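The "Select Region" step amounts to turning a click-and-drag gesture into a rectangle and grabbing those pixels. The sketch below assumes the drag start and end points are available in screen coordinates. `drag_to_region` and `grab_region` are hypothetical helpers, not the contents of `screen_capture.py`. mss and Pillow are imported lazily because they need a display.

```python
def drag_to_region(x0: int, y0: int, x1: int, y1: int) -> dict:
    """Normalize a drag (any direction) into the dict format mss expects."""
    left, top = min(x0, x1), min(y0, y1)
    width, height = abs(x1 - x0), abs(y1 - y0)
    if width == 0 or height == 0:
        raise ValueError("empty selection")
    return {"left": left, "top": top, "width": width, "height": height}


def grab_region(region: dict):
    """Capture the region and return it as a PIL image for the OCR step."""
    import mss
    from PIL import Image

    with mss.mss() as sct:
        shot = sct.grab(region)
    # mss returns BGRA bytes; convert to an RGB image.
    return Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
```

Normalizing with `min` and `abs` means a drag from bottom-right to top-left yields the same rectangle as the reverse gesture.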
math-snip-tool/
├── src/
│ ├── main.py # Entry point
│ ├── ui/
│ │ ├── app.py # Main application & system tray
│ │ ├── snipping_widget.py # Screen capture overlay
│ │ ├── result_dialog.py # OCR result display
│ │ ├── settings_dialog.py # Settings configuration
│ │ └── latex_preview.py # LaTeX rendering widget
│ ├── core/
│ │ ├── hotkey_manager.py # Global hotkey handling
│ │ ├── screen_capture.py # Screen capture utilities
│ │ └── clipboard.py # Clipboard operations
│ ├── ocr/
│ │ ├── hunyuan_ocr.py # HunyuanOCR with Transformers
│ │ ├── hunyuan_vllm.py # HunyuanOCR with vLLM
│ │ └── prompts.py # Prompt templates
│ └── utils/
│ ├── config.py # Configuration management
│ └── text_cleaner.py # Text processing utilities
├── resources/
│ ├── icons/ # Application icons
│ └── styles/
│ └── dark_theme.qss # Dark theme stylesheet
├── requirements.txt # Python dependencies
├── install.sh # Installation script
├── run.sh # Quick start script
└── README.md # This file
Settings can be accessed via the system tray menu:
- OCR Engine: Choose between `transformers` (default) or `vllm` (faster)
- Device: Select `cuda`, `cpu`, or `mps` (for Mac)
- Default Prompt: Choose the default OCR mode
- Auto-copy: Automatically copy results to clipboard
- Theme: Choose between dark, light, or system theme
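The settings above could be modeled as a small validated structure. This is an illustrative sketch only: the `Settings` class, its field names, and the JSON round trip are assumptions, and the real `config.py` may be organized differently. Apart from `transformers` as the default engine, the defaults here are guesses.

```python
import json
from dataclasses import asdict, dataclass

# Allowed values, taken from the settings list; prompt mode names are assumed.
ALLOWED = {
    "ocr_engine": ("transformers", "vllm"),
    "device": ("cuda", "cpu", "mps"),
    "default_prompt": ("document", "math", "table", "plain_text"),
    "theme": ("dark", "light", "system"),
}


@dataclass
class Settings:
    ocr_engine: str = "transformers"   # documented default
    device: str = "cuda"               # assumed default
    default_prompt: str = "document"   # assumed default
    auto_copy: bool = True             # assumed default
    theme: str = "dark"                # assumed default

    def __post_init__(self):
        # Reject values outside the documented choices.
        for name, choices in ALLOWED.items():
            value = getattr(self, name)
            if value not in choices:
                raise ValueError(f"{name}={value!r} not in {choices}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Settings":
        return cls(**json.loads(text))
```

For example, `Settings(ocr_engine="vllm", device="cpu").to_json()` produces a string that `Settings.from_json` reads back into an equal object, while `Settings(device="tpu")` raises `ValueError`.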
If you encounter issues loading the model:
# Make sure you have the pinned transformers commit
pip install --force-reinstall git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4
# Verify the model can be loaded
python -c "from transformers import AutoProcessor; AutoProcessor.from_pretrained('tencent/HunyuanOCR', trust_remote_code=True)"
If you get CUDA out-of-memory (OOM) errors:
- Use a smaller image region
- Switch to CPU mode in settings (slower but uses less memory)
- Close other GPU-intensive applications
On some Linux systems, you may need to run with elevated permissions:
sudo python src/main.py
On macOS, you may need to grant:
- Screen Recording permission
- Accessibility permission
Go to: System Preferences → Security & Privacy → Privacy (on macOS 13 and later: System Settings → Privacy & Security)
See LICENSE file for details.
- HunyuanOCR by Tencent for the OCR model
- PyQt6 for the GUI framework
- vLLM for fast inference
Planned features:
- Custom hotkey configuration
- Multiple language support in UI
- Export to different formats (PDF, DOCX)
- History of captured regions
- OCR result editing
- Batch processing of images
- Cloud sync for captured results
For issues and feature requests, please use the GitHub issue tracker.