xuanvinh1997/math-snip-tool

HunyuanOCR Snipping Tool

A Mathpix-like snipping tool powered by HunyuanOCR-1B for extracting text, mathematical formulas, and tables from screenshots.

✨ Features

  • 📸 Screen Region Capture: Select any region of your screen to OCR
  • 📐 LaTeX Output: Automatic conversion of mathematical formulas to LaTeX
  • 📊 Table Extraction: Extract tables in HTML format
  • 📄 Document Parsing: Full document extraction in Markdown format
  • ⌨️ Global Hotkeys: Quick access with customizable keyboard shortcuts
  • 🎨 LaTeX Preview: Render and preview LaTeX formulas
  • 💾 Multiple Export Options: Copy to clipboard or save to file
  • 🚀 GPU Acceleration: CUDA support for fast processing
  • Optional vLLM Backend: Even faster inference with vLLM

🎯 Key Capabilities

| Feature | Details |
| --- | --- |
| Model | HunyuanOCR-1B (~2GB) |
| Output Formats | LaTeX (math), HTML (tables), Markdown (docs) |
| Supported Languages | English, Chinese, Vietnamese, and more |
| Hotkeys | Ctrl+Shift+S/M/T/P for different modes |

📋 Requirements

  • Python 3.11+
  • CUDA-capable GPU (recommended)
  • ~4GB VRAM for the model
  • ~2GB disk space for model weights
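
The requirements above can be checked before installing. The following is a minimal sketch, not part of the repository: the function names and thresholds are illustrative, with the numbers taken from the list above. PyTorch is imported lazily so the check also runs on a machine where it is not installed yet.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)   # from the requirements list
MIN_VRAM_GB = 4.0      # approximate figure from the requirements list
MIN_DISK_GB = 2.0      # approximate size of the model weights


def check_requirements(python_version, vram_gb, free_disk_gb):
    """Return a list of human-readable findings; an empty list means all checks passed.

    vram_gb may be None when no CUDA GPU is visible. The GPU is only
    recommended, so that case is reported as a note about CPU fallback.
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    if vram_gb is None:
        problems.append("no CUDA GPU detected; inference would have to run on CPU")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"only {vram_gb:.1f} GB VRAM; about {MIN_VRAM_GB:.0f} GB is recommended")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"only {free_disk_gb:.1f} GB free disk; about {MIN_DISK_GB:.0f} GB is needed")
    return problems


def detect_vram_gb():
    """Total memory of GPU 0 in GB, or None if torch or CUDA is unavailable."""
    try:
        import torch  # lazy import so the check works before PyTorch is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3
    findings = check_requirements(sys.version_info, detect_vram_gb(), free_gb)
    for line in findings or ["all checks passed"]:
        print(line)
```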

🚀 Installation

Quick Installation (Recommended)

# Clone the repository
git clone <repository-url>
cd math-snip-tool

# Run the installation script
chmod +x install.sh
./install.sh

The installation script will:

  1. Create a conda environment (math-snip-tool)
  2. Install PyTorch with CUDA support
  3. Install the pinned transformers commit required for HunyuanOCR
  4. Install all dependencies
  5. Download the HunyuanOCR model

Manual Installation

# 1. Create conda environment
conda create -n math-snip-tool python=3.11
conda activate math-snip-tool

# 2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install the pinned transformers commit (REQUIRED!)
pip install git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4

# 4. Install other dependencies
pip install PyQt6 pillow mss pynput pyperclip accelerate matplotlib colorlog sentencepiece

# 5. Download model weights (will cache ~2GB)
# AutoProcessor alone fetches only the processor/tokenizer files, so pull the whole repository
python -c "from huggingface_hub import snapshot_download; snapshot_download('tencent/HunyuanOCR')"

Optional: vLLM for Faster Inference

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

🎮 Usage

Starting the Application

# Activate the environment
conda activate math-snip-tool

# Run the application
python src/main.py

# Or use the run script
chmod +x run.sh
./run.sh

The application starts minimized to the system tray. Click its tray icon to open the menu.

Hotkeys

| Shortcut | Action | Description |
| --- | --- | --- |
| Ctrl+Shift+S | Capture Document | Full document with Markdown + LaTeX + HTML tables |
| Ctrl+Shift+M | Capture Math | Math formula extraction (LaTeX output) |
| Ctrl+Shift+T | Capture Table | Table extraction (HTML output) |
| Ctrl+Shift+P | Capture Plain Text | Plain text extraction only |
| Esc | Cancel | Cancel the current snipping operation |
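
As an illustration of how shortcuts like these can be wired up with pynput (one of the listed dependencies), here is a minimal sketch. It is not the code in `hotkey_manager.py`: the mode names and the helpers `to_pynput`, `build_bindings`, and `listen` are hypothetical. Esc is left out because cancelling only makes sense while a snip is in progress.

```python
# Shortcut-to-mode table taken from the hotkey list; the mode names are assumed.
HOTKEYS = {
    "Ctrl+Shift+S": "document",
    "Ctrl+Shift+M": "math",
    "Ctrl+Shift+T": "table",
    "Ctrl+Shift+P": "plain",
}


def to_pynput(shortcut):
    """Convert 'Ctrl+Shift+S' into pynput's '<ctrl>+<shift>+s' notation."""
    parts = []
    for key in shortcut.split("+"):
        key = key.strip().lower()
        # pynput wraps modifier and special keys in angle brackets
        parts.append(f"<{key}>" if len(key) > 1 else key)
    return "+".join(parts)


def build_bindings(on_capture):
    """Map each pynput hotkey string to a zero-argument callback for its mode."""
    # mode=mode binds the current value; a plain closure would only see the last one
    return {
        to_pynput(shortcut): (lambda mode=mode: on_capture(mode))
        for shortcut, mode in HOTKEYS.items()
    }


def listen(on_capture):
    """Block and dispatch global hotkeys until the listener is stopped."""
    from pynput import keyboard  # lazy import: needs a desktop session

    with keyboard.GlobalHotKeys(build_bindings(on_capture)) as listener:
        listener.join()
```

Calling `listen(print)` in a desktop session would print the mode name each time one of the four shortcuts is pressed.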

Using the Application

  1. Trigger Capture: Press one of the hotkeys or click the system tray icon
  2. Select Region: Click and drag to select the area you want to capture
  3. Wait for Processing: The OCR will process the image (may take a few seconds)
  4. View Results: A dialog will appear with the results
  5. Copy or Save: Use the buttons to copy to clipboard or save to file
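
The select-and-capture steps can be sketched as follows. This is an assumption-laden illustration rather than the code in `snipping_widget.py` or `screen_capture.py`: `drag_to_region` and `grab_region` are hypothetical helpers, and the drag coordinates are assumed to be physical screen pixels (on HiDPI displays a toolkit may report scaled coordinates instead).

```python
def drag_to_region(start, end):
    """Turn two (x, y) drag points into an mss-style region dict.

    The drag may go in any direction, so the corners are normalized first.
    Returns None for an empty selection (zero width or height).
    """
    (x0, y0), (x1, y1) = start, end
    left, top = min(x0, x1), min(y0, y1)
    width, height = abs(x1 - x0), abs(y1 - y0)
    if width == 0 or height == 0:
        return None
    return {"left": left, "top": top, "width": width, "height": height}


def grab_region(region):
    """Capture the region with mss and return it as a PIL image."""
    import mss                # lazy imports: both are third-party dependencies
    from PIL import Image

    with mss.mss() as sct:
        shot = sct.grab(region)
    # mss returns BGRA pixels; Pillow's raw decoder converts them to RGB
    return Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")
```

For example, `grab_region(drag_to_region((100, 100), (500, 300)))` would capture a 400×200 pixel area whose top-left corner is at (100, 100).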

🏗️ Project Structure

math-snip-tool/
├── src/
│   ├── main.py                    # Entry point
│   ├── ui/
│   │   ├── app.py                 # Main application & system tray
│   │   ├── snipping_widget.py     # Screen capture overlay
│   │   ├── result_dialog.py       # OCR result display
│   │   ├── settings_dialog.py     # Settings configuration
│   │   └── latex_preview.py       # LaTeX rendering widget
│   ├── core/
│   │   ├── hotkey_manager.py      # Global hotkey handling
│   │   ├── screen_capture.py      # Screen capture utilities
│   │   └── clipboard.py           # Clipboard operations
│   ├── ocr/
│   │   ├── hunyuan_ocr.py         # HunyuanOCR with Transformers
│   │   ├── hunyuan_vllm.py        # HunyuanOCR with vLLM
│   │   └── prompts.py             # Prompt templates
│   └── utils/
│       ├── config.py              # Configuration management
│       └── text_cleaner.py        # Text processing utilities
├── resources/
│   ├── icons/                     # Application icons
│   └── styles/
│       └── dark_theme.qss         # Dark theme stylesheet
├── requirements.txt               # Python dependencies
├── install.sh                     # Installation script
├── run.sh                         # Quick start script
└── README.md                      # This file

⚙️ Configuration

Settings can be accessed via the system tray menu:

  • OCR Engine: Choose between transformers (default) and vllm (faster)
  • Device: Select cuda, cpu, or mps (for Mac)
  • Default Prompt: Choose the default OCR mode
  • Auto-copy: Automatically copy results to clipboard
  • Theme: Choose the dark, light, or system theme
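
One way to model these settings is a small validated dataclass persisted as JSON. The sketch below is illustrative only and is not the contents of `config.py`: the field names, the storage format, and every default except the transformers engine are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

ENGINES = ("transformers", "vllm")
DEVICES = ("cuda", "cpu", "mps")
MODES = ("document", "math", "table", "plain")   # assumed names for the OCR modes
THEMES = ("dark", "light", "system")


@dataclass
class Settings:
    # Only the default engine comes from the list above; the rest are assumed.
    engine: str = "transformers"
    device: str = "cuda"
    default_prompt: str = "document"
    auto_copy: bool = True
    theme: str = "dark"

    def __post_init__(self):
        # Reject values outside the documented choices as early as possible
        allowed = {"engine": ENGINES, "device": DEVICES,
                   "default_prompt": MODES, "theme": THEMES}
        for name, choices in allowed.items():
            value = getattr(self, name)
            if value not in choices:
                raise ValueError(f"{name} must be one of {choices}, got {value!r}")

    def save(self, path):
        """Write the settings to path as indented JSON."""
        Path(path).write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path):
        """Read settings from path, falling back to defaults if the file is missing."""
        path = Path(path)
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

Validating in `__post_init__` means a hand-edited settings file with an unknown device or theme fails at load time instead of later, during model initialization.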

🐛 Troubleshooting

Model Loading Issues

If you encounter issues loading the model:

# Make sure you have the pinned transformers commit
pip install --force-reinstall git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4

# Verify that the model's processor loads
python -c "from transformers import AutoProcessor; AutoProcessor.from_pretrained('tencent/HunyuanOCR', trust_remote_code=True)"

CUDA Out of Memory

If you get CUDA OOM errors:

  1. Use a smaller image region
  2. Switch to CPU mode in settings (slower, but uses system RAM instead of GPU memory)
  3. Close other GPU-intensive applications
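
Besides selecting a smaller region, a capture can also be downscaled before it reaches the model. This is an alternative technique, not something the tool is documented to do, and shrinking too far can make small text unreadable. In the sketch below, `fit_within` and `shrink_for_ocr` are hypothetical helpers and the 1600 pixel limit is an arbitrary example value.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved and sizes that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Clamp to 1 so extreme aspect ratios never produce a zero-sized image
    return max(1, round(width * scale)), max(1, round(height * scale))


def shrink_for_ocr(image, max_side=1600):
    """Resize a PIL image before OCR; max_side=1600 is an arbitrary example value."""
    from PIL import Image  # lazy import: Pillow is only needed for the actual resize

    new_size = fit_within(image.width, image.height, max_side)
    if new_size == image.size:
        return image
    return image.resize(new_size, Image.LANCZOS)
```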

Hotkeys Not Working

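On some Linux systems, global hotkeys only work with elevated permissions. Because sudo usually resets PATH, pass the conda environment's interpreter explicitly:

sudo "$(which python)" src/main.py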

Permission Issues (macOS)

On macOS, you may need to grant:

  • Screen Recording permission
  • Accessibility permission

Go to: System Preferences → Security & Privacy → Privacy (on macOS 13 and later: System Settings → Privacy & Security)

📝 License

See LICENSE file for details.

🙏 Acknowledgments

🔮 Future Enhancements

  • Custom hotkey configuration
  • Multiple language support in UI
  • Export to different formats (PDF, DOCX)
  • History of captured regions
  • OCR result editing
  • Batch processing of images
  • Cloud sync for captured results

📧 Contact

For issues and feature requests, please use the GitHub issue tracker.
