| 1 | +# open-interpreter — Claude Code Skill |
| 2 | + |
| 3 | +A [Claude Code skill](https://code.claude.com/docs/en/skills) for desktop GUI automation, built on top of Open Interpreter's Computer API. Provides mouse, keyboard, screenshot, and OCR control for native macOS/Linux applications that have no CLI or API. |
| 4 | + |
| 5 | +## What is this? |
| 6 | + |
| 7 | +[Claude Code](https://github.com/anthropics/claude-code) is Anthropic's terminal-based AI coding tool. It reads `.claude/skills/` directories for specialized capabilities. This skill gives Claude Code the ability to interact with desktop GUIs by wrapping Open Interpreter's pyautogui + pytesseract primitives in standalone scripts. |
| 8 | + |
| 9 | +## When to Use |
| 10 | + |
| 11 | +- Interacting with desktop apps (System Preferences, Calculator, browsers, any GUI) |
| 12 | +- Automating GUI workflows (form filling, menu navigation, data extraction) |
| 13 | +- Reading screen content via OCR (finding buttons, labels, prices, status text) |
| 14 | +- Controlling mouse and keyboard programmatically |
| 15 | + |
| 16 | +## Modes |
| 17 | + |
| 18 | +| Mode | LLM | Script | Best For | |
| 19 | +|------|-----|--------|----------| |
| 20 | +| **Library** | Claude Code (native) | Individual scripts | Surgical GUI actions — Claude sees screenshots, reasons, dispatches | |
| 21 | +| **OS subprocess** | Claude API (via OI) | `oi_os_mode.py` | Delegating entire GUI tasks to OI's agent loop | |
| 22 | +| **Local agent** | Ollama (offline) | `oi_os_mode.py --local` | Offline computer use, no API costs | |
| 23 | + |
| 24 | +Use Library mode by default, OS subprocess mode for self-contained GUI tasks, and Local agent mode when offline. |
| 25 | + |
| 26 | +## Prerequisites |
| 27 | + |
| 28 | +- Python 3.10+ |
| 29 | +- [uv](https://github.com/astral-sh/uv) package manager |
| 30 | +- macOS: Accessibility + Screen Recording permissions for your terminal app |
| 31 | +- tesseract (`brew install tesseract` on macOS) |
| 32 | + |
| 33 | +## Installation |
| 34 | + |
| 35 | +To use this skill, copy the folder into your Claude Code skills directory: |
| 36 | + |
| 37 | +```bash |
| 38 | +cp -r .claude/skills/open-interpreter ~/.claude/skills/open-interpreter |
| 39 | +``` |
| 40 | + |
| 41 | +Then run the install script: |
| 42 | + |
| 43 | +```bash |
| 44 | +~/.claude/skills/open-interpreter/scripts/oi_install.sh |
| 45 | +``` |
| 46 | + |
| 47 | +Verify permissions: |
| 48 | + |
| 49 | +```bash |
| 50 | +python3 ~/.claude/skills/open-interpreter/scripts/oi_permission_check.py |
| 51 | +``` |
| 52 | + |
| 53 | +## Directory Structure |
| 54 | + |
| 55 | +``` |
| 56 | +open-interpreter/ |
| 57 | +├── SKILL.md # Skill instructions for Claude Code |
| 58 | +├── README.md # This file |
| 59 | +├── scripts/ |
| 60 | +│ ├── oi_install.sh # One-shot install + permissions check |
| 61 | +│ ├── oi_screenshot.py # Screen capture with Retina metadata |
| 62 | +│ ├── oi_click.py # Mouse click by coordinates or OCR text |
| 63 | +│ ├── oi_type.py # Keyboard input, hotkeys, key presses |
| 64 | +│ ├── oi_find_text.py # OCR: find text on screen → JSON coords |
| 65 | +│ ├── oi_computer.py # Unified dispatch for all actions |
| 66 | +│ ├── oi_os_mode.py # Launch OI as managed subprocess |
| 67 | +│ └── oi_permission_check.py # Check macOS permissions |
| 68 | +└── references/ |
| 69 | + ├── computer-api.md # OI Computer API reference |
| 70 | + ├── os-mode.md # OS Mode usage and architecture |
| 71 | + └── safety-and-permissions.md # Permissions guide and safety model |
| 72 | +``` |
| 73 | + |
| 74 | +## Scripts |
| 75 | + |
| 76 | +### oi_screenshot.py — Screen capture |
| 77 | + |
| 78 | +```bash |
| 79 | +python3 scripts/oi_screenshot.py # Full screen |
| 80 | +python3 scripts/oi_screenshot.py --region 0,0,800,600 # Region |
| 81 | +python3 scripts/oi_screenshot.py --active-window # Active window only |
| 82 | +``` |
| 83 | + |
| 84 | +Prints three lines to stdout: the screenshot file path, followed by `SCALE_FACTOR` and `SCREEN_SIZE` metadata. |
| 85 | + |
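A caller may want the three output lines of `oi_screenshot.py` as structured data. The sketch below is one way to parse them. It assumes the metadata lines look like `SCALE_FACTOR=2.0` and `SCREEN_SIZE=1440x900` (or use `:` as the separator); the README does not specify the exact format, so treat both the format and the helper name `parse_screenshot_output` as hypothetical.

```python
import re


def parse_screenshot_output(stdout: str) -> dict:
    """Parse three lines: file path, SCALE_FACTOR, SCREEN_SIZE.

    Assumed (not confirmed) metadata format: KEY=value or KEY: value.
    """
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")

    info = {"path": lines[0]}
    for line in lines[1:]:
        parts = re.split(r"[=:]", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"unrecognized metadata line: {line!r}")
        key, value = parts[0].strip(), parts[1].strip()
        if key == "SCALE_FACTOR":
            info["scale_factor"] = float(value)
        elif key == "SCREEN_SIZE":
            # Accept "1440x900", "1440,900", or "(1440, 900)".
            width, height = (int(n) for n in re.findall(r"\d+", value)[:2])
            info["screen_size"] = (width, height)
    return info


# Sample text standing in for real script output.
sample = "/tmp/shot.png\nSCALE_FACTOR=2.0\nSCREEN_SIZE=1440x900\n"
print(parse_screenshot_output(sample))
```

If the real script prints a different layout, only the two `key ==` branches should need adjusting.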
| 86 | +### oi_click.py — Mouse click |
| 87 | + |
| 88 | +```bash |
| 89 | +python3 scripts/oi_click.py --x 450 --y 300 # Coordinate click |
| 90 | +python3 scripts/oi_click.py --x 900 --y 600 --image-coords # Auto-divide by Retina scale |
| 91 | +python3 scripts/oi_click.py --text "Submit" # OCR: find and click text |
| 92 | +python3 scripts/oi_click.py --x 450 --y 300 --double # Double click |
| 93 | +python3 scripts/oi_click.py --x 450 --y 300 --right # Right click |
| 94 | +``` |
| 95 | + |
| 96 | +### oi_type.py — Keyboard input |
| 97 | + |
| 98 | +```bash |
| 99 | +python3 scripts/oi_type.py --text "hello world" # Clipboard-paste (default) |
| 100 | +python3 scripts/oi_type.py --key enter # Single key press |
| 101 | +python3 scripts/oi_type.py --hotkey command space # Hotkey (AppleScript on macOS) |
| 102 | +python3 scripts/oi_type.py --text "search" --method typewrite # Character-by-character |
| 103 | +``` |
| 104 | + |
| 105 | +### oi_find_text.py — OCR screen reading |
| 106 | + |
| 107 | +```bash |
| 108 | +python3 scripts/oi_find_text.py --text "Submit" |
| 109 | +python3 scripts/oi_find_text.py --text "Price" --all --min-conf 80 |
| 110 | +``` |
| 111 | + |
| 112 | +Returns JSON: `[{"text": "Submit", "x": 450, "y": 300, "w": 80, "h": 24, "confidence": 95}]` |
| 113 | + |
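When `--all` returns several candidates, the caller has to decide which one to act on. The sketch below picks the highest-confidence match from JSON shaped like the example above. The helper name `best_match`, the threshold default, and the second sample entry are illustrative assumptions, not part of the skill.

```python
import json


def best_match(raw_json: str, min_conf: float = 80):
    """Return the highest-confidence match at or above min_conf, else None."""
    matches = json.loads(raw_json)
    confident = [m for m in matches if m["confidence"] >= min_conf]
    if not confident:
        return None
    return max(confident, key=lambda m: m["confidence"])


# Sample data: the first entry mirrors the README, the second is made up.
sample = json.dumps([
    {"text": "Submit", "x": 450, "y": 300, "w": 80, "h": 24, "confidence": 95},
    {"text": "Submit", "x": 120, "y": 640, "w": 78, "h": 22, "confidence": 62},
])

hit = best_match(sample)
print(hit["x"], hit["y"])
```

Returning `None` rather than raising leaves it to the caller to take a fresh screenshot and retry when OCR finds nothing reliable.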
| 114 | +### oi_computer.py — Unified dispatch |
| 115 | + |
| 116 | +```bash |
| 117 | +python3 scripts/oi_computer.py screenshot |
| 118 | +python3 scripts/oi_computer.py click --x 450 --y 300 |
| 119 | +python3 scripts/oi_computer.py type --text "hello" |
| 120 | +python3 scripts/oi_computer.py find --text "Submit" |
| 121 | +python3 scripts/oi_computer.py scroll --clicks 3 |
| 122 | +python3 scripts/oi_computer.py mouse-position |
| 123 | +python3 scripts/oi_computer.py screen-size |
| 124 | +``` |
| 125 | + |
| 126 | +### oi_os_mode.py — Delegate full GUI tasks |
| 127 | + |
| 128 | +```bash |
| 129 | +python3 scripts/oi_os_mode.py "Open Calculator and compute 2+2" |
| 130 | +python3 scripts/oi_os_mode.py --local "What apps are open?" # Ollama (offline) |
| 131 | +``` |
| 132 | + |
| 133 | +## Quick Examples |
| 134 | + |
| 135 | +### Open an app via Spotlight |
| 136 | + |
| 137 | +```bash |
| 138 | +python3 scripts/oi_type.py --hotkey command space |
| 139 | +sleep 0.5 |
| 140 | +python3 scripts/oi_type.py --text "Calculator" |
| 141 | +sleep 0.3 |
| 142 | +python3 scripts/oi_type.py --key enter |
| 143 | +``` |
| 144 | + |
| 145 | +### Click a button by label |
| 146 | + |
| 147 | +```bash |
| 148 | +python3 scripts/oi_click.py --text "Save" |
| 149 | +``` |
| 150 | + |
| 151 | +### Read text from screen |
| 152 | + |
| 153 | +```bash |
| 154 | +python3 scripts/oi_find_text.py --text "Total" --all |
| 155 | +``` |
| 156 | + |
| 157 | +### Fill a form |
| 158 | + |
| 159 | +```bash |
| 160 | +python3 scripts/oi_click.py --text "Email" |
| 161 | +python3 scripts/oi_type.py --text "user@example.com" |
| 162 | +python3 scripts/oi_type.py --key tab |
| 163 | +python3 scripts/oi_type.py --text "password123" |
| 164 | +``` |
| 165 | + |
| 166 | +## Retina Display Handling |
| 167 | + |
| 168 | +macOS Retina displays typically render at 2x scaling, so screenshot image pixels do not match pyautogui screen coordinates: a point at (900, 600) in a 2x screenshot corresponds to (450, 300) on screen. Use `--image-coords` on `oi_click.py` to auto-divide coordinates by the scale factor when targeting positions taken from screenshot pixels. |
| 169 | + |
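The conversion behind `--image-coords` is a division by the scale factor. The sketch below shows that arithmetic. The function name `image_to_screen` is hypothetical, and rounding to the nearest integer is an assumption; the actual script may truncate instead.

```python
def image_to_screen(x: float, y: float, scale_factor: float) -> tuple:
    """Convert screenshot pixel coordinates to logical screen coordinates."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return round(x / scale_factor), round(y / scale_factor)


# A point at (900, 600) in a 2x Retina screenshot.
print(image_to_screen(900, 600, 2.0))  # (450, 300)
```

On a non-Retina display the scale factor is 1 and the coordinates pass through unchanged.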
| 170 | +## Safety |
| 171 | + |
| 172 | +1. Confirm with the user before clicking Send, Delete, Submit, or Confirm buttons |
| 173 | +2. Take a screenshot before and after every action for verification |
| 174 | +3. Do not run unbounded autonomous loops |
| 175 | +4. pyautogui's fail-safe stays enabled: moving the mouse to a screen corner raises `pyautogui.FailSafeException` and aborts the action |
| 176 | +5. Every script logs actions to stderr: `[oi] click at (450, 300) button=left` |
| 177 | + |
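Rule 1 above can be enforced with a small gate in front of any click-by-label call. The sketch below is one possible shape, not the skill's implementation: `RISKY_LABELS`, `needs_confirmation`, and `confirm_click` are hypothetical names, and the word-matching rule is an assumption.

```python
# Labels taken from the safety rule above, lowercased for comparison.
RISKY_LABELS = {"send", "delete", "submit", "confirm"}


def needs_confirmation(label: str) -> bool:
    """True if any word in the button label is on the risky list."""
    words = label.lower().split()
    return any(word.strip(".,!?:") in RISKY_LABELS for word in words)


def confirm_click(label: str, ask=input) -> bool:
    """Return True if the click may proceed.

    `ask` is injectable so the prompt can be replaced in tests or by an
    agent that relays the question to the user.
    """
    if not needs_confirmation(label):
        return True
    answer = ask(f'About to click "{label}". Proceed? [y/N] ')
    return answer.strip().lower() in {"y", "yes"}


# Demo with a canned answer instead of a real prompt.
print(confirm_click("Save", ask=lambda prompt: "n"))
print(confirm_click("Delete account", ask=lambda prompt: "n"))
```

Defaulting to "no" on anything other than an explicit `y` or `yes` keeps an empty or garbled reply from triggering a destructive click.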
| 178 | +## Troubleshooting |
| 179 | + |
| 180 | +| Symptom | Fix | |
| 181 | +|---------|-----| |
| 182 | +| Black screenshot | Grant Screen Recording permission to your terminal app | |
| 183 | +| Click/type has no effect | Grant Accessibility permission to your terminal app | |
| 184 | +| OCR finds no text | Verify tesseract: `which tesseract && tesseract --version` | |
| 185 | +| Coordinates off by 2x | Use `--image-coords` flag on `oi_click.py` | |
| 186 | +| OS Mode hangs | Verify `ANTHROPIC_API_KEY` is set | |
| 187 | +| Local mode fails | Verify Ollama is running: `ollama list` | |
| 188 | + |
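Several rows in the table above come down to a missing binary or environment variable, which can be checked before starting a session. The sketch below is a hypothetical `preflight` helper using only the standard library. It checks that `tesseract` and `ollama` are on `PATH` and that `ANTHROPIC_API_KEY` is non-empty. It does not verify macOS permissions or that the Ollama server is actually running.

```python
import os
import shutil


def preflight(env=None) -> dict:
    """Report which external prerequisites appear to be available."""
    env = os.environ if env is None else env
    return {
        "tesseract": shutil.which("tesseract") is not None,
        # Only confirms the binary exists, not that the server is up.
        "ollama": shutil.which("ollama") is not None,
        "anthropic_api_key": bool(env.get("ANTHROPIC_API_KEY")),
    }


for name, ok in preflight().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `ollama` entry only matters for Local agent mode, and a missing API key only matters for OS subprocess mode.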
| 189 | +## Credits |
| 190 | + |
| 191 | +- [OpenInterpreter](https://github.com/OpenInterpreter/open-interpreter) by Killian Lucas — the foundation this skill builds on |
| 192 | +- [Claudicle](https://github.com/tdimino/claudicle) by Tom di Mino — open-source soul agent framework, LLM-agnostic at the cognitive level |
| 193 | +- Built as a [Claude Code skill](https://code.claude.com/docs/en/skills) following the [Agent Skills](https://agentskills.io/) open standard |