Desktop Control MCP

An MCP server that gives LLMs the ability to see and control a Windows desktop via screenshots, mouse, keyboard, and UI element detection.

How It Works

The server provides tools — the LLM client does all the reasoning:

1. Screenshot the desktop to see what's on screen
2. Identify UI elements via vision or Windows UI Automation
3. Act using mouse clicks, keyboard input, or hotkeys
4. Verify by taking another screenshot after each action

Three-Layer Element Detection

Layer	Method	Best For
CDP	Chrome DevTools Protocol via WebSocket	Electron apps (Discord, VS Code, Slack) — requires `method="path"` launch
UIA	Windows UI Automation COM API	Native apps (Notepad, Explorer, Settings, Office)
Vision	Screenshot + LLM vision	Everything else — LLM estimates coordinates from the image
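
The fallback order in the table can be sketched as a small selection function. This is an illustration only: `pick_detection_layer`, the `ELECTRON_PROCESSES` set, and its inputs are hypothetical names and are not taken from the project's element_detection.py.

```python
# Hypothetical sketch of the CDP -> UIA -> Vision fallback; not the project's code.
ELECTRON_PROCESSES = {"discord.exe", "code.exe", "slack.exe"}  # assumed process names


def pick_detection_layer(process_name: str, debug_port_open: bool, uia_element_count: int) -> str:
    """Return "cdp", "uia", or "vision" for the window owned by process_name."""
    is_electron = process_name.lower() in ELECTRON_PROCESSES
    # CDP is only usable when the app was launched with a debugging port.
    if is_electron and debug_port_open:
        return "cdp"
    # Native apps (and Electron apps without a debug port) fall back to UIA.
    if uia_element_count > 0:
        return "uia"
    # Nothing structured is available: the LLM reads coordinates off a screenshot.
    return "vision"
```

For example, an Electron app started without the debug port would land on the UIA layer, and a window that exposes no UIA elements would fall through to vision.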

Installation

cd Desktop-Control-MCP
pip install -e .

Register with an MCP Client

Add the server to your MCP client's configuration. Examples for popular clients:

Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json:

{
  "mcpServers": {
    "desktop-control": {
      "command": "python",
      "args": ["-m", "desktop_control.server"],
      "cwd": "C:\\path\\to\\Desktop-Control-MCP"
    }
  }
}

Claude Code — .claude/settings.json or via claude mcp add:

claude mcp add desktop-control -- python -m desktop_control.server

Cursor — .cursor/mcp.json:

{
  "mcpServers": {
    "desktop-control": {
      "command": "python",
      "args": ["-m", "desktop_control.server"]
    }
  }
}

OpenAI Agents SDK — via Python:

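import asyncio
from agents import Agent
from agents.mcp import MCPServerStdio

async def main():
    params = {"command": "python", "args": ["-m", "desktop_control.server"]}
    async with MCPServerStdio(params=params) as server:  # starts the server process
        agent = Agent(name="desktop", mcp_servers=[server])  # attach via mcp_servers

asyncio.run(main())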

Any MCP-compatible client (Windsurf, Cline, Continue, etc.) can connect using the standard stdio transport with python -m desktop_control.server.

Then restart your client.

Tools (12)

Vision & Awareness

Tool	Description
`screenshot`	Capture desktop as JPEG. Returns image + screen dimensions. Coordinates map 1:1 to mouse coordinates.
`get_elements`	Get interactive UI elements with exact bounding boxes. Auto-selects CDP or UIA based on the app.
`get_screen_info`	Monitor geometries, cursor position, DPI scale.
`list_open_windows`	All visible windows with titles, process names, and positions.

Mouse

Tool	Description
`mouse_click`	Click at (x, y). Supports left/right/middle, double-click, modifier keys.
`mouse_drag`	Drag from one point to another.
`mouse_scroll`	Scroll vertically or horizontally at a position.

Keyboard

Tool	Description
`keyboard_type`	Type text. Handles Unicode via clipboard paste.
`keyboard_hotkey`	Press key combos like `["ctrl", "c"]`, `["alt", "tab"]`, `["win"]`.
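
One way to picture the Unicode handling in `keyboard_type` is a check that decides between simulated keystrokes and a clipboard paste. The function below is a hedged sketch: `choose_typing_method` and its ASCII-based rule are assumptions for illustration, and the project's keyboard.py may decide differently.

```python
def choose_typing_method(text: str) -> str:
    """Return "keystrokes" for printable ASCII text, "clipboard" otherwise."""
    # Printable ASCII maps onto physical keys. Anything else (accents, CJK,
    # emoji, control characters) is assumed to need a clipboard paste.
    if text.isascii() and text.isprintable():
        return "keystrokes"
    return "clipboard"
```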

Convenience

Tool	Description
`open_application`	Open an app by name via Start menu search, Win+R, or direct path.
`click_element`	Find a UI element by name and click its center — no coordinate guessing.
`wait`	Pause for loading screens (max 30s).
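
Clicking an element's center, as `click_element` does, comes down to simple arithmetic on its bounding box. A minimal sketch, assuming a (left, top, right, bottom) box in screen pixels; `BoundingBox` and `center_of` are hypothetical names, and the format actually returned by `get_elements` may differ:

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Assumed box layout in absolute screen pixels."""
    left: int
    top: int
    right: int
    bottom: int


def center_of(box: BoundingBox) -> tuple[int, int]:
    """Return the integer pixel at the middle of the box."""
    return ((box.left + box.right) // 2, (box.top + box.bottom) // 2)
```

The resulting point could then be passed to `mouse_click` as its (x, y) target.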

Project Structure

src/desktop_control/
  server.py           # FastMCP server, all tool definitions, entry point
  screen.py           # DPI awareness init, screenshot capture via mss
  mouse.py            # Click, drag, scroll via pyautogui
  keyboard.py         # Text typing with Unicode clipboard fallback, hotkeys
  ui_automation.py    # Windows UI Automation (native app element detection)
  cdp.py              # Chrome DevTools Protocol (Electron app element detection)
  element_detection.py # Unified facade — auto-selects CDP or UIA
  windows.py          # App launching, window enumeration, Electron app registry

Dependencies

mcp — MCP Python SDK (FastMCP)
pyautogui — Mouse/keyboard control
Pillow — Image processing
mss — Fast multi-monitor screenshots
comtypes — Windows UI Automation COM access
websockets + aiohttp — CDP communication for Electron apps

Key Technical Details

DPI Awareness

screen.py calls SetProcessDpiAwareness(2), which requests per-monitor DPI awareness, at module level before any GUI library is imported. This ensures that mss captures at physical pixel resolution and that pyautogui coordinates match screen pixels. The import order in server.py enforces this.
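
The call itself can be made through ctypes. The following is a sketch under assumptions, not the contents of screen.py: `enable_dpi_awareness` is a hypothetical helper, and the platform guard is added so the snippet also runs (as a no-op) outside Windows.

```python
import ctypes
import sys

PROCESS_PER_MONITOR_DPI_AWARE = 2  # value passed to SetProcessDpiAwareness


def enable_dpi_awareness() -> bool:
    """Request per-monitor DPI awareness; return True only if the call succeeded."""
    if sys.platform != "win32":
        return False
    try:
        # Returns an HRESULT: 0 (S_OK) on success, non-zero if awareness
        # was already set for this process or the request was rejected.
        hr = ctypes.windll.shcore.SetProcessDpiAwareness(PROCESS_PER_MONITOR_DPI_AWARE)
    except (AttributeError, OSError):
        # shcore.dll is not available on older Windows versions.
        return False
    return hr == 0


DPI_AWARE = enable_dpi_awareness()
# GUI libraries such as mss or pyautogui would be imported only after this line.
```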

Screenshot Coordinates

Screenshots default to monitor_index=1 (the primary monitor). monitor_index=0 captures the combined virtual screen that spans all monitors, so its dimensions differ from the primary monitor's on multi-monitor systems. Coordinates read from such a screenshot were off by roughly 40% on systems where the two sizes differ.

Region screenshots include offset instructions so the LLM can map pixel positions back to screen coordinates.
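
Mapping a pixel from a region screenshot back to the screen is an offset addition. A minimal sketch, assuming the region is described by a dict with `left` and `top` keys (the same shape mss uses for its monitor entries); `to_screen_coords` is a hypothetical helper, not a function from the project:

```python
def to_screen_coords(px: int, py: int, origin: dict) -> tuple[int, int]:
    """Map a pixel inside a captured region to absolute screen coordinates.

    origin holds the region's top-left corner on the screen under the
    keys "left" and "top".
    """
    return (origin["left"] + px, origin["top"] + py)


# A region captured at (800, 300): pixel (50, 40) in the image is
# screen position (850, 340).
region = {"left": 800, "top": 300, "width": 400, "height": 200}
target = to_screen_coords(50, 40, region)
```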

Electron App Support

Electron apps (Discord, VS Code, Slack, etc.) expose only a limited set of elements through UIA. For full DOM access, launch them with open_application(name, method="path"), which adds the --remote-debugging-port flag. Known apps and their debug ports are registered in windows.py:ELECTRON_APPS.
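
To make the mechanism concrete, here is a rough sketch of the two pieces involved: building a launch command with the debugging flag, and picking page targets out of the JSON listing that a Chromium-based app serves on its debug port. The helper names and the port 9222 are assumptions for illustration; the actual ports live in windows.py:ELECTRON_APPS and are not reproduced here.

```python
import json


def debug_launch_args(exe_path: str, port: int) -> list[str]:
    """Command line that starts an Electron app with CDP enabled."""
    return [exe_path, f"--remote-debugging-port={port}"]


def page_websocket_urls(targets_json: str) -> list[str]:
    """Extract WebSocket debugger URLs of page targets from a /json listing."""
    targets = json.loads(targets_json)
    return [
        t["webSocketDebuggerUrl"]
        for t in targets
        if t.get("type") == "page" and "webSocketDebuggerUrl" in t
    ]


# Sample listing shaped like the response of http://127.0.0.1:<port>/json
sample = json.dumps([
    {"type": "page", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/A"},
    {"type": "service_worker", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/B"},
])
urls = page_websocket_urls(sample)
```

A CDP client would then open one of the returned WebSocket URLs to query the DOM.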

Safety

pyautogui.FAILSAFE = True — move mouse to top-left corner (0,0) to abort
wait capped at 30 seconds to prevent hangs
Cannot interact with UAC prompts (secure desktop)
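
The 30-second cap on `wait` amounts to clamping the requested duration. A minimal sketch, with `clamp_wait` and `MAX_WAIT_SECONDS` as hypothetical names rather than the project's own:

```python
MAX_WAIT_SECONDS = 30.0  # cap stated above


def clamp_wait(seconds: float) -> float:
    """Limit a requested pause to the range 0..MAX_WAIT_SECONDS."""
    return max(0.0, min(float(seconds), MAX_WAIT_SECONDS))
```

The clamped value would then be handed to a sleep call, so a request for two minutes pauses for 30 seconds at most.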

Example Usage

From any MCP client (after registering the server):

"Open Discord and join the General voice channel"

The LLM will:

1. open_application("Discord") or click the taskbar icon
2. screenshot() to see the Discord window
3. mouse_click() on the correct server icon
4. screenshot() to verify navigation
5. mouse_click() on the voice channel
6. screenshot() to confirm joined
