Windows-MCP

A lightweight MCP Server bridging LLM agents and the Windows OS, enabling CV-free desktop UI automation based on the a11y tree.

Windows-MCP is a desktop control gateway for LLM agents that extends large-model capabilities to graphical user interfaces through the Model Context Protocol (MCP). Instead of relying on traditional computer-vision approaches, the project builds on Windows UI Automation and its accessibility (a11y) tree to locate and interact with UI elements precisely. Its 18 modular tools cover a wide range of operations, from basic mouse clicks, keyboard input, scrolling, and screenshots to window management, process control, registry reads and writes, and PowerShell execution.
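
Why the a11y tree removes the need for computer vision can be shown with a minimal sketch: every UI Automation element already carries a name, a control type, and a bounding rectangle, so an agent can number the interactive elements and click the center of a rectangle without analyzing pixels. The UIElement class, the INTERACTIVE set, and collect_interactive are hypothetical simplifications, not the project's code; the real tree is read through Windows UI Automation.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a Windows UI Automation element.
@dataclass
class UIElement:
    name: str
    control_type: str
    rect: tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels
    children: list["UIElement"] = field(default_factory=list)

# Assumed set of control types treated as interactive in this sketch.
INTERACTIVE = {"ButtonControl", "EditControl", "CheckBoxControl", "HyperlinkControl"}

def collect_interactive(root: UIElement) -> list[tuple[int, str, tuple[int, int]]]:
    """Depth-first walk that numbers visible interactive elements and
    returns (element_id, name, click_point), where click_point is the rect center."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        left, top, right, bottom = node.rect
        # Skip zero-sized (invisible) elements; they cannot be clicked.
        if node.control_type in INTERACTIVE and right > left and bottom > top:
            center = ((left + right) // 2, (top + bottom) // 2)
            results.append((len(results), node.name, center))
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return results

# Example tree loosely modeled on a text-editor window.
window = UIElement("Notepad", "WindowControl", (0, 0, 800, 600), [
    UIElement("File", "MenuItemControl", (0, 0, 40, 20)),
    UIElement("Text Editor", "EditControl", (0, 40, 800, 540)),
    UIElement("OK", "ButtonControl", (700, 560, 780, 590)),
])
for element_id, name, point in collect_interactive(window):
    print(element_id, name, point)
```

Because coordinates come straight from element metadata, localization stays exact regardless of theme, font, or screen content, which is the main trade-off against vision-based approaches.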

Tool Set Overview

Desktop UI Automation: Click (coordinate clicking), Type (element input), Scroll (area scrolling), Move (mouse move/drag), Shortcut (keyboard shortcuts), Wait (pause)
State Awareness: Screenshot (capture with cursor and active window info), Snapshot (full desktop state capture with UI element IDs and scrollable areas; use_dom=True for browser DOM mode)
App & System Management: App (launch/resize/switch), Shell (PowerShell execution), Process (process management), Registry (registry read/write), Clipboard (clipboard read/write), Notification (system notifications)
Advanced Interaction: MultiSelect (multi-selection), MultiEdit (simultaneous multi-input), Scrape (web content extraction)
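
An MCP client invokes any of these tools with a JSON-RPC 2.0 "tools/call" request that carries the tool name and its arguments. The sketch below builds such requests with the standard library only. The use_dom parameter comes from the Snapshot description above; the Click argument names ("loc", "button") are hypothetical placeholders, and a real client should read each tool's input schema from the server's "tools/list" response.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)

def make_tool_call(name: str, arguments: dict) -> str:
    """Build an MCP JSON-RPC 2.0 'tools/call' request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical argument names for Click; check the server's schema in practice.
click = make_tool_call("Click", {"loc": [400, 290], "button": "left"})
snapshot = make_tool_call("Snapshot", {"use_dom": False})
print(click)
```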

Architecture & Implementation

Built on fastmcp (>=3.0), with entry point windows_mcp.__main__:main and source code in src/windows_mcp/ (including a uia/ submodule)
UI automation core uses Python-UIAutomation-for-Windows for a11y tree parsing, combined with PyAutoGUI for input simulation
Windows native API access via pywin32 and comtypes
Screenshot engine with 3-tier fallback: dxcam (GPU-accelerated) → mss → pillow
Text matching via fuzzywuzzy + python-levenshtein
Web content conversion using markdownify
Engineering: setuptools build, Ruff linting, pytest-asyncio testing
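
The three-tier screenshot fallback can be pictured as a loop that tries each backend in order and returns the first successful capture. This is a minimal sketch, not the project's implementation: the backend callables, BACKEND_ORDER, and capture_with_fallback are hypothetical, and the rule that an explicit WINDOWS_MCP_SCREENSHOT_BACKEND value disables fallback is an assumption.

```python
import os
from typing import Callable, Optional

# A capture backend returns raw image bytes (hypothetical interface).
Capture = Callable[[], bytes]

# Tier order from the description: dxcam (GPU) -> mss -> pillow.
BACKEND_ORDER = ["dxcam", "mss", "pillow"]

def capture_with_fallback(backends: dict[str, Capture],
                          preferred: Optional[str] = None) -> tuple[str, bytes]:
    """Try the chosen backend, or every tier in order when the choice is 'auto'.
    Assumption: an explicit backend choice is used alone, without fallback."""
    choice = preferred or os.environ.get("WINDOWS_MCP_SCREENSHOT_BACKEND", "auto")
    order = BACKEND_ORDER if choice == "auto" else [choice]
    errors = {}
    for name in order:
        capture = backends.get(name)
        if capture is None:
            errors[name] = "not available"
            continue
        try:
            return name, capture()
        except Exception as exc:  # any backend failure moves on to the next tier
            errors[name] = repr(exc)
    raise RuntimeError(f"all screenshot backends failed: {errors}")

def broken_dxcam() -> bytes:
    # Simulates dxcam failing, e.g. in a VM without GPU passthrough.
    raise OSError("no GPU output available")

backends = {"dxcam": broken_dxcam, "mss": lambda: b"mss-frame", "pillow": lambda: b"pil-frame"}
used, frame = capture_with_fallback(backends, preferred="auto")
print(used)
```

Ordering the tiers from fastest to most portable means GPU-less environments still get a screenshot, only more slowly.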

Runtime Characteristics

Typical latency of 0.2–0.9 seconds between consecutive actions
Multiple transport protocols: stdio (default), SSE, Streamable HTTP
Virtual machine support added
Compatible with Claude Desktop, Claude Code, Perplexity Desktop, Gemini CLI, Qwen Code, Codex CLI, and more
Target platforms: Windows 7/8/8.1/10/11; requires Python 3.13 or later
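
With the default stdio transport, the MCP specification frames messages as one JSON-RPC object per line, so a message must never contain a raw newline. The helpers below are a minimal, stdlib-only sketch of that framing; write_message and read_messages are hypothetical names, and an io.StringIO buffer stands in for the server's stdin/stdout.

```python
import io
import json

def write_message(stream: io.TextIOBase, message: dict) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent emits no raw newlines (they are escaped inside strings).
    stream.write(json.dumps(message) + "\n")
    stream.flush()

def read_messages(stream: io.TextIOBase):
    """Yield JSON-RPC messages from a newline-delimited stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

buffer = io.StringIO()
write_message(buffer, {"jsonrpc": "2.0", "method": "notifications/initialized"})
write_message(buffer, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
buffer.seek(0)
messages = list(read_messages(buffer))
print([m["method"] for m in messages])
```

SSE and Streamable HTTP carry the same JSON-RPC payloads over HTTP instead, so only the framing layer changes between transports.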

Installation & Configuration

Recommended installation from PyPI: uvx windows-mcp
Key environment variables:
WINDOWS_MCP_SCREENSHOT_SCALE: screenshot scaling factor (0.5 recommended on HiDPI displays)
WINDOWS_MCP_SCREENSHOT_BACKEND: screenshot backend (auto, dxcam, mss, or pillow)
ANONYMIZED_TELEMETRY: telemetry toggle
WINDOWS_MCP_DEBUG: debug mode

Unconfirmed Items

"2M+ Users" claim sourced from Claude.ai/directory, cannot be independently verified
README claims Windows 7/8/8.1 support, but PyPI classifiers only list Windows 10/11
dxcam requires GPU passthrough in VMs; screenshot backend behavior in VMs needs confirmation
The exact scope of telemetry data collection requires reviewing the project's Security Policy

Derivative Project: Windows-Use (standalone AI agent built on top of Windows-MCP)
