A lightweight MCP Server bridging LLM agents and the Windows OS, enabling CV-free desktop UI automation based on the a11y tree.
Windows-MCP is a desktop control gateway for LLM agents that extends large model capabilities to graphical user interfaces via the Model Context Protocol (MCP). Rather than relying on computer vision, it uses the Windows UI Automation accessibility (a11y) tree to locate and interact with UI elements precisely. It exposes 18 modular tools, ranging from basic mouse clicks, keyboard input, scrolling, and screenshots to window management, process control, registry read/write, and PowerShell execution.
Tool Set Overview
- Desktop UI Automation: Click (coordinate clicking), Type (element input), Scroll (area scrolling), Move (mouse move/drag), Shortcut (keyboard shortcuts), Wait (pause)
- State Awareness: Screenshot (capture with cursor and active window info), Snapshot (full desktop state capture with UI element IDs and scrollable areas; use_dom=True for browser DOM mode)
- App & System Management: App (launch/resize/switch), Shell (PowerShell execution), Process (process management), Registry (registry read/write), Clipboard (clipboard read/write), Notification (system notifications)
- Advanced Interaction: MultiSelect (multi-selection), MultiEdit (simultaneous multi-input), Scrape (web content extraction)
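To illustrate how a Snapshot-style element list can drive a coordinate-based Click without any computer vision, here is a minimal sketch. Everything in it is hypothetical: the `UiElement` record, its field names, and `find_element` are invented for illustration and are not the project's actual data model or API. The standard library's `difflib` stands in for the fuzzy matching the project does with fuzzywuzzy, so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class UiElement:
    """Hypothetical record for one interactive element from the a11y tree."""
    element_id: int
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> tuple[int, int]:
        # Click target: the midpoint of the element's bounding rectangle.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def find_element(elements, query, min_ratio=0.6):
    """Return the element whose name best matches `query`, or None.

    Matching is case-insensitive; `min_ratio` is an arbitrary cutoff
    chosen for this sketch.
    """
    best, best_ratio = None, 0.0
    for element in elements:
        ratio = SequenceMatcher(None, query.lower(), element.name.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = element, ratio
    return best if best_ratio >= min_ratio else None


# Illustrative snapshot data (not captured from a real desktop).
snapshot = [
    UiElement(1, "Save", 100, 200, 180, 230),
    UiElement(2, "Cancel", 200, 200, 280, 230),
]

target = find_element(snapshot, "save")
if target is not None:
    x, y = target.center()
    print(f"Would click element {target.element_id} at ({x}, {y})")
```

The point of the sketch is the flow: element names and rectangles come from the accessibility tree, so the agent resolves a label to coordinates by lookup rather than by analyzing pixels.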
Architecture & Implementation
- Built on fastmcp (>=3.0), entry point windows_mcp.__main__:main, source in src/windows_mcp/ (with uia/ submodule)
- UI automation core uses Python-UIAutomation-for-Windows for a11y tree parsing, combined with PyAutoGUI for input simulation
- Windows native API access via pywin32 and comtypes
- Screenshot engine with 3-tier fallback: dxcam (GPU-accelerated) → mss → pillow
- Text matching via fuzzywuzzy + python-levenshtein
- Web content conversion using markdownify
- Engineering: setuptools build, Ruff linting, pytest-asyncio testing
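The 3-tier screenshot fallback listed above can be sketched as an ordered chain of capture functions. This is an assumption about the general pattern, not the project's actual code: `capture_with_fallback` and the three stub backends are hypothetical, and the stubs only simulate failures so the example runs on any platform. In a real setup each callable would wrap the corresponding library (dxcam, mss, or Pillow).

```python
def capture_with_fallback(backends):
    """Try each (name, capture_fn) pair in order.

    Returns (name, frame) from the first backend that yields a frame.
    A backend counts as failed if it raises or returns None.
    """
    errors = []
    for name, capture in backends:
        try:
            frame = capture()
        except Exception as exc:  # broad on purpose: any failure moves on
            errors.append(f"{name}: {exc}")
            continue
        if frame is not None:
            return name, frame
        errors.append(f"{name}: returned no frame")
    raise RuntimeError("all screenshot backends failed: " + "; ".join(errors))


# Stub backends standing in for the real libraries (illustrative only).
def dxcam_stub():
    raise OSError("no GPU output available")  # e.g. VM without GPU passthrough


def mss_stub():
    return None  # simulate a backend that produced no frame


def pillow_stub():
    return b"fake-image-bytes"


backends = [("dxcam", dxcam_stub), ("mss", mss_stub), ("pillow", pillow_stub)]
used, frame = capture_with_fallback(backends)
print(f"captured {len(frame)} bytes via {used}")
```

Ordering the list from fastest to most portable keeps the common case quick while still producing a screenshot on machines where the GPU path is unavailable.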
Runtime Characteristics
- Typical latency between actions: 0.2–0.9 seconds
- Multiple transport protocols: stdio (default), SSE, Streamable HTTP
- Virtual machine support added
- Compatible with Claude Desktop, Claude Code, Perplexity Desktop, Gemini CLI, Qwen Code, Codex CLI, and more
- Target platforms: Windows 7/8/8.1/10/11; requires Python 3.13+
Installation & Configuration
Recommended PyPI installation: uvx windows-mcp
Key environment variables: WINDOWS_MCP_SCREENSHOT_SCALE (screenshot scaling, 0.5 recommended for HiDPI), WINDOWS_MCP_SCREENSHOT_BACKEND (screenshot backend: auto/dxcam/mss/pillow), ANONYMIZED_TELEMETRY (telemetry toggle), WINDOWS_MCP_DEBUG (debug mode)
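As a sketch of how the install command and environment variables might be combined, the snippet below generates a client configuration entry. It assumes the `mcpServers` JSON layout commonly used by MCP clients such as Claude Desktop; the server key `windows-mcp`, the helper `build_server_entry`, and the exact value formats passed to the variables are assumptions for illustration, so check the project's README for the authoritative form.

```python
import json

# Backend names as listed for WINDOWS_MCP_SCREENSHOT_BACKEND.
ALLOWED_BACKENDS = {"auto", "dxcam", "mss", "pillow"}


def build_server_entry(scale: float = 0.5, backend: str = "auto") -> dict:
    """Build a hypothetical `mcpServers` entry that launches the server via uvx."""
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"unknown screenshot backend: {backend!r}")
    if scale <= 0:
        raise ValueError("screenshot scale must be positive")
    return {
        "mcpServers": {
            "windows-mcp": {  # assumed key; any label the client accepts works
                "command": "uvx",
                "args": ["windows-mcp"],
                "env": {
                    # Environment values are strings; formats are assumed.
                    "WINDOWS_MCP_SCREENSHOT_SCALE": str(scale),
                    "WINDOWS_MCP_SCREENSHOT_BACKEND": backend,
                },
            }
        }
    }


config = build_server_entry(scale=0.5, backend="mss")
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON can be pasted into a client's MCP configuration file; forcing `mss` here is just one choice, and `auto` leaves backend selection to the server.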
Unconfirmed Items
- The "2M+ Users" claim is sourced from Claude.ai/directory and cannot be independently verified
- README claims Windows 7/8/8.1 support, but PyPI classifiers only list Windows 10/11
- dxcam requires GPU passthrough in VMs; screenshot backend behavior in VMs needs confirmation
- Specific telemetry data collection scope requires reviewing the Security Policy
Derivative Project: Windows-Use (standalone AI agent built on top of Windows-MCP)