
Windows-MCP

Added Apr 24, 2026
Category: Agent & Tooling
Open Source
Tags: Python; Workflow Automation; Desktop Application; Model Context Protocol; AI Agents; Agent & Tooling; Model & Inference Framework; Automation, Workflow & RPA

A lightweight MCP Server bridging LLM agents and the Windows OS, enabling CV-free desktop UI automation based on the a11y tree.

Windows-MCP is a desktop control gateway for LLM agents that extends large-model capabilities to graphical user interfaces through the Model Context Protocol (MCP). Its main distinguishing feature is that it does not rely on computer vision: it uses the Windows UI Automation accessibility (a11y) tree to locate and interact with elements precisely. It provides 18 modular tools covering basic mouse clicks, keyboard input, scrolling, and screenshots, as well as window management, process control, registry read/write, and PowerShell execution.
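To make the a11y-tree idea concrete, here is a minimal sketch of how a tree of UI elements can be flattened into a list of interactive elements with IDs and click coordinates. The `UINode` class, the `INTERACTIVE` set, and the sample tree are hypothetical stand-ins for illustration; they are not Windows-MCP's actual data structures or the UI Automation API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


@dataclass
class UINode:
    """Hypothetical stand-in for one accessibility-tree element."""
    name: str
    control_type: str
    rect: Rect
    children: List["UINode"] = field(default_factory=list)


# Assumed set of control types an agent would want to interact with.
INTERACTIVE = {"Button", "Edit", "CheckBox", "MenuItem"}


def walk(node: UINode) -> Iterator[UINode]:
    """Depth-first traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def center(rect: Rect) -> Tuple[int, int]:
    """Midpoint of a bounding rectangle, usable as a click target."""
    left, top, right, bottom = rect
    return ((left + right) // 2, (top + bottom) // 2)


def interactive_elements(root: UINode):
    """Return (id, name, control type, click point) for each interactive node."""
    found = [n for n in walk(root) if n.control_type in INTERACTIVE]
    return [(i, n.name, n.control_type, center(n.rect)) for i, n in enumerate(found)]


# Toy tree: a desktop containing one window with two controls.
desktop = UINode("Desktop", "Pane", (0, 0, 1920, 1080), [
    UINode("Notepad", "Window", (100, 100, 900, 700), [
        UINode("Text Editor", "Edit", (110, 160, 890, 690)),
        UINode("Close", "Button", (860, 100, 900, 130)),
    ]),
])

elements = interactive_elements(desktop)
for element in elements:
    print(element)
```

Because each element carries its own bounding rectangle, the click point comes from structured metadata rather than from image recognition, which is the sense in which the approach is "CV-free".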

Tool Set Overview

  • Desktop UI Automation: Click (coordinate clicking), Type (element input), Scroll (area scrolling), Move (mouse move/drag), Shortcut (keyboard shortcuts), Wait (pause)
  • State Awareness: Screenshot (capture with cursor and active window info), Snapshot (full desktop state capture with UI element IDs and scrollable areas; use_dom=True for browser DOM mode)
  • App & System Management: App (launch/resize/switch), Shell (PowerShell execution), Process (process management), Registry (registry read/write), Clipboard (clipboard read/write), Notification (system notifications)
  • Advanced Interaction: MultiSelect (multi-selection), MultiEdit (simultaneous multi-input), Scrape (web content extraction)
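As a rough illustration of how a client invokes one of these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP clients send to a server. The tool name `Snapshot` and the `use_dom` argument are taken from the list above; the exact registered tool name and argument schema should be checked against the server's `tools/list` response. The helper `tool_call_request` is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing request IDs


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed call: request a desktop snapshot in browser DOM mode.
request = tool_call_request("Snapshot", {"use_dom": True})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the transport.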

Architecture & Implementation

  • Built on fastmcp (>=3.0), entry point windows_mcp.__main__:main, source in src/windows_mcp/ (with uia/ submodule)
  • UI automation core uses Python-UIAutomation-for-Windows for a11y tree parsing, combined with PyAutoGUI for input simulation
  • Windows native API access via pywin32 and comtypes
  • Screenshot engine with 3-tier fallback: dxcam (GPU-accelerated) → mss → pillow
  • Text matching via fuzzywuzzy + python-levenshtein
  • Web content conversion using markdownify
  • Engineering: setuptools build, Ruff linting, pytest-asyncio testing
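The 3-tier screenshot fallback listed above can be sketched as an ordered chain in which each backend is tried until one succeeds. This is an illustration of the pattern only, not the project's code: the backends here are stub functions rather than real dxcam, mss, or pillow calls, and the rule that an explicitly requested backend disables fallback is an assumption.

```python
from typing import Callable, Dict, Tuple

# Preference order taken from the listing: dxcam, then mss, then pillow.
PREFERENCE = ["dxcam", "mss", "pillow"]


def capture_with_fallback(
    backends: Dict[str, Callable[[], bytes]],
    requested: str = "auto",
) -> Tuple[str, bytes]:
    """Try backends in order and return (backend name, frame bytes)."""
    # Assumption: "auto" walks the full chain; a named backend is tried alone.
    order = PREFERENCE if requested == "auto" else [requested]
    errors = []
    for name in order:
        try:
            return name, backends[name]()
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all screenshot backends failed: " + "; ".join(errors))


# Stub backends standing in for the real capture libraries.
def dxcam_stub() -> bytes:
    raise RuntimeError("no GPU available")


def mss_stub() -> bytes:
    return b"mss-frame"


def pillow_stub() -> bytes:
    return b"pillow-frame"


stubs = {"dxcam": dxcam_stub, "mss": mss_stub, "pillow": pillow_stub}
used, frame = capture_with_fallback(stubs)
print(used)
```

Here the simulated GPU backend fails, so the chain settles on the second tier. Collecting the per-backend errors makes the final failure message useful when every tier is unavailable.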

Runtime Characteristics

  • Typical latency between actions: 0.2–0.9 seconds
  • Multiple transport protocols: stdio (default), SSE, Streamable HTTP
  • Virtual machine support added
  • Compatible with Claude Desktop, Claude Code, Perplexity Desktop, Gemini CLI, Qwen Code, Codex CLI, and more
  • Target platforms: Windows 7/8/8.1/10/11; requires Python 3.13 or later

Installation & Configuration

  • Recommended installation from PyPI: uvx windows-mcp
  • Key environment variables: WINDOWS_MCP_SCREENSHOT_SCALE (screenshot scaling, 0.5 recommended for HiDPI), WINDOWS_MCP_SCREENSHOT_BACKEND (screenshot backend: auto/dxcam/mss/pillow), ANONYMIZED_TELEMETRY (telemetry toggle), WINDOWS_MCP_DEBUG (debug mode)
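As a sketch of how the install command and environment variables might come together, the script below generates a client configuration entry in the `mcpServers` layout used by Claude Desktop. The server key `windows-mcp` and the helper `build_client_config` are illustrative choices, and other clients may expect a different file format. ANONYMIZED_TELEMETRY and WINDOWS_MCP_DEBUG are left out because their accepted values are not stated here.

```python
import json


def build_client_config(scale: str = "0.5", backend: str = "auto") -> dict:
    """Build an MCP client config entry that launches the server via uvx."""
    return {
        "mcpServers": {
            "windows-mcp": {  # illustrative server key
                "command": "uvx",
                "args": ["windows-mcp"],
                "env": {
                    # 0.5 is the value recommended above for HiDPI displays.
                    "WINDOWS_MCP_SCREENSHOT_SCALE": scale,
                    # One of: auto, dxcam, mss, pillow.
                    "WINDOWS_MCP_SCREENSHOT_BACKEND": backend,
                },
            }
        }
    }


config = build_client_config()
print(json.dumps(config, indent=2))
```

Environment values are passed as strings because process environments carry text only; the server is responsible for parsing them.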

Unconfirmed Items

  • "2M+ Users" claim sourced from Claude.ai/directory, cannot be independently verified
  • README claims Windows 7/8/8.1 support, but PyPI classifiers only list Windows 10/11
  • dxcam requires GPU passthrough in VMs; screenshot backend behavior in VMs needs confirmation
  • Specific telemetry data collection scope requires reviewing the Security Policy

Derivative Project: Windows-Use (standalone AI agent built on top of Windows-MCP)
