An open-source full-stack framework for autonomous computer agents that control browsers, terminals, and desktop apps via natural language inside Docker VMs. It is maintained by coasty-ai under the Apache 2.0 license and reports a score of 82% on the OSWorld benchmark.
Project Overview#
Open Computer Use is an open-source full-stack framework for autonomous virtual computer agents. It lets an AI model operate browsers, terminals, and desktop applications the way a human user would. The project positions itself as "The Open Framework for autonomous virtual computer agents at scale" and can be fully self-hosted.
Core Capabilities#
Browser Agent#
- Search-first strategy (Google Search API)
- Intelligent web navigation and auto form filling
- Element detection and smart clicking
- Multi-tab parallel management
- Screenshot verification and visual feedback
Terminal Agent#
- Command execution in isolated environments
- File operations (read/write/edit/delete)
- Script execution and package installation
- Real-time streaming output
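Real-time streaming output means that each line is forwarded as soon as the command produces it, rather than after the command exits. The sketch below shows one common way to do this with Python's standard library. It is an illustration only, not the project's implementation, and `stream_command` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_command(argv: List[str]) -> Iterator[str]:
    """Yield a command's output line by line while it is still running."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    )
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, argv)


if __name__ == "__main__":
    # Each line could be pushed to a client (e.g. over a WebSocket) instead of printed.
    for line in stream_command([sys.executable, "-c", "print('hello'); print('world')"]):
        print(line)
```

Returning a generator lets the caller decide where each line goes, which keeps the streaming logic separate from the transport.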
Desktop Agent#
- CV-based UI element detection
- Mouse/keyboard control, window management
- OCR text extraction
- Platform support: Linux desktop
Multi-Agent System#
- AI Planner for task decomposition
- Sequential execution with context passing
- Error handling and automatic retry
- User confirmation mechanism for agent actions
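The flow described in this list (a plan split into steps, sequential execution with context passing, automatic retry, and user confirmation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `Step`, `execute_plan`, the `confirm` callback, and the `max_attempts` default are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]  # results of earlier steps, keyed by step name


@dataclass
class Step:
    name: str
    run: Callable[[Context], str]  # receives prior results, returns its own
    needs_confirmation: bool = False


def execute_plan(
    steps: List[Step],
    confirm: Callable[[Step], bool],
    max_attempts: int = 3,  # assumed default, for illustration only
) -> Context:
    """Run steps in order, passing accumulated results to each step."""
    context: Context = {}
    for step in steps:
        if step.needs_confirmation and not confirm(step):
            break  # user declined: stop and return what was completed
        for attempt in range(1, max_attempts + 1):
            try:
                context[step.name] = step.run(context)
                break  # success: move on to the next step
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted: surface the error
    return context
```

For example, a two-step plan in which the second step reads the first step's result would be `execute_plan([Step("search", lambda ctx: "result"), Step("open", lambda ctx: "page:" + ctx["search"])], confirm=lambda step: True)`.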
Performance Metrics (Official)#
- OSWorld Benchmark #1 rank, 82% score
- Average task completion ~45 seconds
- 50+ concurrent sessions per server
- Tool call latency <500ms
- VM startup time ~15 seconds
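For rough capacity planning, the figures above can be combined into a theoretical upper bound on throughput. The calculation below is a back-of-the-envelope sketch, not a measured result. It assumes every session is continuously busy and ignores queueing, failures, and retries. `tasks_per_hour` is a hypothetical helper.

```python
def tasks_per_hour(sessions: int, task_seconds: float, startup_seconds: float = 0.0) -> float:
    """Upper bound on tasks per hour for one server, assuming full utilization."""
    return sessions * 3600.0 / (task_seconds + startup_seconds)


# 50 sessions at ~45 s per task, reusing already-running VMs
print(tasks_per_hour(50, 45))      # 4000.0
# Same load if every task also pays the ~15 s VM startup
print(tasks_per_hour(50, 45, 15))  # 3000.0
```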
Use Cases#
- Research & Data Collection: Web scraping, competitive analysis, academic paper collection
- Testing & QA: Automated UI testing, cross-browser testing, regression testing
- DevOps & Automation: Server configuration, deployment automation, log analysis
- E-commerce Operations: Price monitoring, order management, inventory tracking
- Business Intelligence: Report generation, dashboard monitoring, KPI tracking
System Architecture#
Frontend (Next.js 15) → Backend API (FastAPI) → Docker VM (Ubuntu 22.04 + XFCE)
Each layer has a distinct role:
- Frontend: chat UI, model selection, and VM management
- Backend: orchestration layer for AI planning, multi-agent execution, and WebSocket communication
- Docker VM: Chrome browser, terminal, and VNC server
Installation & Deployment#
Prerequisites: Node.js 20+, Python 3.10+, Docker & Docker Compose, Supabase account, AI provider API keys
git clone https://github.com/coasty-ai/open-computer-use.git
cd open-computer-use
cp .env.example .env
cp backend/.env.example backend/.env
npm install
cd backend && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
docker-compose up --build
Access: Frontend http://localhost:3000, Backend http://localhost:8001
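Once the containers are up, a quick script can confirm that both services respond. The sketch below uses only the standard library and the two URLs listed above. It treats any HTTP response, including an error status, as "reachable", because the source does not say which paths each service exposes. `is_reachable` and `report` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Dict

SERVICES = {
    "frontend": "http://localhost:3000",
    "backend": "http://localhost:8001",
}

# Bypass any system proxy settings, since both targets are local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...


def report() -> Dict[str, bool]:
    """Check every service in SERVICES and return name -> reachable."""
    return {name: is_reachable(url) for name, url in SERVICES.items()}
```

Calling `report()` after startup returns a dictionary such as `{"frontend": True, "backend": False}`, which points to the container whose logs need inspecting.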
Key Configuration#
Required Environment Variables: Supabase config (URL, Anon Key, Service Role), Security keys (ENCRYPTION_KEY, CSRF_SECRET)
AI Provider Support: OpenAI, Anthropic, Google, xAI, Mistral, Azure, Perplexity, OpenRouter (100+ models)
BYOK (bring your own key) Mode: All API keys are encrypted at rest, and users keep full control over AI costs and usage
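A missing or empty variable is easier to catch before startup than at runtime. The sketch below checks the required environment variables listed above. `ENCRYPTION_KEY` and `CSRF_SECRET` come from the text. The three Supabase names are assumptions made for illustration, so check `.env.example` for the names the project actually uses. `missing_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = (
    "SUPABASE_URL",               # assumed name, verify against .env.example
    "SUPABASE_ANON_KEY",          # assumed name, verify against .env.example
    "SUPABASE_SERVICE_ROLE_KEY",  # assumed name, verify against .env.example
    "ENCRYPTION_KEY",
    "CSRF_SECRET",
)


def missing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```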
Pending Verification#
- The reported 82% OSWorld benchmark score has not been independently verified
- Windows/macOS VM support planned for Q1 2026
- No associated academic papers found