A secure cloud Linux computer powered by E2B Desktop Sandbox and controlled by open-source LLMs, enabling automated computer interaction through keyboard, mouse, and shell commands.
One-Minute Overview
open-computer-use lets an AI agent operate a computer. It provides a secure cloud Linux desktop, running in an E2B Desktop Sandbox, that is controlled by a choice of open-source language models. It is designed for tasks where an AI must automate multi-step computer work that combines visual perception, keyboard and mouse interaction, and shell command execution.
Core Value: Lets AI interact with a computer the way a human would, carrying out complex multi-step tasks.
Quick Start
Installation Difficulty: Medium - Requires obtaining multiple API keys and setting environment variables
# Install prerequisites (using Homebrew)
brew install poetry ffmpeg
# Clone the repository and enter the project directory
git clone https://github.com/e2b-dev/open-computer-use/
cd open-computer-use
# Create a .env file in the project directory and add your API keys.
# Add only the provider keys that your model selection needs.
E2B_API_KEY="your-e2b-api-key"
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...
# Install dependencies, then start the system
poetry install
poetry run start --prompt "your-instruction"
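Because setup depends on several API keys, it can help to check the .env file before starting. The following is a minimal sketch, not part of open-computer-use: the helper names `parse_env` and `check_env` are hypothetical, and treating `E2B_API_KEY` as the only always-required key is an assumption based on the key list above.

```python
# Hypothetical pre-flight check; not part of the open-computer-use codebase.
REQUIRED_KEYS = ["E2B_API_KEY"]  # assumption: the sandbox key is always needed
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "your-e2b-api-key"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(text):
    """Return (missing required keys, provider keys that have a value)."""
    values = parse_env(text)
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    providers = [key for key in PROVIDER_KEYS if values.get(key)]
    return missing, providers
```

To use it, read the file with `pathlib.Path(".env").read_text()` and pass the text to `check_env`. An empty first list and at least one entry in the second suggest the file is ready for the chosen provider.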
Is this suitable for me?
- ✅ AI Assistant Development: Building AI assistants that need to operate computers
- ✅ Automated Testing: AI performing automated UI testing tasks
- ✅ Content Creation: AI automating web operations, image processing, and creative tasks
- ❌ Simple Scripting Needs: Basic automation without complex interactions
- ❌ Offline Environments: Requires cloud services, not for fully offline scenarios
Core Capabilities
1. Multi-Model Support System
- Supports more than 10 LLMs, including OpenAI's GPT-4o, Anthropic's Claude, and Google's Gemini 2.0
Actual Value: Users can pick the model that best fits their needs, balancing performance and cost
2. Real-time Display Streaming
- Streams the sandbox display to the client computer in real time
Actual Value: Users can watch what the AI is doing and step in to guide it while it works
3. Multiple Interaction Methods
- Supports controlling the computer through keyboard, mouse, and shell commands
Actual Value: AI can execute a wide range of operations, from simple keyboard input to complex system commands
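The three interaction channels can be pictured as a small dispatcher that routes each model-chosen action to the matching control method. This is an illustrative sketch only: `FakeSandbox`, its method names, and the action dictionary shape are all hypothetical stand-ins, not the E2B Desktop Sandbox API or the project's internal format.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class FakeSandbox:
    """Hypothetical stand-in that records calls instead of driving a desktop."""
    log: list = field(default_factory=list)

    def type_text(self, text):
        self.log.append(("type", text))

    def click(self, x, y):
        self.log.append(("click", x, y))

    def run(self, command):
        # Split the command line the way a shell would tokenize it
        self.log.append(("run", shlex.split(command)))


def dispatch(sandbox, action):
    """Route one action dict (assumed shape) to keyboard, mouse, or shell."""
    kind = action["kind"]
    if kind == "type":
        sandbox.type_text(action["text"])
    elif kind == "click":
        sandbox.click(action["x"], action["y"])
    elif kind == "run":
        sandbox.run(action["command"])
    else:
        raise ValueError(f"unsupported action kind: {kind!r}")
```

Rejecting unknown action kinds, rather than ignoring them, keeps a model's malformed output from passing silently.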
4. User Intervention Capability
- Users can pause and prompt the AI at any time
Actual Value: Improves controllability and security by letting users stop the AI before it executes undesired operations
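One way to picture pause-and-prompt is an agent loop that checks for pending user input before every step. The sketch below is an assumption about how such a loop could look, not the project's implementation: `agent_loop`, the `"stop"` message, and the history strings are hypothetical.

```python
import queue


def agent_loop(steps, user_messages):
    """Run planned steps, checking for user input before each one.

    `user_messages` is a queue.Queue that another thread (for example a
    console reader) could fill while the agent is working.
    """
    history = []
    for step in steps:
        try:
            message = user_messages.get_nowait()
        except queue.Empty:
            message = None
        if message == "stop":
            # The user halted the run before this step executed
            history.append("stopped by user")
            break
        if message is not None:
            # A real agent would feed this guidance back to the model
            history.append(f"user: {message}")
        history.append(f"agent: {step}")
    return history
```

Checking between steps rather than mid-step means an intervention takes effect at the next action boundary, which keeps each individual action atomic.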
5. Flexible Configuration
- Swap and combine different LLMs through simple configuration files
Actual Value: Customize AI capabilities without code changes to adapt to different task requirements
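Configuration-driven model selection could look roughly like the sketch below. Everything here is hypothetical: the role names (`vision`, `action`), the `ModelChoice` type, and the model strings are illustrative labels rather than the project's actual configuration schema or exact model identifiers. Only the three environment variable names come from the Quick Start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # for example "openai", "anthropic", "groq"
    model: str     # illustrative label, not an exact model identifier


# Hypothetical configuration: one model per role
CONFIG = {
    "vision": ModelChoice("openai", "gpt-4o"),
    "action": ModelChoice("groq", "llama-3.3"),
}

# Assumed mapping from provider name to the .env key it needs
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


def required_keys(config):
    """List the provider API keys a given configuration depends on."""
    return sorted({API_KEY_VARS[choice.provider] for choice in config.values()})


def swap(config, role, choice):
    """Return a copy of the configuration with one role reassigned."""
    updated = dict(config)
    updated[role] = choice
    return updated
```

Swapping the `action` role to another provider changes which keys `required_keys` reports, which mirrors the Quick Start advice to add only the keys your model selection needs.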
Technical Stack & Integration
Development Language: Python
Main Dependencies: E2B API, Poetry (Python package manager), FFmpeg, multiple LLM provider APIs
Integration Method: API / SDK / Library
Maintenance Status
- Development Activity: Actively developed with a clear mechanism for extending model providers
- Recent Updates: Recently updated to support newer models such as Llama 3.3
- Community Response: Welcomes community contributions for new model providers, indicating emphasis on ecosystem building
Documentation & Learning Resources
- Documentation Quality: Basic
- Official Documentation: Basic usage guide provided in README
- Example Code: Provides startup commands and configuration examples