open-computer-use

A secure cloud Linux computer powered by E2B Desktop Sandbox and controlled by open-source LLMs, enabling automated computer interaction through keyboard, mouse, and shell commands.

One-Minute Overview

open-computer-use is a system that enables AI to operate computers, providing a secure cloud Linux environment controlled by various open-source language models. It's designed for scenarios requiring AI to automate complex computer tasks involving visual perception, keyboard/mouse interaction, and shell command execution.

Core Value: Lets AI interact with a computer the way a human would, executing complex multi-step tasks.

Quick Start

Installation Difficulty: Medium - Requires obtaining multiple API keys and setting environment variables

# Install prerequisites (macOS with Homebrew; use your platform's package manager otherwise)
brew install poetry ffmpeg

# Clone the repository and enter the project directory
git clone https://github.com/e2b-dev/open-computer-use/
cd open-computer-use

# Create a .env file in the project directory and add your API keys.
# Add the provider keys that match your model selection.
E2B_API_KEY="your-e2b-api-key"
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...

# Install dependencies and start the system
poetry install
poetry run start --prompt "your-instruction"
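
Before starting the system, it can help to confirm that the .env file actually contains the keys you expect. The following is a minimal sketch, not part of the project: `parse_env` and `check_keys` are hypothetical helper names, and the list of provider keys is simply the one shown in the Quick Start above.

```python
import os

# Provider key names taken from the Quick Start above.
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc are treated alike.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_keys(values):
    """Return (e2b_key_present, list of provider keys with non-empty values)."""
    e2b_ok = bool(values.get("E2B_API_KEY"))
    providers = [name for name in PROVIDER_KEYS if values.get(name)]
    return e2b_ok, providers


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            e2b_ok, providers = check_keys(parse_env(fh.read()))
        print("E2B key set:", e2b_ok)
        print("Provider keys set:", providers)
    else:
        print("No .env file found in the current directory")
```

Run it from the project directory. An empty provider list means no LLM provider key has been filled in yet.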

Is this suitable for me?

✅ AI Assistant Development: Building AI assistants that need to operate computers

✅ Automated Testing: AI performing automated UI testing tasks

✅ Content Creation: AI automating web operations, image processing, and creative tasks

❌ Simple Scripting Needs: Basic automation without complex interactions

❌ Offline Environments: Requires cloud services, not for fully offline scenarios

Core Capabilities

1. Multi-Model Support System

Supports over 10 different LLMs, including OpenAI's GPT-4o, Anthropic's Claude, Google's Gemini 2.0, and others.
Actual Value: Users can select the most suitable models based on their needs, balancing performance and cost.

2. Real-time Display Streaming

Streams the sandbox display to the client computer in real time.
Actual Value: Users can watch the AI operate and intervene or guide it as it works.

3. Multiple Interaction Methods

Supports controlling the computer through keyboard, mouse, and shell commands.
Actual Value: AI can execute a wide range of operations, from simple keyboard input to complex system commands.
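
One way to picture the three interaction channels is as a small set of action types that are routed to the matching control method. The sketch below is illustrative only and does not reflect the project's actual code or the E2B SDK: `KeyPress`, `MouseClick`, `ShellCommand`, `RecordingComputer`, and `dispatch` are all hypothetical names, and the stand-in computer records calls instead of performing them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class KeyPress:
    keys: str  # e.g. "ctrl+l"


@dataclass
class MouseClick:
    x: int
    y: int


@dataclass
class ShellCommand:
    command: str


Action = Union[KeyPress, MouseClick, ShellCommand]


class RecordingComputer:
    """Stand-in for a real sandbox: records calls rather than performing them."""

    def __init__(self):
        self.log = []

    def press(self, keys):
        self.log.append(("press", keys))

    def click(self, x, y):
        self.log.append(("click", x, y))

    def run(self, command):
        self.log.append(("run", command))


def dispatch(computer, action):
    """Route one action to the matching control method."""
    if isinstance(action, KeyPress):
        computer.press(action.keys)
    elif isinstance(action, MouseClick):
        computer.click(action.x, action.y)
    elif isinstance(action, ShellCommand):
        computer.run(action.command)
    else:
        raise TypeError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    pc = RecordingComputer()
    for step in [KeyPress("ctrl+l"), MouseClick(100, 200), ShellCommand("ls")]:
        dispatch(pc, step)
    print(pc.log)
```

Keeping the action types separate from the object that executes them means the same list of actions could be replayed against a real sandbox or a test double.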

4. User Intervention Capability

Users can pause and prompt the AI at any time.
Actual Value: Enhances system controllability and security, preventing the AI from executing undesired operations.
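
The pause-and-prompt behavior can be sketched as a loop that checks a shared control object before each step. This is an assumption about how such a mechanism might look, not the project's implementation: `Intervention` and `run_steps` are hypothetical names, and treating a user prompt as an extra step placed at the front of the queue is a simplification.

```python
import queue
import threading


class Intervention:
    """Shared control object: lets a user pause the loop and inject prompts."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self.prompts = queue.Queue()

    def pause(self):
        self._running.clear()

    def resume(self, prompt=None):
        if prompt:
            self.prompts.put(prompt)
        self._running.set()

    def wait_if_paused(self):
        # Blocks while paused, returns immediately otherwise.
        self._running.wait()


def run_steps(steps, control, execute):
    """Run steps in order; before each step, honor pauses and pick up new prompts."""
    pending = list(steps)
    done = []
    while pending:
        control.wait_if_paused()
        fresh = []
        while True:
            try:
                fresh.append(control.prompts.get_nowait())
            except queue.Empty:
                break
        pending = fresh + pending  # user prompts take priority
        step = pending.pop(0)
        execute(step)
        done.append(step)
    return done


if __name__ == "__main__":
    control = Intervention()
    control.pause()
    control.resume("check the page title")
    print(run_steps(["open browser", "search"], control, print))
```

In a real session, `pause` and `resume` would be called from another thread (for example, a UI handler) while the loop runs.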

5. Flexible Configuration

Easily swap and combine different LLMs through simple configuration files.
Actual Value: Customize AI capabilities without code changes to adapt to different task requirements.
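
As a rough illustration of configuration-driven model selection, the sketch below maps task roles to a provider and model, then works out which API keys that selection requires. Everything here is hypothetical: the role names (`vision`, `action`), the `ModelChoice` type, the model identifier strings, and the `PROVIDER_ENV` mapping are assumptions for illustration and are not taken from the project's configuration format.

```python
from dataclasses import dataclass

# Assumed mapping from provider name to the environment variable holding its key.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str  # placeholder identifier, not a verified model name


# Hypothetical roles: swap a model by editing one entry.
CONFIG = {
    "vision": ModelChoice("openai", "gpt-4o"),
    "action": ModelChoice("groq", "llama-3.3"),
}


def required_env_vars(config):
    """Return the sorted, de-duplicated API key names the selection needs."""
    needed = set()
    for role, choice in config.items():
        if choice.provider not in PROVIDER_ENV:
            raise ValueError(f"unknown provider for role {role!r}: {choice.provider}")
        needed.add(PROVIDER_ENV[choice.provider])
    return sorted(needed)


if __name__ == "__main__":
    print(required_env_vars(CONFIG))
```

Changing which model handles a role is then a one-line edit to `CONFIG`, and `required_env_vars` shows which keys must be present in .env for that selection.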

Technical Stack & Integration

Development Language: Python
Main Dependencies: E2B API, Poetry (Python package manager), FFmpeg, multiple LLM provider APIs
Integration Method: API / SDK / Library

Maintenance Status

Development Activity: Actively developed with a clear mechanism for extending model providers
Recent Updates: Added support for newer models such as Llama 3.3
Community Response: Welcomes community contributions for new model providers, indicating emphasis on ecosystem building

Documentation & Learning Resources

Documentation Quality: Basic
Official Documentation: Basic usage guide provided in README
Example Code: Provides startup commands and configuration examples
