Web-Use

An autonomous browser agent driven by the Chrome DevTools Protocol. It supports multiple LLM backends, screenshot-based vision, and WebMCP extensions, and handles end-to-end web tasks including navigation, dynamic interaction, and file operations.

Positioning

Web-Use is an autonomous browser agent driven directly by the Chrome DevTools Protocol (CDP) over WebSocket connections to a real Chrome or Edge browser, giving LLMs the ability to carry out web tasks end to end.
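
To make the CDP-over-WebSocket idea concrete, the sketch below shows the JSON message shape CDP uses: each command carries an `id`, a `method`, and `params`, and the browser's reply echoes the same `id`. The `CdpSession` class and its method names are hypothetical and not taken from the Web-Use source; the WebSocket transport is left out so the example runs without a browser.

```python
import itertools
import json


class CdpSession:
    """Hypothetical helper: frames CDP commands and matches replies by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # id -> method name awaiting a reply

    def encode(self, method, **params):
        """Return the JSON text that would be sent over the WebSocket."""
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def decode(self, raw):
        """Parse a command reply and return (method, result).

        Events (messages without an id) are not handled in this sketch.
        """
        msg = json.loads(raw)
        method = self._pending.pop(msg["id"])
        if "error" in msg:
            raise RuntimeError(f"{method} failed: {msg['error'].get('message')}")
        return method, msg.get("result", {})


session = CdpSession()
print(session.encode("Page.navigate", url="https://example.com"))
```

A real client would send the encoded text over the socket and feed each incoming frame to `decode`.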

Core Capabilities

Autonomous Browsing & Interaction

Autonomous Web Navigation: Navigates websites, fills forms, and interacts with dynamic content such as SPAs without human intervention
Efficient Element Interaction: Indexes DOM elements so that clicks and text input target elements quickly and precisely
File Operations: Supports file downloads and uploads
State Awareness: Tracks page state, avoids infinite loops, and recovers automatically from errors
Smart Waiting: Handles loading states, animations, and human-interactive scenarios such as CAPTCHAs and OTPs
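
The element-indexing idea can be illustrated with a small sketch: interactive elements are numbered in document order so that a model can refer to "element 1" instead of a selector. This is only an approximation under stated assumptions. It parses static HTML with the standard library, whereas Web-Use works against the live DOM over CDP, and `ElementIndexer`, `index_elements`, and `describe` are hypothetical names.

```python
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementIndexer(HTMLParser):
    """Hypothetical indexer: collects interactive elements in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # position in the list is the element's index

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs)})


def index_elements(html):
    indexer = ElementIndexer()
    indexer.feed(html)
    return indexer.elements


def describe(elements):
    """Render a compact listing that could be shown to an LLM."""
    lines = []
    for i, el in enumerate(elements):
        attrs = el["attrs"]
        label = attrs.get("id") or attrs.get("name") or attrs.get("href") or ""
        lines.append(f"[{i}] <{el['tag']}> {label}".rstrip())
    return "\n".join(lines)


page = '<form><input name="q"><button id="go">Search</button></form><a href="/help">Help</a>'
print(describe(index_elements(page)))
# [0] <input> q
# [1] <button> go
# [2] <a> /help
```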

Multi-Model & Vision

Multi-LLM Support: 13 built-in providers: Anthropic Claude, Google Gemini, OpenAI, Groq, Ollama, Cerebras, Mistral, DeepSeek, NVIDIA, vLLM, Azure OpenAI, OpenRouter, LiteLLM
Vision Capability: Uses page screenshots for visual understanding to support decisions (use_vision=True)

Protocol & Extension

WebMCP (Web Model Context Protocol): Discovers custom tools that a site exposes, registers them dynamically, and invokes them like built-in tools, with parameter validation and schema display
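
The discover, register, validate, invoke flow described above can be sketched as a small registry. Everything here is an assumption made for illustration: `ToolRegistry` is not the project's `agent/registry` API, the schema shape is a simplified JSON-Schema-like dict, and `search_docs` is an invented tool standing in for one a site might expose.

```python
# Simplified mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


class ToolRegistry:
    """Hypothetical registry for dynamically discovered tools."""

    def __init__(self):
        self._tools = {}  # name -> (schema, handler)

    def register(self, name, schema, handler):
        self._tools[name] = (schema, handler)

    def validate(self, name, args):
        """Check required parameters, unknown parameters, and basic types."""
        schema, _ = self._tools[name]
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in args:
                raise ValueError(f"{name}: missing required parameter '{key}'")
        for key, value in args.items():
            if key not in props:
                raise ValueError(f"{name}: unknown parameter '{key}'")
            if not isinstance(value, TYPE_MAP[props[key]["type"]]):
                raise ValueError(f"{name}: '{key}' must be {props[key]['type']}")

    def invoke(self, name, **args):
        self.validate(name, args)
        _, handler = self._tools[name]
        return handler(**args)


registry = ToolRegistry()
registry.register(
    "search_docs",  # invented example tool
    {"properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
     "required": ["query"]},
    lambda query, limit=5: f"searching '{query}' (limit {limit})",
)
print(registry.invoke("search_docs", query="install"))
```

Validating before invoking lets a malformed model-generated call fail with a readable message that can be fed back to the LLM.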

Operations & Control

Human-in-the-loop: Can pause for human input when enabled (include_human_in_loop=True)
Browser Persistence: Keeps the browser open after the task completes (keep_alive=True)
System Profile Reuse: Uses real browser profiles to retain login state and authentication

Typical Use Cases

E-commerce Price Comparison: Auto-search and aggregate prices across sellers on platforms like Amazon
Social Media Automation: Auto-login to X/Twitter and publish posts
Video Playback: Search and play specified videos on YouTube
GitHub Navigation: Auto-login and browse specified repositories
In-Site Documentation Search: Leverage WebMCP to invoke custom tools exposed by documentation sites
Web Data Extraction: Automated browsing and structured information extraction
Form Filling: Automate repetitive form-filling workflows

Architecture

The project adopts a layered architecture (src/ directory):

| Module | Responsibility |
| --- | --- |
| `agent/` | Core agent logic: base class, main loop, service layer, view rendering |
| `agent/browser/` | Browser connection and CDP communication management |
| `agent/context/` | Context management |
| `agent/dom/` | DOM element indexing and interaction |
| `agent/events/` | Event system |
| `agent/registry/` | Tool/resource registry |
| `agent/tools/` | Built-in agent toolset |
| `agent/watchdog/` | Timeout and exception monitoring |
| `cdp/` | Chrome DevTools Protocol abstraction layer |
| `messages/` | Message/conversation models |
| `providers/` | LLM provider abstraction layer (13 implementations) |
| `tools/` | Tool service layer |

Key Mechanisms:

Direct browser control via CDP over WebSocket connections (websockets library), rather than high-level wrappers such as Selenium or Playwright
DOM element indexing to locate elements faster
Pillow for screenshot handling in vision understanding
markdownify to convert HTML to Markdown so the LLM can read page content
pyotp for OTP verification scenarios
Build system: Hatchling
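
For the OTP scenario, it may help to see what a time-based one-time password computation involves. The sketch below implements RFC 6238 TOTP (HMAC-SHA1) with the standard library only; it is a swapped-in illustration, not how Web-Use does it. The project lists pyotp for this, where the equivalent call is `pyotp.TOTP(secret).now()`. The `totp` function name and the demonstration secret (the RFC 6238 test key) are choices made for this example.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    key = base64.b32decode(secret_b32.upper())
    # Number of whole time steps since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation per RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test key ("12345678901234567890" in ASCII), base32-encoded.
SECRET = base64.b32encode(b"12345678901234567890").decode()
print(totp(SECRET, at=59, digits=8))  # RFC 6238 test vector: 94287082
```

Calling `totp(SECRET)` without `at` uses the current time, which is what a login flow would need.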

Inspired by: vimGPT, WebVoyager, LangGraph Examples

Installation & Quick Start

Prerequisites: Python ≥ 3.13 (per pyproject.toml; the README suggests 3.11+), the uv package manager, and a Chrome browser with remote debugging enabled

git clone https://github.com/CursorTouch/Web-Use.git
cd Web-Use
uv sync
chrome --remote-debugging-port=9222
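
Once Chrome is running with the debugging port open, it serves a small HTTP endpoint at `/json/version` whose response includes a `webSocketDebuggerUrl` field. The sketch below shows one way to read it, as a quick check that the port is reachable. The function names are hypothetical, the sample payload is trimmed and illustrative, and the demonstration parses a canned string so it runs without a browser.

```python
import json
from urllib.request import urlopen


def parse_debugger_url(version_json):
    """Extract the browser-level WebSocket URL from a /json/version payload."""
    info = json.loads(version_json)
    return info["webSocketDebuggerUrl"]


def fetch_debugger_url(port=9222, host="127.0.0.1"):
    """Query a running browser; requires Chrome started with the flag above."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as resp:
        return parse_debugger_url(resp.read().decode())


# Offline demonstration with an illustrative payload.
sample = (
    '{"Browser": "Chrome/120.0.0.0", '
    '"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc"}'
)
print(parse_debugger_url(sample))
```

If `fetch_debugger_url()` raises a connection error, the browser was most likely not started with remote debugging enabled.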

Configure the .env file with the API key for your chosen provider (a Google key is shown as an example):

GOOGLE_API_KEY="<API_KEY_HERE>"

Minimal example (main.py):

from dotenv import load_dotenv

from src.agent import Agent
from src.agent.browser.config import BrowserConfig
from src.providers.ollama import ChatOllama

# Load provider credentials from .env
load_dotenv()

# Any of the supported providers can be substituted for ChatOllama
llm = ChatOllama(model='qwen3.5:397b-cloud', temperature=0.5)
# Drive a visible Chrome window using the system profile (retains login state)
config = BrowserConfig(browser='chrome', headless=False, use_system_profile=True)
agent = Agent(config=config, llm=llm, use_vision=False, use_web_mcp=True, max_steps=100)

user_query = input('Enter your query: ')
agent.print_response(user_query)

uv run main.py

Key Configuration Parameters

Agent Constructor Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `config` | BrowserConfig | Required | Browser configuration |
| `llm` | BaseChatLLM | Required | Language model instance |
| `use_vision` | bool | `False` | Enable screenshot vision understanding |
| `use_web_mcp` | bool | `False` | Enable WebMCP for site tool discovery |
| `max_steps` | int | `25` | Maximum execution steps |
| `max_consecutive_failures` | int | `3` | Consecutive failure retry limit |
| `include_human_in_loop` | bool | `False` | Pause for human input |
| `keep_alive` | bool | `False` | Keep browser open after task completion |
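
How `max_steps` and `max_consecutive_failures` might interact can be sketched with a generic control loop. This is an assumption-laden illustration, not the project's agent loop: `run_loop` is a hypothetical function, and both the reset of the failure counter after a successful step and the `>=` comparison are guesses about reasonable behavior.

```python
def run_loop(step, max_steps=25, max_consecutive_failures=3):
    """Call step(i) until it returns a result or a limit is reached.

    `step` returns None to continue, any other value to finish,
    and raises an exception to signal a failed step.
    """
    failures = 0
    for i in range(max_steps):
        try:
            result = step(i)
        except Exception as exc:
            failures += 1
            if failures >= max_consecutive_failures:
                return {"status": "failed", "steps": i + 1, "error": str(exc)}
            continue
        failures = 0  # a successful step resets the consecutive-failure count
        if result is not None:
            return {"status": "done", "steps": i + 1, "result": result}
    return {"status": "max_steps", "steps": max_steps}


outcome = run_loop(lambda i: "found it" if i == 2 else None)
print(outcome)
# {'status': 'done', 'steps': 3, 'result': 'found it'}
```

Bounding both total steps and consecutive failures keeps a stuck agent from looping indefinitely while still tolerating an occasional failed action.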

BrowserConfig Parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| `browser` | `'chrome'` | Browser type (`'chrome'` or `'edge'`) |
| `headless` | `False` | Headless mode |
| `use_system_profile` | `True` | Use system browser profile |
| `user_data_dir` | (none) | Custom profile directory path |
| `cdp_port` | `9222` | CDP port |
| `downloads_dir` | `'/Downloads'` | Download directory |
| `attach_to_existing` | `False` | Connect to an already running browser |
| `update_cdp` | `False` | Regenerate CDP protocol files |
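
As a rough illustration of how options like these could map onto a browser launch, the sketch below turns a few of them into Chrome command-line flags. `LaunchOptions` and `chrome_args` are hypothetical stand-ins, not the project's `BrowserConfig`. The defaults are copied from the table, and the flags themselves (`--remote-debugging-port`, `--headless=new`, `--user-data-dir`) are standard Chrome switches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchOptions:
    """Hypothetical stand-in for BrowserConfig; defaults mirror the table."""
    headless: bool = False
    cdp_port: int = 9222
    user_data_dir: Optional[str] = None


def chrome_args(opts):
    """Translate options into Chrome command-line flags."""
    args = [f"--remote-debugging-port={opts.cdp_port}"]
    if opts.headless:
        args.append("--headless=new")
    if opts.user_data_dir:
        args.append(f"--user-data-dir={opts.user_data_dir}")
    return args


print(chrome_args(LaunchOptions(headless=True, user_data_dir="/tmp/web-use-profile")))
# ['--remote-debugging-port=9222', '--headless=new', '--user-data-dir=/tmp/web-use-profile']
```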

Unconfirmed Information

Python version requirement contradiction: README suggests 3.11+, pyproject.toml requires ≥ 3.13
Repository topic includes langgraph, but no explicit reference found in code
WebMCP protocol has no formal specification document link
No independent website or documentation site; docs are centralized in GitHub README
No formal paper, Hugging Face page, or other external resource links

Primary languages: Python (99.7%), JavaScript (0.3%). Authors: Jeomon George, Muhammad Yaseen. Current version: v0.2. MIT License.
