
Web-Use

Added: Apr 22, 2026
Category: Agent & Tooling
License type: Open Source
Tags: Python; Model Context Protocol; Multimodal; AI Agents; Browser Automation; Agent & Tooling; Model & Inference Framework; Automation, Workflow & RPA; Protocol, API & Integration; Computer Vision & Multimodal

An autonomous browser agent driven by the Chrome DevTools Protocol. It supports multiple LLM backends, vision understanding, and the WebMCP extension, and handles end-to-end web tasks including navigation, dynamic interaction, and file operations.

Positioning

Web-Use is an autonomous browser agent that drives real Chrome/Edge browsers directly through the Chrome DevTools Protocol (CDP) over WebSocket connections, giving LLMs end-to-end control of web tasks.
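To make the CDP-over-WebSocket idea concrete, here is a minimal sketch of how CDP messages are framed as JSON: commands carry an id and a method, replies echo the id, and events carry a method but no id. The CdpCommandBuilder class is a hypothetical helper written for illustration, not part of Web-Use, and no socket is opened here.

```python
import itertools
import json
from typing import Dict, Optional, Tuple


class CdpCommandBuilder:
    """Builds CDP command frames and pairs replies with them by id (illustrative only)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: Dict[int, str] = {}

    def command(self, method: str, params: Optional[dict] = None) -> str:
        """Return the JSON text that would be sent over the WebSocket."""
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def classify(self, raw: str) -> Tuple[str, str]:
        """Classify an incoming frame as a result, an error, or an event."""
        msg = json.loads(raw)
        if "id" in msg:
            # Replies echo the id of the command they answer.
            method = self._pending.pop(msg["id"], "<unknown>")
            return ("error" if "error" in msg else "result"), method
        # Events have a method name but no id.
        return "event", msg.get("method", "<unknown>")


builder = CdpCommandBuilder()
print(builder.command("Page.navigate", {"url": "https://example.com"}))
print(builder.classify('{"id": 1, "result": {"frameId": "F1"}}'))
print(builder.classify('{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}'))
```

In a real client the string returned by command() would be written to the browser's WebSocket, and classify() would run on every frame read back.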

Core Capabilities

Autonomous Browsing & Interaction

  • Autonomous Web Navigation: Automatically navigate websites, fill forms, interact with dynamic content (SPAs, etc.) without human intervention
  • Efficient Element Interaction: Indexed DOM elements for fast, precise click/input operations
  • File Operations: Support file download and upload
  • State Awareness: Maintain page state understanding, avoid infinite loops, and auto-recover from errors
  • Smart Waiting: Handle loading states, animations, CAPTCHAs, OTPs, and other human-interactive scenarios
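The indexed-element idea above can be illustrated with a small sketch: walk the page, give every interactive element a stable integer, and let the model refer to elements by that number. This is an assumption about the general technique, not Web-Use's actual DOM code; InteractiveIndexer, index_page, and describe are hypothetical names, and the tag set is a simplification.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Tags treated as interactive in this sketch (a simplification).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveIndexer(HTMLParser):
    """Collects interactive elements in document order and numbers them."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append(
                {"index": len(self.elements), "tag": tag, "attrs": dict(attrs)}
            )


def index_page(html: str) -> List[Dict]:
    parser = InteractiveIndexer()
    parser.feed(html)
    return parser.elements


def describe(elements: List[Dict]) -> str:
    """Render the compact listing an LLM could be shown instead of raw HTML."""
    lines = []
    for el in elements:
        attrs = el["attrs"]
        label = attrs.get("name") or attrs.get("href") or attrs.get("id") or ""
        lines.append(f"[{el['index']}] <{el['tag']}> {label}".rstrip())
    return "\n".join(lines)


page = '<form><input name="q"><button id="go">Search</button></form><a href="/help">Help</a>'
elements = index_page(page)
print(describe(elements))
```

With such a listing, an action like "click 1" resolves to the button without the model having to produce a CSS selector.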

Multi-Model & Vision

  • Multi-LLM Support: 13 built-in providers (Anthropic Claude, Google Gemini, OpenAI, Groq, Ollama, Cerebras, Mistral, DeepSeek, NVIDIA, vLLM, Azure OpenAI, OpenRouter, LiteLLM)
  • Vision Capability: Screenshot-based page visual understanding for decision support (use_vision=True)
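As a rough illustration of the screenshot path, CDP's Page.captureScreenshot command returns the image as a base64 string in a data field (PNG by default). The sketch below decodes such a result and wraps it as a data URL that could be attached to a multimodal prompt. Whether Web-Use uses this exact command or message shape is an assumption; decode_screenshot and to_data_url are hypothetical helpers, and the image bytes are a stand-in.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(result: dict) -> bytes:
    """Decode the base64 'data' field of a captureScreenshot-style result."""
    raw = base64.b64decode(result["data"])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return raw


def to_data_url(png: bytes) -> str:
    """Wrap PNG bytes as a data URL, a common way to pass images to vision models."""
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


# Stand-in payload: a PNG signature followed by placeholder bytes, not a real image.
fake_png = PNG_SIGNATURE + b"not-real-image-data"
result = {"data": base64.b64encode(fake_png).decode("ascii")}

png = decode_screenshot(result)
print(len(png), to_data_url(png)[:32])
```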

Protocol & Extension

  • WebMCP (Web Model Context Protocol): Automatically discover site-exposed custom tools, dynamically register and invoke them like built-in tools; supports parameter validation and schema display
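The discover, register, validate, invoke flow described above might look roughly like the following sketch. Everything here is hypothetical: ToolRegistry, validate, and the search_docs tool are invented for illustration, the schema check covers only a small subset of JSON Schema, and the listing gives no WebMCP specification to compare against.

```python
from typing import Any, Callable, Dict, List

JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate(schema: Dict[str, Any], args: Dict[str, Any]) -> None:
    """Check required keys and basic types; raise ValueError on any mismatch."""
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required parameter: {key}")
    for key, value in args.items():
        if key not in props:
            raise ValueError(f"unknown parameter: {key}")
        type_name = props[key].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        wrong_bool = isinstance(value, bool) and type_name != "boolean"
        if wrong_bool or not isinstance(value, expected):
            raise ValueError(f"parameter {key} should be of type {type_name}")


class ToolRegistry:
    """Holds site-exposed tools next to built-in ones (illustrative only)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str,
                 schema: Dict[str, Any], handler: Callable[..., Any]) -> None:
        self._tools[name] = {"description": description, "schema": schema, "handler": handler}

    def describe(self) -> List[str]:
        """One line per tool, including parameter names, for display to the LLM."""
        return [
            f"{name}: {tool['description']} (params: {sorted(tool['schema'].get('properties', {}))})"
            for name, tool in self._tools.items()
        ]

    def invoke(self, name: str, args: Dict[str, Any]) -> Any:
        tool = self._tools[name]
        validate(tool["schema"], args)
        return tool["handler"](**args)


registry = ToolRegistry()
registry.register(
    "search_docs",  # hypothetical tool a documentation site might expose
    "Search the site's documentation",
    {"type": "object",
     "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
     "required": ["query"]},
    lambda query, limit=5: f"top {limit} results for {query!r}",
)
print(registry.describe())
print(registry.invoke("search_docs", {"query": "cdp"}))
```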

Operations & Control

  • Human-in-the-loop: Configurable pause for human input (include_human_in_loop=True)
  • Browser Persistence: Keep browser open after task completion (keep_alive=True)
  • System Profile Reuse: Use real browser profiles to retain login state and authentication

Typical Use Cases

  • E-commerce Price Comparison: Auto-search and aggregate prices across sellers on platforms like Amazon
  • Social Media Automation: Auto-login to X/Twitter and publish posts
  • Video Playback: Search and play specified videos on YouTube
  • GitHub Navigation: Auto-login and browse specified repositories
  • In-Site Documentation Search: Leverage WebMCP to invoke custom tools exposed by documentation sites
  • Web Data Extraction: Automated browsing and structured information extraction
  • Form Filling: Automate repetitive form-filling workflows

Architecture

The project adopts a layered architecture (src/ directory):

Module            Responsibility
agent/            Core agent logic: base class, main loop, service layer, view rendering
agent/browser/    Browser connection and CDP communication management
agent/context/    Context management
agent/dom/        DOM element indexing and interaction
agent/events/     Event system
agent/registry/   Tool/resource registry
agent/tools/      Built-in agent toolset
agent/watchdog/   Timeout and exception monitoring
cdp/              Chrome DevTools Protocol abstraction layer
messages/         Message/conversation models
providers/        LLM Provider abstraction layer (13 implementations)
tools/            Tool service layer

Key Mechanisms:

  • Direct browser control via CDP using WebSocket connections (websockets library), not high-level wrappers like Selenium/Playwright
  • DOM element indexing for accelerated element location
  • Pillow for screenshot-based vision understanding
  • markdownify for HTML-to-Markdown conversion for LLM comprehension
  • pyotp for OTP verification scenarios
  • Build system: Hatchling
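For the OTP scenario in the list above, it may help to see what a TOTP library computes. The sketch below implements the standard HOTP/TOTP algorithms (RFC 4226 and RFC 6238, SHA-1 variant) with the standard library only. It is not pyotp's code and not Web-Use's code; note also that it takes the secret as raw bytes, whereas TOTP libraries commonly accept a base32-encoded secret.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of the last byte selects the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP where the counter is the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# Shared secret from the RFC test vectors; a real secret comes from the site's 2FA setup.
secret = b"12345678901234567890"
print(totp(secret))  # code for the current 30-second window
```

An agent that holds the shared secret can generate the current code itself instead of pausing for a human to type it.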

Inspired by: vimGPT, WebVoyager, LangGraph Examples

Installation & Quick Start

Prerequisites: Python ≥ 3.13, UV package manager, Chrome browser (remote debugging enabled)

git clone https://github.com/CursorTouch/Web-Use.git
cd Web-Use
uv sync
chrome --remote-debugging-port=9222
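Before starting the agent, it can be useful to confirm that Chrome is actually listening on the debugging port. Chrome serves a small HTTP endpoint at /json/version whose JSON includes a webSocketDebuggerUrl field. The sketch below parses that field; the helper names are hypothetical, the sample payload is made up, and fetch_version is defined but not called so the example runs without a browser.

```python
import json
from urllib.request import urlopen


def fetch_version(port: int = 9222, timeout: float = 2.0) -> str:
    """Fetch the /json/version document from a locally running Chrome."""
    with urlopen(f"http://127.0.0.1:{port}/json/version", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def browser_ws_url(version_json: str) -> str:
    """Extract the browser-level WebSocket endpoint from the version document."""
    info = json.loads(version_json)
    return info["webSocketDebuggerUrl"]


# Made-up sample of what the endpoint returns; real payloads contain more fields.
sample = (
    '{"Browser": "Chrome/0.0", '
    '"webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc"}'
)
print(browser_ws_url(sample))

# With Chrome running: print(browser_ws_url(fetch_version()))
```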

Configure the .env file with the API key for your chosen provider (for example, for a Google Gemini backend):

GOOGLE_API_KEY="<API_KEY_HERE>"

Minimal running code:

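from src.agent.browser.config import BrowserConfig
from src.providers.ollama import ChatOllama
from src.agent import Agent
from dotenv import load_dotenv

# Load API keys and other settings from .env
load_dotenv()
# Choose an LLM backend (Ollama here)
llm = ChatOllama(model='qwen3.5:397b-cloud', temperature=0.5)
# Visible Chrome window that reuses the system profile, so existing logins carry over
config = BrowserConfig(browser='chrome', headless=False, use_system_profile=True)
# Vision off, WebMCP tool discovery on, step budget raised above the default of 25
agent = Agent(config=config, llm=llm, use_vision=False, use_web_mcp=True, max_steps=100)
user_query = input('Enter your query: ')
agent.print_response(user_query)

Run it with uv (this assumes the code above is saved as main.py):

uv run main.py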

Key Configuration Parameters

Agent Constructor Parameters:

Parameter                  Type            Default    Description
config                     BrowserConfig   Required   Browser configuration
llm                        BaseChatLLM     Required   Language model instance
use_vision                 bool            False      Enable screenshot vision understanding
use_web_mcp                bool            False      Enable WebMCP protocol for site tool discovery
max_steps                  int             25         Maximum execution steps
max_consecutive_failures   int             3          Consecutive failure retry limit
include_human_in_loop      bool            False      Pause for human input
keep_alive                 bool            False      Keep browser open after task completion
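One plausible reading of how max_steps and max_consecutive_failures interact is sketched below: the loop stops when the task reports completion, when the step budget runs out, or when too many steps fail in a row, and a successful step resets the failure counter. This is an assumption about the semantics, not the project's actual main loop; run_loop and the step callback are hypothetical.

```python
from typing import Callable


def run_loop(step: Callable[[int], bool],
             max_steps: int = 25,
             max_consecutive_failures: int = 3) -> str:
    """Run step(i) until it returns True, the budget is spent, or failures pile up."""
    failures = 0
    for i in range(max_steps):
        try:
            done = step(i)
        except Exception:
            failures += 1
            if failures >= max_consecutive_failures:
                return "aborted"  # too many failures in a row
            continue
        failures = 0  # any successful step resets the streak
        if done:
            return "done"
    return "step_limit"


def flaky_step(i: int) -> bool:
    """Fails on the first two steps, then finishes on step 3."""
    if i < 2:
        raise RuntimeError("element not found")
    return i == 3


print(run_loop(flaky_step))
```

Counting only consecutive failures lets the agent tolerate occasional misclicks on a long task while still stopping quickly when it is stuck.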

BrowserConfig Parameters:

Parameter            Default        Description
browser              'chrome'       Browser type ('chrome' or 'edge')
headless             False          Headless mode
use_system_profile   True           Use system browser profile
user_data_dir        (not listed)   Custom profile directory path
cdp_port             9222           CDP protocol port
downloads_dir        '/Downloads'   Download directory
attach_to_existing   False          Connect to an already running browser
update_cdp           False          Regenerate CDP protocol files
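To show how settings like these typically translate into a browser launch, the sketch below maps three of them onto Chrome command-line flags. LaunchOptions and chrome_args are hypothetical and deliberately separate from BrowserConfig; how Web-Use itself starts the browser is not described in the listing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchOptions:
    """Hypothetical stand-in mirroring three BrowserConfig fields from the table."""
    cdp_port: int = 9222
    headless: bool = False
    user_data_dir: Optional[str] = None


def chrome_args(opts: LaunchOptions) -> List[str]:
    """Translate the options into Chrome command-line flags."""
    args = [f"--remote-debugging-port={opts.cdp_port}"]
    if opts.headless:
        args.append("--headless")
    if opts.user_data_dir:
        args.append(f"--user-data-dir={opts.user_data_dir}")
    return args


print(chrome_args(LaunchOptions()))
print(chrome_args(LaunchOptions(cdp_port=9333, headless=True, user_data_dir="/tmp/profile")))
```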

Unconfirmed Information

  • Python version requirement contradiction: README suggests 3.11+, pyproject.toml requires ≥ 3.13
  • Repository topic includes langgraph, but no explicit reference found in code
  • WebMCP protocol has no formal specification document link
  • No independent website or documentation site; docs are centralized in GitHub README
  • No formal paper, Hugging Face page, or other external resource links

Primary languages: Python (99.7%), JavaScript (0.3%). Authors: Jeomon George, Muhammad Yaseen. Current version: v0.2. MIT License.
