agent-browser

An agent-first browser automation CLI from Vercel Labs that enables efficient, low-token web interactions using a Ref-based system.

Project Positioning#

agent-browser is an agent-first headless browser automation CLI. It gives AI agents deterministic interaction and high context efficiency by exposing pages as accessibility trees whose elements are addressed through a Ref system.

Traditional browser automation tools (such as the native Puppeteer and Playwright interfaces) expose large DOM trees, typically 3000-5000 tokens per page, and rely on CSS/XPath selectors that break easily on dynamic pages. Both traits make them a poor fit for AI agents (LLMs) that must read page state and act on it directly.

Core Features#

Agent-first design: Output format optimized for AI context efficiency; compact output is roughly 200-400 tokens, versus 3000-5000 tokens for a DOM dump
Ref-based element selection: Deterministic element targeting based on the accessibility tree and references (e.g., @e1, @e2)
Fast: Native Rust CLI achieves sub-millisecond command parsing
JSON mode: --json flag for machine-readable output

Tech Stack#

TypeScript (58.7%)
Rust (36.1%)
Python (1.9%)
JavaScript (1.7%)
Dependency: Playwright (browser driver)

Supported Platforms#

macOS (ARM64, x64)
Linux (ARM64, x64)
Windows (x64)

Core Capabilities#

Browser Automation#

Navigation control: open, back, forward, reload
Element interaction: click, fill, type, hover, drag, upload
Keyboard control: press, keydown, keyup, keyboard type
Form operations: check, uncheck, select

Information Retrieval#

Page snapshot: snapshot command returns accessibility tree with refs
Text extraction: get text, get html, get value, get attr
Page state: get title, get url, get count, get box, get styles
Screenshot/PDF: screenshot (supports full page and annotations), pdf

Advanced Features#

Session management: Multiple isolated browser instances (--session)
Persistent configuration: Config files and user data persistence (--profile, --session-name)
Network control: Network request interception, routing, mock responses
Storage management: cookies, localStorage, sessionStorage operations
Multi-tab/window: Tab and window management
CDP connection: Connect to external browsers via Chrome DevTools Protocol

Security Features (Optional)#

Auth vault: Locally encrypted credential storage
Content boundary markers: Mark page output to distinguish tool output from untrusted content
Domain allowlist: Restrict navigation to trusted domains
Action policies: Use policy files to control destructive operations
Output length limits: Prevent context flooding attacks

Architecture#

Client-Daemon Architecture#

Rust CLI: Fast native binary, parses commands, communicates with daemon
Node.js Daemon: Manages Playwright browser instances
Fallback: Uses Node.js directly if the native binary is unavailable

Browser Engine#

Defaults to Chromium
Supports Firefox and WebKit via Playwright protocol
Supports custom browser executables

Ref System#

Generates deterministic element references based on accessibility tree
Avoids re-querying the DOM, which improves performance
AI-friendly text output format
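
To make the ref idea concrete, the sketch below pulls refs out of snapshot text and prints them in the @eN form that commands such as click and fill accept. The sample snapshot text and its [ref=eN] annotation are an assumed illustration of the output shape, not captured from a real run, and extract_refs is a hypothetical helper rather than part of agent-browser.

```shell
#!/bin/sh
# extract_refs: read snapshot text on stdin, print one @ref per line.
# Assumes each element line carries a marker of the form [ref=eN].
extract_refs() {
    grep -o 'ref=e[0-9][0-9]*' | sed 's/^ref=/@/'
}

# Illustrative snapshot text (assumed format, not real tool output).
sample_snapshot='- heading "Example Domain" [ref=e1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]'

# Prints @e1, @e2 and @e3, one per line.
printf '%s\n' "$sample_snapshot" | extract_refs
```

If the real output follows the assumed shape, the same helper could be fed from a live page with `agent-browser snapshot -i | extract_refs`.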

Typical Use Cases#

AI Agent Automation#

Provides browser automation capabilities for Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, etc.
Suitable for any AI agent that can execute shell commands

Web Testing#

Login flow testing
Form filling validation
Page interaction testing
Cross-browser compatibility testing

Data Extraction#

Web content scraping
Structured data extraction
Screenshot and PDF generation

Cloud Deployment#

Supports serverless environments (Vercel, AWS Lambda)
Integrates third-party browser services (Browserbase, Browser Use, Kernel)

Installation Methods#

Global Installation (Recommended)#

npm install -g agent-browser
agent-browser install  # Download Chromium

Quick Try (No Installation)#

npx agent-browser open example.com

macOS Homebrew#

brew install agent-browser
agent-browser install  # Download Chromium

Basic Workflow#

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i

# 2. Interact using refs
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser screenshot page.png

# 3. Close browser
agent-browser close
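
The three steps above can be wrapped in one function so that the browser is closed even when an interaction fails. This is a minimal sketch: run_basic_workflow and its arguments are hypothetical, and it assumes @e2 and @e3 are the refs that `snapshot -i` reported for the target page (refs come from the snapshot and will differ from page to page).

```shell
#!/bin/sh
# run_basic_workflow is a hypothetical helper, not part of agent-browser.
# Usage: run_basic_workflow <url> <email> [screenshot-path]
run_basic_workflow() {
    url=$1
    email=$2
    shot=${3:-page.png}

    # 1. Navigate and list interactive elements with their refs
    agent-browser open "$url" || return 1
    agent-browser snapshot -i || { agent-browser close; return 1; }

    # 2. Interact using refs (assumed: @e2 is clickable, @e3 is a text input)
    agent-browser click @e2 &&
        agent-browser fill @e3 "$email" &&
        agent-browser screenshot "$shot"
    status=$?

    # 3. Always close the browser, then report the outcome of step 2
    agent-browser close
    return "$status"
}
```

A call such as `run_basic_workflow example.com test@example.com page.png` reproduces the sequence shown above. In practice an agent would read the snapshot between steps 1 and 2 and choose the refs from it rather than hard-coding them.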

AI Agent Mode#

# JSON output for machine parsing
agent-browser snapshot --json

# Compact interactive element list
agent-browser snapshot -i --json

Key CLI Flags#

--json: Output machine-readable JSON format
--session <name>: Start isolated browser session
--profile <path>: Specify persistent data directory
--headed: Show browser GUI (headless by default)
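
As an illustration of --session combined with --json, the sketch below drives two isolated sessions and saves a compact JSON snapshot from each. It assumes the global flags are accepted before the subcommand and that close honors --session; the session names (agent1, agent2), the output file names, and the helpers in_session and compare_sites are placeholders invented for this example.

```shell
#!/bin/sh
# in_session is a hypothetical wrapper: run any agent-browser command
# inside a named, isolated session.
in_session() {
    name=$1
    shift
    agent-browser --session "$name" "$@"
}

# compare_sites <url-a> <url-b> [output-dir]
# Opens each URL in its own session and stores one JSON snapshot per session.
compare_sites() {
    out=${3:-.}
    in_session agent1 open "$1" &&
        in_session agent2 open "$2" &&
        in_session agent1 snapshot -i --json > "$out/agent1.json" &&
        in_session agent2 snapshot -i --json > "$out/agent2.json"
    status=$?

    # Close both sessions whether or not the steps above succeeded.
    in_session agent1 close
    in_session agent2 close
    return "$status"
}
```

Keeping the session name in one wrapper makes it harder to send a command to the wrong browser instance by accident.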

Key Environment Variables#

AGENT_BROWSER_SESSION_NAME: Automatically save and load session state
AGENT_BROWSER_ENCRYPTION_KEY: 64-char hex key for AES-256-GCM encryption
AGENT_BROWSER_DEFAULT_TIMEOUT: Default Playwright timeout (ms, default 25000)
AGENT_BROWSER_PROVIDER: Cloud browser provider (browserbase/browseruse/kernel/ios)
AGENT_BROWSER_CONTENT_BOUNDARIES: Wrap page output with boundary markers
AGENT_BROWSER_MAX_OUTPUT: Max characters for page output
AGENT_BROWSER_ALLOWED_DOMAINS: List of allowed domain patterns
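
To show how a few of these variables might be set, the sketch below generates a key of the required shape (32 random bytes printed as 64 hex characters) and exports it alongside some other settings. The key generation is plain shell; the session name "demo" and the output limit of 20000 are arbitrary placeholder values, and generate_hex_key is a hypothetical helper.

```shell
#!/bin/sh
# Print 32 random bytes as 64 lowercase hex characters, the shape
# AGENT_BROWSER_ENCRYPTION_KEY requires for AES-256-GCM.
generate_hex_key() {
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

AGENT_BROWSER_ENCRYPTION_KEY=$(generate_hex_key)
export AGENT_BROWSER_ENCRYPTION_KEY

export AGENT_BROWSER_SESSION_NAME="demo"      # placeholder session name
export AGENT_BROWSER_DEFAULT_TIMEOUT=25000    # ms, the documented default
export AGENT_BROWSER_MAX_OUTPUT=20000         # characters; arbitrary example value
```

A freshly generated key presumably cannot decrypt state saved under an earlier key, so a real setup would generate the key once and load it from a secret store instead of creating a new one on every run.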

Integration Targets#

AI Tools: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini
Cloud Services: Browserbase, Browser Use, Kernel, AWS Lambda/Vercel Serverless