An agent-first browser automation CLI from Vercel Labs that enables efficient, low-token web interactions using a Ref-based system.
Project Positioning#
agent-browser is an agent-first headless browser automation CLI tool providing deterministic interaction and high context efficiency for AI agents via accessibility trees and a Ref system.
Core Problem#
Traditional browser automation tools (such as the native Puppeteer and Playwright interfaces) expose large DOM trees (typically 3000-5000 tokens), and their CSS/XPath selectors are unstable on dynamic pages. Both problems make them a poor fit for AI agents (LLMs) that need to read a page and act on it directly.
Core Features#
- Agent-first design: Output format optimized for AI context efficiency, with compact output of roughly 200-400 tokens versus 3000-5000 tokens for a DOM tree
- Ref-based element selection: Deterministic element targeting based on accessibility trees and references (e.g., @e1, @e2)
- Fast: Native Rust CLI achieves sub-millisecond command parsing
- JSON mode: --json flag for machine-readable output
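To put the token figures in the list above in perspective, the ratio between the two ranges can be worked out directly. The sketch below only does arithmetic on the numbers already quoted; token_ratio is a made-up helper name, not part of the CLI.

```shell
# token_ratio DOM_TOKENS SNAPSHOT_TOKENS
# Prints how many times larger the DOM output is, to one decimal place.
token_ratio() {
  awk -v dom="$1" -v snap="$2" 'BEGIN { printf "%.1f\n", dom / snap }'
}

token_ratio 3000 400   # smallest DOM figure vs largest snapshot figure: 7.5
token_ratio 5000 200   # largest DOM figure vs smallest snapshot figure: 25.0
```

Taken at face value, the quoted ranges imply a reduction of roughly 7.5x to 25x per page view.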
Tech Stack#
- TypeScript (58.7%)
- Rust (36.1%)
- Python (1.9%)
- JavaScript (1.7%)
- Dependency: Playwright (browser driver)
Supported Platforms#
- macOS (ARM64, x64)
- Linux (ARM64, x64)
- Windows (x64)
Core Capabilities#
Browser Automation#
- Navigation control: open, back, forward, reload
- Element interaction: click, fill, type, hover, drag, upload
- Keyboard control: press, keydown, keyup, keyboard type
- Form operations: check, uncheck, select
Information Retrieval#
- Page snapshot: snapshot command returns accessibility tree with refs
- Text extraction: get text, get html, get value, get attr
- Page state: get title, get url, get count, get box, get styles
- Screenshot/PDF: screenshot (supports full page and annotations), pdf
Advanced Features#
- Session management: Multiple isolated browser instances (--session)
- Persistent configuration: Config files and user data persistence (--profile, --session-name)
- Network control: Network request interception, routing, mock responses
- Storage management: cookies, localStorage, sessionStorage operations
- Multi-tab/window: Tab and window management
- CDP connection: Connect to external browsers via Chrome DevTools Protocol
Security Features (Optional)#
- Auth vault: Locally encrypted credential storage
- Content boundary markers: Mark page output to distinguish tool output from untrusted content
- Domain allowlist: Restrict navigation to trusted domains
- Action policies: Use policy files to control destructive operations
- Output length limits: Prevent context flooding attacks
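The ideas behind content boundary markers and output length limits can be illustrated with a small filter. This is a conceptual sketch only: the marker text, the nonce, and the wrap_untrusted name are invented for illustration and do not reflect the tool's actual output format.

```shell
# wrap_untrusted MAX_BYTES
# Reads page text on stdin, truncates it to MAX_BYTES bytes, and wraps it
# in boundary lines carrying a random nonce that page content cannot predict.
# Note: head -c counts bytes, which differs from characters for non-ASCII text.
wrap_untrusted() {
  nonce=$(head -c 8 /dev/urandom | od -An -v -tx1 | tr -dc '0-9a-f')
  printf '[BEGIN PAGE CONTENT %s]\n' "$nonce"
  head -c "$1"
  printf '\n[END PAGE CONTENT %s]\n' "$nonce"
}

printf 'hello world' | wrap_untrusted 5
```

Because the nonce changes on every call, text inside the page cannot forge a matching end marker, and the byte cap bounds how much untrusted text reaches the model's context.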
Architecture#
Client-Daemon Architecture#
- Rust CLI: Fast native binary, parses commands, communicates with daemon
- Node.js Daemon: Manages Playwright browser instances
- Fallback: Directly uses Node.js if native binary unavailable
Browser Engine#
- Defaults to Chromium
- Supports Firefox and WebKit via Playwright protocol
- Supports custom browser executables
Ref System#
- Generates deterministic element references based on accessibility tree
- Avoids re-querying DOM, improves performance
- AI-friendly text output format
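A calling script may want to check that a string looks like a ref before passing it to a command. The sketch below assumes refs always take the form @e followed by digits, as in the @e1 and @e2 examples; the real grammar may be broader, and is_ref is a hypothetical helper rather than part of the CLI.

```shell
# is_ref STRING
# Succeeds when STRING looks like a snapshot ref such as @e1 or @e12.
# Assumption: a ref is "@e" followed by one or more digits.
is_ref() {
  case "$1" in
    @e)  return 1 ;;   # prefix with no digits
    @e*) ;;            # has the prefix, check the remainder below
    *)   return 1 ;;   # anything else (CSS selectors, XPath, plain text)
  esac
  # Strip the prefix and reject the string if any non-digit remains.
  case "${1#@e}" in
    *[!0-9]*) return 1 ;;
  esac
  return 0
}

is_ref "@e2" && echo "ref"
is_ref "#submit" || echo "not a ref"
```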
Typical Use Cases#
AI Agent Automation#
- Provides browser automation capabilities for Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, etc.
- Suitable for any AI agent that can execute shell commands
Web Testing#
- Login flow testing
- Form filling validation
- Page interaction testing
- Cross-browser compatibility testing
Data Extraction#
- Web content scraping
- Structured data extraction
- Screenshot and PDF generation
Cloud Deployment#
- Supports serverless environments (Vercel, AWS Lambda)
- Integrates third-party browser services (Browserbase, Browser Use, Kernel)
Installation Methods#
Global Installation (Recommended)#
npm install -g agent-browser
agent-browser install # Download Chromium
Quick Try (No Installation)#
npx agent-browser open example.com
macOS Homebrew#
brew install agent-browser
agent-browser install # Download Chromium
Basic Workflow#
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i
# 2. Interact using refs
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser screenshot page.png
# 3. Close browser
agent-browser close
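When the steps above run inside a script, it helps to stop at the first failure and still close the browser. The following is a minimal sketch using only the commands shown above. The function names, the AB_CMD variable (a hypothetical override that lets the binary be swapped for a stub during dry runs), and the argument layout are illustrative assumptions; real refs come from the snapshot output of the page being driven.

```shell
# ab ARGS...
# Runs agent-browser, or whatever AB_CMD names (hypothetical override).
ab() {
  "${AB_CMD:-agent-browser}" "$@"
}

# fill_and_capture URL REF VALUE OUTFILE
# Opens URL, takes a snapshot, fills REF with VALUE, saves a screenshot,
# and always closes the browser. Returns the status of the first failed step.
fill_and_capture() {
  rc=0
  ab open "$1" &&
    ab snapshot -i &&
    ab fill "$2" "$3" &&
    ab screenshot "$4" || rc=$?
  ab close   # runs whether or not the steps above succeeded
  return "$rc"
}
```

A call such as fill_and_capture example.com @e3 "test@example.com" page.png mirrors the manual steps. Setting AB_CMD=echo first prints the commands instead of running them.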
AI Agent Mode#
# JSON output for machine parsing
agent-browser snapshot --json
# Compact interactive element list
agent-browser snapshot -i --json
Key CLI Flags#
- --json: Output machine-readable JSON format
- --session <name>: Start isolated browser session
- --profile <path>: Specify persistent data directory
- --headed: Show browser GUI (headless by default)
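The --session flag above can be wrapped in a small helper so that each agent consistently talks to its own browser instance. This sketch assumes the flag is accepted before the command name, which should be confirmed with the tool's help output. in_session and AB_CMD are hypothetical names introduced here.

```shell
# in_session NAME ARGS...
# Runs an agent-browser command inside the named isolated session.
# AB_CMD is a hypothetical override so the call can be dry-run with echo.
in_session() {
  name=$1
  shift
  "${AB_CMD:-agent-browser}" --session "$name" "$@"
}

# Dry run: print the command lines two independent agents would issue.
AB_CMD=echo
in_session agent1 open example.com
in_session agent2 open example.org
```

Unsetting AB_CMD makes the same helper call the real binary.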
Key Environment Variables#
- AGENT_BROWSER_SESSION_NAME: Auto save/load session state
- AGENT_BROWSER_ENCRYPTION_KEY: 64-char hex key for AES-256-GCM encryption
- AGENT_BROWSER_DEFAULT_TIMEOUT: Default Playwright timeout (ms, default 25000)
- AGENT_BROWSER_PROVIDER: Cloud browser provider (browserbase/browseruse/kernel/ios)
- AGENT_BROWSER_CONTENT_BOUNDARIES: Wrap page output with boundary markers
- AGENT_BROWSER_MAX_OUTPUT: Max characters for page output
- AGENT_BROWSER_ALLOWED_DOMAINS: List of allowed domain patterns
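A wrapper script might set some of these variables as follows. The key generation relies only on the stated requirement of a 64-character hex key, which corresponds to 32 random bytes. The values chosen for the other variables, and the single-domain format for the allowlist, are guesses at plausible settings and should be checked against the project documentation. gen_hex_key is a hypothetical helper.

```shell
# gen_hex_key
# Prints 32 random bytes as 64 lowercase hex characters.
gen_hex_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -dc '0-9a-f'
}

AGENT_BROWSER_ENCRYPTION_KEY=$(gen_hex_key)
export AGENT_BROWSER_ENCRYPTION_KEY

# Illustrative values only (assumptions, not documented defaults).
export AGENT_BROWSER_DEFAULT_TIMEOUT=10000          # milliseconds
export AGENT_BROWSER_MAX_OUTPUT=20000               # cap on page output, in characters
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com"  # assumed format: a domain pattern

echo "key length: ${#AGENT_BROWSER_ENCRYPTION_KEY}"
```

In practice the key would be generated once and kept somewhere safe, since regenerating it on every run would make previously encrypted data unreadable.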
Integration Targets#
- AI Tools: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini
- Cloud Services: Browserbase, Browser Use, Kernel, AWS Lambda/Vercel Serverless