agent-browser

Added: Feb 26, 2026
Category: Agent & Tooling
Open Source
Tags: TypeScript; Rust; Playwright; AI Agents; Browser Automation; CLI; Agent & Tooling; Developer Tools & Coding; Automation, Workflow & RPA

An agent-first browser automation CLI from Vercel Labs that enables efficient, low-token web interactions using a Ref-based system.

Project Positioning#

agent-browser is an agent-first headless browser automation CLI. It gives AI agents deterministic element targeting and context-efficient page output by exposing the page's accessibility tree together with short element references (refs).

Core Problem#

Traditional browser automation interfaces, such as the native Puppeteer and Playwright APIs, expose large DOM trees (typically 3000-5000 tokens), and CSS/XPath selectors tend to break on dynamic pages. Both properties make them a poor fit for LLM-based agents that need to read page state and act on it directly.

Core Features#

  • Agent-first design: Output is optimized for context efficiency, with compact output of roughly 200-400 tokens versus 3000-5000 tokens for a DOM tree
  • Ref-based element selection: Deterministic element targeting through references (e.g., @e1, @e2) derived from the accessibility tree
  • Fast: Native Rust CLI achieves sub-millisecond command parsing
  • JSON mode: --json flag for machine-readable output
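
To make the token figures above concrete, the following is a small worked calculation. It uses only the ranges quoted in the feature list; the 20,000-token context budget is a hypothetical number chosen for illustration.

```python
# Token ranges quoted in the feature list above.
SNAPSHOT_TOKENS = (200, 400)   # compact agent-browser output
DOM_TOKENS = (3000, 5000)      # typical raw DOM tree


def reduction_range(compact, full):
    """Return (worst_case, best_case) ratio of full-size to compact output."""
    worst = min(full) / max(compact)   # smallest DOM vs largest snapshot
    best = max(full) / min(compact)    # largest DOM vs smallest snapshot
    return worst, best


def pages_in_budget(budget_tokens, tokens_per_page):
    """How many page observations fit in a context budget."""
    return budget_tokens // tokens_per_page


worst, best = reduction_range(SNAPSHOT_TOKENS, DOM_TOKENS)
print(f"reduction: {worst:.1f}x to {best:.1f}x")  # prints: reduction: 7.5x to 25.0x

# Hypothetical budget of 20,000 tokens reserved for page observations.
print(pages_in_budget(20_000, max(SNAPSHOT_TOKENS)))  # prints: 50
print(pages_in_budget(20_000, max(DOM_TOKENS)))       # prints: 4
```

Under these assumptions an agent can keep roughly an order of magnitude more page observations in context when it reads compact snapshots instead of DOM dumps.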

Tech Stack#

  • TypeScript (58.7%)
  • Rust (36.1%)
  • Python (1.9%)
  • JavaScript (1.7%)
  • Dependency: Playwright (browser driver)

Supported Platforms#

  • macOS (ARM64, x64)
  • Linux (ARM64, x64)
  • Windows (x64)

Core Capabilities#

Browser Automation#

  • Navigation control: open, back, forward, reload
  • Element interaction: click, fill, type, hover, drag, upload
  • Keyboard control: press, keydown, keyup, keyboard type
  • Form operations: check, uncheck, select

Information Retrieval#

  • Page snapshot: snapshot command returns accessibility tree with refs
  • Text extraction: get text, get html, get value, get attr
  • Page state: get title, get url, get count, get box, get styles
  • Screenshot/PDF: screenshot (supports full page and annotations), pdf

Advanced Features#

  • Session management: Multiple isolated browser instances (--session)
  • Persistent configuration: Config files and user data persistence (--profile, --session-name)
  • Network control: Network request interception, routing, mock responses
  • Storage management: cookies, localStorage, sessionStorage operations
  • Multi-tab/window: Tab and window management
  • CDP connection: Connect to external browsers via Chrome DevTools Protocol

Security Features (Optional)#

  • Auth vault: Locally encrypted credential storage
  • Content boundary markers: Mark page output to distinguish tool output from untrusted content
  • Domain allowlist: Restrict navigation to trusted domains
  • Action policies: Use policy files to control destructive operations
  • Output length limits: Prevent context flooding attacks
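
The ideas behind three of these protections (domain allowlist, output length limit, content boundary markers) can be illustrated with a short sketch. This is not agent-browser's implementation: the glob-style pattern matching, the marker format, and every function name below are assumptions made for illustration only.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url, patterns):
    """Check a URL's host against glob-style domain patterns (assumed syntax)."""
    host = (urlparse(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


def limit_output(text, max_chars):
    """Cut page output to at most max_chars characters."""
    return text[:max_chars]


def wrap_untrusted(text, marker):
    """Surround page content with markers so it can be told apart from tool output.

    The marker format here is hypothetical.
    """
    return f"[BEGIN {marker}]\n{text}\n[END {marker}]"


ALLOWED = ["example.com", "*.example.com"]  # hypothetical allowlist

print(is_allowed("https://docs.example.com/page", ALLOWED))
print(is_allowed("https://example.com.evil.test/", ALLOWED))
print(wrap_untrusted(limit_output("Ignore previous instructions...", 10), "page"))
```

Matching on the parsed hostname rather than on the raw URL string is the important design choice: a lookalike host such as example.com.evil.test does not pass a suffix-style pattern.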

Architecture#

Client-Daemon Architecture#

  • Rust CLI: Fast native binary, parses commands, communicates with daemon
  • Node.js Daemon: Manages Playwright browser instances
  • Fallback: Runs directly on Node.js if the native binary is unavailable

Browser Engine#

  • Defaults to Chromium
  • Supports Firefox and WebKit via Playwright protocol
  • Supports custom browser executables

Ref System#

  • Generates deterministic element references based on accessibility tree
  • Avoids re-querying DOM, improves performance
  • AI-friendly text output format
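
As a rough illustration of how an agent might consume refs, the sketch below extracts them from snapshot text. The line format used in SAMPLE_SNAPSHOT (role, quoted name, then a `[ref=eN]` tag) is an assumption about the text output and should be checked against a real snapshot; `parse_refs` and `find_ref` are hypothetical helper names.

```python
import re

# Assumed line shape: - role "accessible name" [ref=eN] [other attributes]
REF_LINE = re.compile(
    r'-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>e\d+)\]'
)

# Hypothetical snapshot text, not captured from a real run.
SAMPLE_SNAPSHOT = """\
- heading "Example Domain" [ref=e1] [level=1]
- link "More information..." [ref=e2]
- textbox "Email" [ref=e3]
"""


def parse_refs(snapshot_text):
    """Map '@eN' refs to (role, name) tuples for every line carrying a ref tag."""
    refs = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.search(line)
        if match:
            refs["@" + match.group("ref")] = (
                match.group("role"),
                match.group("name") or "",
            )
    return refs


def find_ref(refs, role, name=None):
    """Return the first ref with the given role (and name, if provided)."""
    for ref, (ref_role, ref_name) in refs.items():
        if ref_role == role and (name is None or ref_name == name):
            return ref
    return None


refs = parse_refs(SAMPLE_SNAPSHOT)
print(find_ref(refs, "textbox", "Email"))  # prints: @e3
```

The returned ref can then be passed to commands such as click or fill, so the agent never has to construct a CSS or XPath selector.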

Typical Use Cases#

AI Agent Automation#

  • Provides browser automation capabilities for Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, etc.
  • Suitable for any AI agent that can execute shell commands

Web Testing#

  • Login flow testing
  • Form filling validation
  • Page interaction testing
  • Cross-browser compatibility testing

Data Extraction#

  • Web content scraping
  • Structured data extraction
  • Screenshot and PDF generation

Cloud Deployment#

  • Supports serverless environments (Vercel, AWS Lambda)
  • Integrates third-party browser services (Browserbase, Browser Use, Kernel)

Installation Methods#

npm install -g agent-browser
agent-browser install  # Download Chromium

Quick Try (No Installation)#

npx agent-browser open example.com

macOS Homebrew#

brew install agent-browser
agent-browser install  # Download Chromium

Basic Workflow#

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i

# 2. Interact using refs
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser screenshot page.png

# 3. Close browser
agent-browser close
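
For an agent that drives the CLI from code rather than a shell, the same workflow could be scripted roughly as follows. This is a minimal sketch that assumes agent-browser is installed and on PATH; `build_argv`, `run_cli`, and `basic_workflow` are hypothetical helper names, and the refs are hard-coded to mirror the steps above (a real agent would pick them from the snapshot).

```python
import subprocess


def build_argv(command, *args):
    """Build the argument list for one agent-browser invocation."""
    return ["agent-browser", command, *args]


def run_cli(argv):
    """Run one invocation and return its stdout; raises on a non-zero exit code."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def basic_workflow(url, email, run=run_cli):
    """Replay the three steps above; `run` is injectable so the flow can be dry-run."""
    # 1. Navigate and get a snapshot of interactive elements
    run(build_argv("open", url))
    snapshot = run(build_argv("snapshot", "-i"))
    # 2. Interact using refs
    run(build_argv("click", "@e2"))
    run(build_argv("fill", "@e3", email))
    run(build_argv("screenshot", "page.png"))
    # 3. Close the browser
    run(build_argv("close"))
    return snapshot


# Dry run: print the commands instead of executing them.
basic_workflow("example.com", "test@example.com", run=lambda argv: print(" ".join(argv)))
```

Passing arguments as a list avoids shell quoting problems when the fill value contains spaces or quotes. Calling `basic_workflow("example.com", "test@example.com")` without the `run` argument executes the real commands.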

AI Agent Mode#

# JSON output for machine parsing
agent-browser snapshot --json

# Compact interactive element list
agent-browser snapshot -i --json
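
On the consuming side, an agent would parse the --json output before acting on it. The sketch below deliberately avoids depending on a fixed schema: the payload in SAMPLE_STDOUT is hypothetical, and `find_key` simply searches whatever structure comes back for a key named "refs". Inspect real output before relying on any particular shape.

```python
import json


def load_json_output(stdout):
    """Parse CLI stdout as JSON, raising ValueError with context on failure."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"expected JSON from --json mode: {exc}") from exc


def find_key(node, key):
    """Depth-first search for the first non-None value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical payload; the real schema may differ.
SAMPLE_STDOUT = (
    '{"success": true, "data": {"refs": '
    '{"e1": {"role": "button", "name": "Submit"}}}}'
)

payload = load_json_output(SAMPLE_STDOUT)
refs = find_key(payload, "refs")
print(refs)
```

Failing loudly on non-JSON output is useful in practice, because an error message printed as plain text would otherwise be handed to the model as if it were page data.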

Key CLI Flags#

  • --json: Output machine-readable JSON format
  • --session <name>: Start isolated browser session
  • --profile <path>: Specify persistent data directory
  • --headed: Show browser GUI (headless by default)
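
These flags can be combined when assembling a command programmatically. The helper below is a hypothetical sketch: it places the flags before the subcommand, which is an assumption about accepted ordering, and the session names are made up.

```python
def with_flags(command_args, session=None, profile=None, json_output=False, headed=False):
    """Prefix a command with the global flags listed above (assumed placement)."""
    argv = ["agent-browser"]
    if session:
        argv += ["--session", session]
    if profile:
        argv += ["--profile", profile]
    if json_output:
        argv.append("--json")
    if headed:
        argv.append("--headed")
    return argv + list(command_args)


# Two isolated sessions, each with machine-readable output.
first = with_flags(["open", "example.com"], session="agent1", json_output=True)
second = with_flags(["snapshot", "-i"], session="agent2", json_output=True)
print(" ".join(first))   # prints: agent-browser --session agent1 --json open example.com
print(" ".join(second))  # prints: agent-browser --session agent2 --json snapshot -i
```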

Key Environment Variables#

  • AGENT_BROWSER_SESSION_NAME: Auto save/load session state
  • AGENT_BROWSER_ENCRYPTION_KEY: 64-char hex key for AES-256-GCM encryption
  • AGENT_BROWSER_DEFAULT_TIMEOUT: Default Playwright timeout (ms, default 25000)
  • AGENT_BROWSER_PROVIDER: Cloud browser provider (browserbase/browseruse/kernel/ios)
  • AGENT_BROWSER_CONTENT_BOUNDARIES: Wrap page output with boundary markers
  • AGENT_BROWSER_MAX_OUTPUT: Max characters for page output
  • AGENT_BROWSER_ALLOWED_DOMAINS: List of allowed domain patterns
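
A 64-character hex key corresponds to 32 random bytes, which is the 256-bit key size AES-256 requires. The sketch below generates and validates such a key and assembles an environment for a child process. The helper names are hypothetical, and only variables whose value format is clear from the list above are set.

```python
import os
import re
import secrets


def generate_encryption_key():
    """Return 32 random bytes as 64 hex characters (256 bits)."""
    return secrets.token_hex(32)


def is_valid_encryption_key(key):
    """True if key is exactly 64 hexadecimal characters."""
    return re.fullmatch(r"[0-9a-fA-F]{64}", key) is not None


def build_env(session_name, key, timeout_ms=25000, max_output=None, base=None):
    """Return a copy of the environment with agent-browser variables set."""
    if not is_valid_encryption_key(key):
        raise ValueError("encryption key must be 64 hex characters")
    env = dict(os.environ if base is None else base)
    env["AGENT_BROWSER_SESSION_NAME"] = session_name
    env["AGENT_BROWSER_ENCRYPTION_KEY"] = key
    env["AGENT_BROWSER_DEFAULT_TIMEOUT"] = str(timeout_ms)  # milliseconds
    if max_output is not None:
        env["AGENT_BROWSER_MAX_OUTPUT"] = str(max_output)   # characters
    return env


key = generate_encryption_key()
env = build_env("demo", key, max_output=20000, base={})
print(sorted(env))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` so the settings apply to a single invocation instead of the whole shell. The key should be stored securely and never logged.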

Integration Targets#

  • AI Tools: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini
  • Cloud Services: Browserbase, Browser Use, Kernel, AWS Lambda/Vercel Serverless
