Browse: a faster alternative to Playwright MCP for AI agents

If you’ve used Playwright over MCP with an AI agent, you know the pain. Every action (clicking a button, taking a screenshot, reading the page) round-trips through JSON-RPC, spins up a browser context, and serializes the result back. It works. But for an agent clicking through a web app doing QA or verification, that overhead is brutal. Commands that should feel instant take seconds.

I built browse to fix this. It’s a CLI tool that keeps a single Chromium instance alive behind a Unix socket, so after one cold start, subsequent commands typically complete in about 30ms or less.

The architecture

The design is simple. Three layers: a thin CLI, a long-lived daemon, and the browser it controls.

plaintext

CLI ──JSON──> Unix socket ──> Daemon ──> Playwright ──> Chromium

The CLI is a thin client. It serializes your command as JSON and sends it to a Unix socket at /tmp/browse-daemon.sock. The daemon holds a persistent Chromium instance managed by Playwright under the hood. It auto-spawns on first use and shuts itself down after 30 minutes of inactivity.
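Because the daemon is just a process listening on a socket file, a script can check whether one appears to be running before it issues commands. This is a minimal sketch rather than part of browse itself: daemon_socket_present is a hypothetical helper, and the only detail taken from the description above is the socket path. A socket file can outlive a crashed daemon, so treat the result as a hint, not proof.

```bash
# Hypothetical helper (not a browse command): succeed if a Unix socket
# exists at the given path. Defaults to the path mentioned above.
daemon_socket_present() {
  [ -S "${1:-/tmp/browse-daemon.sock}" ]
}

if daemon_socket_present; then
  echo "daemon socket found: the next command should be warm"
else
  echo "no daemon socket: the next command pays the cold start"
fi
```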

The trick is that the browser never dies between commands. Playwright MCP starts a fresh context for every interaction. Browse keeps the browser warm. That’s the whole insight, and it makes a massive difference:

Command	p50	p95
goto	27ms	32ms
snapshot	1ms	11ms
screenshot	24ms	25ms
click	17ms	18ms
fill	1ms	26ms

These are measured numbers, not estimates. Navigation takes 27ms at the median and 32ms at p95. Snapshots take 1ms at the median and 11ms at p95. For an agent running dozens of browser actions in sequence, the savings compound.
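To make the compounding concrete, here is a back-of-the-envelope sum using the p50 figures from the table for a six-command sequence (goto, snapshot, two fills, click, screenshot). The 2000ms comparison figure is purely an assumption standing in for "takes seconds"; it is not a measurement of any tool.

```bash
# Sum per-command latencies (in ms) passed as arguments.
sum_ms() {
  total=0
  for ms in "$@"; do
    total=$((total + ms))
  done
  echo "$total"
}

# p50 figures from the table: goto, snapshot, fill, fill, click, screenshot
warm_total=$(sum_ms 27 1 1 1 17 24)
echo "six warm commands at p50: ${warm_total}ms"

# ASSUMPTION: 2000ms per command is a placeholder, not a benchmark result.
assumed_slow_ms=2000
slow_total=$((assumed_slow_ms * 6))
echo "six commands at an assumed ${assumed_slow_ms}ms each: ${slow_total}ms"
```

Swap in your own measured per-command cost for assumed_slow_ms to see how the gap scales with the number of steps.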

Element refs: no selectors needed

Here’s the part I’m most pleased with. When you run browse snapshot, it returns an accessibility-tree representation of the page with every interactive element tagged with a ref like @e1, @e2, @e3:

bash

browse goto "https://example.com/login"
browse snapshot

The snapshot output gives you something like:

plaintext

[page] Login - Example App
  [heading] Sign In
  [@e1 input] Email address
  [@e2 input] Password
  [@e3 button] Sign In
  [@e4 link] Forgot password?

Now you can target elements directly:

bash

browse fill @e1 "user@example.com"
browse fill @e2 "hunter2"
browse click @e3

No CSS selectors. No XPath. No fragile DOM queries. The refs come from the accessibility tree, so they map to what a user actually sees and interacts with. For AI agents, this is perfect — an LLM can read the snapshot output, understand the page structure, and issue commands against stable references.

This matters more than it might seem. One of the biggest failure modes with Playwright MCP is agents constructing brittle selectors that break when the page re-renders or when class names are hashed. Element refs sidestep that entirely.
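An LLM can read the snapshot directly, but a plain script can resolve refs too. The sketch below looks up a ref by role and visible label. It assumes snapshot lines are shaped exactly like the illustrative sample above (for example "[@e3 button] Sign In"); the real output may differ, and find_ref is a hypothetical helper, not a browse command.

```bash
# Hypothetical helper: print the ref of the first snapshot line whose role
# and label match exactly. Reads snapshot text on stdin.
# Exits non-zero if nothing matches.
find_ref() {
  awk -v role="$1" -v label="$2" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)               # drop indentation
      if (substr(line, 1, 2) != "[@") next   # skip lines without a ref
      close_idx = index(line, "]")
      if (close_idx == 0) next
      tag = substr(line, 2, close_idx - 2)   # e.g. "@e3 button"
      text = substr(line, close_idx + 2)     # e.g. "Sign In"
      split(tag, parts, " ")
      if (parts[2] == role && text == label) { print parts[1]; found = 1; exit }
    }
    END { exit !found }'
}

# Usage, re-resolving the ref from a fresh snapshot before acting:
#   browse click "$(browse snapshot | find_ref button "Sign In")"
```

Matching on role plus label keeps the heading "Sign In" (which has no ref) from being confused with the button of the same name.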

A real workflow

Here’s what a typical agent-driven flow looks like. Say you want to verify that a login form works:

bash

# Navigate and inspect
browse goto "https://staging.myapp.com/login"
browse snapshot

# Fill and submit
browse fill @e1 "test@example.com"
browse fill @e2 "password123"
browse click @e3

# Wait for navigation and verify
browse wait url "**/dashboard**"
browse assert text "Welcome back"
browse screenshot verify-login.png

Every command is stateless from the CLI’s perspective: each invocation sends one request and exits. The daemon maintains all the browser state, so an agent can chain commands without managing sessions, contexts, or browser lifecycles.
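When a script rather than an agent drives this flow, it helps to stop at the first failing step. The wrapper below is a sketch under one assumption the text above does not confirm: that browse exits non-zero when a command such as assert fails. run_step and verify_login are hypothetical names; the browse commands themselves are the ones from the workflow above.

```bash
# Run one command; on failure, report it on stderr and return non-zero.
# ASSUMPTION: browse exits non-zero when a command such as assert fails.
run_step() {
  if ! "$@"; then
    echo "step failed: $*" >&2
    return 1
  fi
}

# The login check from above, stopping at the first failing step.
verify_login() {
  run_step browse goto "https://staging.myapp.com/login" &&
    run_step browse snapshot &&
    run_step browse fill @e1 "test@example.com" &&
    run_step browse fill @e2 "password123" &&
    run_step browse click @e3 &&
    run_step browse wait url "**/dashboard**" &&
    run_step browse assert text "Welcome back" &&
    run_step browse screenshot verify-login.png
}

# Usage: verify_login || echo "login verification failed"
```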

Compare this to the equivalent Playwright MCP flow, where each step involves a full MCP tool call, JSON-RPC serialization, and often a base64-encoded screenshot response. Browse gives you text-based snapshots by default, which are cheaper to process and faster for an LLM to reason about.

Power features worth knowing about

Beyond the basics, browse has a few features that make it genuinely useful for production workflows:

Session management. Export and import cookies and localStorage. You can configure login environments in a browse.config.json so your agent doesn’t need to re-authenticate every time:

json

{
  "environments": {
    "staging": {
      "loginUrl": "https://staging.myapp.com/login",
      "fields": {
        "email": { "selector": "#email", "envVar": "STAGING_EMAIL" },
        "password": { "selector": "#password", "envVar": "STAGING_PASSWORD" }
      },
      "successUrl": "**/dashboard**"
    }
  }
}

Then just run browse login --env staging.

Reusable flows. Define automation sequences with variable substitution in your config. Good for repetitive checks an agent runs often.

Accessibility audits. Built-in WCAG 2.0/2.1 auditing via axe-core with browse a11y. Useful for agents doing QA.

Device presets. Switch between mobile, tablet, and desktop viewports and user agents without configuration.

Stealth mode. Added in v0.6.0 — bypasses Cloudflare Turnstile and common bot detection. Handy when your staging environment sits behind a CDN.

Wait conditions. Wait for URL patterns, text appearance, element visibility, network idle, or a manual delay. The browse wait command is flexible enough that agents rarely need to poll.
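Returning to the login environment config above: it names two variables, STAGING_EMAIL and STAGING_PASSWORD, so a preflight check can catch a missing credential before the browser is involved. This is a sketch that assumes browse reads those variables from its environment at login time; require_env is a hypothetical helper, not something browse ships.

```bash
# Hypothetical helper: succeed only if every named variable is set and
# non-empty; otherwise name the first missing one on stderr.
# Note: this checks shell variables, so remember to export them so that
# child processes such as browse can see them.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

# Usage:
#   require_env STAGING_EMAIL STAGING_PASSWORD && browse login --env staging
```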

Getting started

You’ll need Bun >= 1.0. Install with:

bash

curl -fsSL https://raw.githubusercontent.com/forjd/browse/main/install.sh | bash

Or clone and build manually:

bash

git clone https://github.com/forjd/browse.git
cd browse
./setup.sh

Setup downloads Chromium, compiles a self-contained binary, and symlinks it to ~/.local/bin/browse.

For Claude Code integration:

bash

bunx skills add forjd/browse

Where this is heading

There’s a broader pattern here. MCP is great as a protocol, but wrapping heavyweight tools like Playwright behind it introduces latency that compounds fast in agentic workflows. The alternative — purpose-built CLI tools with persistent daemons — turns out to be a much better fit. The CLI is the universal interface. Every agent framework can shell out to it. No SDK, no protocol overhead, just stdin/stdout and fast IPC.

Browse is MIT licensed and on GitHub. Give it a try and let me know what breaks.