Visual Inspection

How to let sandboxed coding agents inspect localhost UIs without giving them ambient browser access.

Reality check: this is still an experimental operator pattern. For the strongest isolation, pixel-based inspection in a dedicated VM or container is still better than trying to make a host-local browser helper perfectly safe.

1. Start with the cheapest primitive

Most agent visual tasks should not start with a screenshot. They should start with a question: do you need pixels, or do you just need to know what the page says and which elements are present?

If the task is “is the login button visible,” “did the dashboard heading appear,” or “did the form submit,” use an accessibility tree first. It is cheaper, more stable, and easier to act on than pixels.

Default rule: accessibility tree first, screenshots only when layout, canvas, charts, color, spacing, or actual rendering fidelity matters.

This is why @playwright/mcp defaults to AX-tree style workflows. It keeps the browser problem smaller and the token cost lower.
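
The default rule can be written down as a tiny triage step. This is a minimal sketch, not part of any real tool: the function name, the concern labels, and the return values are all hypothetical and only mirror the rule stated above.

```python
# Hypothetical triage helper. The labels below are illustrative and simply
# restate the default rule: pixels only for rendering-level concerns.
PIXEL_CONCERNS = {"layout", "canvas", "charts", "color", "spacing", "rendering"}


def choose_inspection(concerns):
    """Return "screenshot" if any concern needs pixels, else "ax_tree"."""
    return "screenshot" if PIXEL_CONCERNS & set(concerns) else "ax_tree"


if __name__ == "__main__":
    # A text or presence check stays on the accessibility tree.
    print(choose_inspection(["heading text", "button visible"]))
    # A spacing regression needs real pixels.
    print(choose_inspection(["spacing"]))
```

The useful property is that the screenshot path has to be justified by a named concern instead of being the default.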

2. When pixels are actually required

You need screenshots when the picture is the content or the layout is the bug.

  • canvas, WebGL, maps, and charts
  • CSS/layout regressions
  • font rendering and spacing differences
  • designer review loops
  • final visual confirmation of a localhost UI

The mistake is not taking screenshots. The mistake is giving a coding agent a general-purpose browser just to get one.

3. Why browser-in-sandbox fails on macOS

On macOS, modern browsers are multi-process, IPC-heavy runtimes. Under a deny-by-default sandbox, WebKit, Chromium, and Firefox all tend to fail on the same family of problems:

  • Mach bootstrap registration and lookup
  • shared-memory setup for child processes
  • browser-owned runtime state that wants to spill outside your narrow allowlist

That means the theoretical fix is to widen the sandbox with the right Mach and shared-memory allowances. The practical problem is that this broadens the browser runtime inside the same trust boundary you were trying to keep tight.

So the question is not “can I make Playwright run under the sandbox.” The better question is “should I move browser execution into a narrower trusted helper instead.”

4. Preferred local pattern

The preferred local answer is a narrow localhost screenshot service, not ambient browser access inside the main agent sandbox.

agent in nono → localhost screenshot client → site-shotd on 127.0.0.1 → webkit/chromium → fixed shots dir

In this pattern:

  • the agent stays inside the normal sandbox
  • the service owns browser execution
  • the service owns the browser toolchain and runtime state
  • the service accepts only localhost-style URLs
  • the service writes only into a fixed screenshots directory
  • the agent gets only the localhost client and read access to the resulting screenshots

Good boundary: let the agent request a screenshot. Do not let it drive an arbitrary browser process with arbitrary flags and arbitrary output paths.

This keeps the browser as a small trusted subsystem instead of turning every coding profile into a browser profile. It is the best practical local compromise, not the strongest possible sandbox. A dedicated VM still wins on isolation.
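
Two of the constraints above, localhost-style URLs only and a fixed screenshots directory, can be enforced before any browser starts. The following is a minimal sketch under stated assumptions: the function names, the accepted host set, and the file-name pattern are hypothetical, and the real site-shotd may do this differently.

```python
import ipaddress
import re
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: only the literal name "localhost" plus loopback IP literals count
# as "localhost-style". Custom names such as app.localhost are rejected here.
ALLOWED_HOSTNAMES = {"localhost"}
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_localhost_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is localhost or a loopback IP."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # malformed URL, e.g. an unclosed IPv6 bracket
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    if parts.username or parts.password:  # reject userinfo tricks outright
        return False
    if host in ALLOWED_HOSTNAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, so some other hostname
        return False


def shot_path(shots_dir: Path, name: str) -> Path:
    """Map a caller-supplied name to a PNG inside shots_dir, or raise."""
    if not SAFE_NAME.fullmatch(name):
        raise ValueError("name must match [A-Za-z0-9_-]{1,64}")
    return shots_dir / f"{name}.png"
```

The caller never supplies a path, only a short name, so there is nothing to traverse out of the screenshots directory with. The URL check is deliberately strict: odd spellings of loopback addresses are refused rather than normalized.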

5. WebKit first, Chromium when needed

For local screenshot helpers, start with WebKit. In practice it tends to have the smallest footprint and is the easiest first engine to stabilize.

Use Chromium as an explicit option when you actually need Chrome-like rendering parity. That is a useful capability, but it should be an opt-in inside the helper, not a reason to hand agents raw Chrome access.

If the helper itself is a user-managed script inside the sandbox boundary, invoke it through an already-allowed shell like bash instead of executing the script path directly.

bash /absolute/path/to/site-shot --browser webkit http://localhost:3000
bash /absolute/path/to/site-shot --browser chromium http://localhost:3000

The point is not to avoid Chromium forever. The point is to keep it inside a narrow service boundary.
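
To show what "explicit opt-in" means for the interface, here is a sketch of an argument parser that accepts exactly one engine flag and one URL. It only illustrates the interface shape and makes no claim about how the actual site-shot script is written. The default of webkit and the absence of any other options are assumptions taken from the guidance above.

```python
import argparse


def parse_args(argv):
    """Parse a site-shot style command line: one engine choice, one URL."""
    parser = argparse.ArgumentParser(prog="site-shot")
    # Only two engines are selectable, and WebKit is the default.
    parser.add_argument("--browser", choices=["webkit", "chromium"], default="webkit")
    # Deliberately no passthrough for raw browser flags or output paths.
    parser.add_argument("url")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--browser", "chromium", "http://localhost:3000"])
    print(args.browser, args.url)
```

Anything outside the two listed engines makes argparse exit with an error, so Chromium is available only when someone asks for it by name.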

6. What not to allow

These are the mistakes that make a screenshot helper turn back into ambient browser access:

  • arbitrary external URLs when the real need is localhost testing
  • arbitrary browser flags
  • arbitrary output paths
  • agent write access to the browser toolchain
  • agent write access to the helper's runtime state
  • hidden bypasses that quietly skip the sandbox model

If you need a bigger browser workflow than this, step up intentionally to MCP, Docker, or VM-based patterns. Do not quietly turn the screenshot helper into a general browser escape hatch.
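
One way to keep those mistakes out is to reject any request field the helper does not explicitly support, instead of ignoring unknown fields. This is a sketch with a hypothetical request shape (a dict with "url" and an optional "browser"). The field names are assumptions, and URL host checking is out of scope here.

```python
# Hypothetical request schema: everything not listed here is refused.
ALLOWED_KEYS = {"url", "browser"}
ALLOWED_BROWSERS = ("webkit", "chromium")


def validate_request(payload: dict) -> dict:
    """Return a normalized request, or raise ValueError on anything extra."""
    extra = set(payload) - ALLOWED_KEYS
    if extra:
        # Catches attempts to pass flags, output paths, or other knobs.
        raise ValueError(f"unsupported fields: {sorted(extra)}")
    if not isinstance(payload.get("url"), str):
        raise ValueError("url is required and must be a string")
    browser = payload.get("browser", "webkit")
    if browser not in ALLOWED_BROWSERS:
        raise ValueError("browser must be webkit or chromium")
    return {"url": payload["url"], "browser": browser}
```

Failing loudly on unknown fields makes a widened interface visible: adding a new capability requires changing the allowlist, not just sending a new key.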

7. Operator checklist

  • Use AX tree by default.
  • Use screenshots only when pixels matter.
  • Keep browser execution outside the main agent sandbox.
  • Bind the helper to 127.0.0.1 only.
  • Allow only localhost-style URLs.
  • Give agents read access to screenshots, not browser internals.
  • Make Chromium explicit, not ambient.
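
The "bind to 127.0.0.1 only" item can be made hard to get wrong by checking the bind address in code before the server is created. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and the request handler is left as a placeholder.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check_bind_host(host: str) -> str:
    """Return host unchanged if it is a loopback IP literal, else raise."""
    if not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"refusing to bind to non-loopback address {host}")
    return host


def make_server(host: str, port: int, handler=BaseHTTPRequestHandler):
    """Create an HTTP server that can only ever listen on loopback."""
    # The handler is a placeholder; a real helper would supply its own.
    return ThreadingHTTPServer((check_bind_host(host), port), handler)
```

With this guard, a configuration that sets the host to 0.0.0.0 fails at startup instead of quietly exposing the helper to the network.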

If you want the full stack context, go back to agent-stack. If you want the profile philosophy behind this split, read sandbox-profiles.