Agent-facing browser control: page reading, navigation, typed input, raw eval.

Browser Hooks

Agent-facing API for the in-app CEF browser. 20 hooks across 3 permission scopes.

Quick start

// 1. Find a browser.
const { browsers } = await rp.browser.list({});
const { id: browserId } = browsers[0];

// 2. Read the page.
const { markdown } = await rp.browser.snapshot({ browserId });
// [a1 h1] Pricing
// [a2 button] Start free trial
// [a3 input email] placeholder="work email"

// 3. Act.
await rp.browser.fill({ browserId, elementId: "a3", value: "user@example.com" });
await rp.browser.click({ browserId, elementId: "a2", returnSnapshot: true });

Reading

browser.snapshot

Capture a tagged, agent-friendly view of the page. Visible interactive and textual elements get a short synthetic ID (e.g. a3) written to the DOM as data-rp-id="a3" so input hooks can address them.

Params: { browserId, includeHidden?, maxElements?, viewportOnly? }
Returns: { markdown, url, title, elementCount, truncated }
Permission: browser:read

Format:

  • [id tag role?] text for most elements
  • [id input type] placeholder="..." value="..." for form inputs
  • [id link href] text for anchors
  • [id iframe] src=... for nested frames (not traversed in v1)
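A caller often needs to turn the snapshot markdown back into structured records, for example to look up the ID of a particular input. The sketch below parses the bracketed line format listed above. It assumes one element per line and that the first token inside the brackets is always the ID; parseSnapshotLine and parseSnapshot are hypothetical helper names, not part of the hooks API.

```javascript
// Hypothetical helper: parse one snapshot line such as
//   [a3 input email] placeholder="work email"
// Assumes space-separated tokens inside the brackets, ID first.
function parseSnapshotLine(line) {
  const match = /^\s*\[([^\]]+)\]\s*(.*)$/.exec(line);
  if (!match) return null; // not an element line
  const [id, tag, ...rest] = match[1].trim().split(/\s+/);
  if (!id || !tag) return null;
  return {
    id,
    tag,
    detail: rest.length > 0 ? rest.join(" ") : null, // role, input type, or href
    text: match[2],
  };
}

// Parse a whole snapshot, skipping lines that are not element lines.
function parseSnapshot(markdown) {
  return markdown
    .split("\n")
    .map(parseSnapshotLine)
    .filter((entry) => entry !== null);
}

const sample = [
  "[a1 h1] Pricing",
  "[a2 button] Start free trial",
  '[a3 input email] placeholder="work email"',
].join("\n");

const elements = parseSnapshot(sample);
const emailInput = elements.find((e) => e.tag === "input" && e.detail === "email");
console.log(emailInput.id); // prints "a3"
```

The returned id can be passed directly as elementId to the input hooks. A link whose href contains a closing bracket would not parse with this simple pattern.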

browser.axTree

Return the full accessibility tree via CDP Accessibility.getFullAXTree. Use it when snapshot omits elements that it wrongly treated as hidden.

Params: { browserId }
Returns: { nodes } (raw AX tree nodes)
Permission: browser:read

browser.screenshot

Capture a PNG of the browser as a base64 data URL.

Params: { browserId, fullPage?, quality? }
Returns: { dataUrl, width, height }
Permission: browser:read
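Because the image arrives as a base64 data URL, a consumer usually has to decode it before saving or inspecting it. The following is a minimal sketch that assumes a Node-style runtime where Buffer is available (a browser context would use atob instead); decodeDataUrl and isPng are hypothetical helper names.

```javascript
// The eight-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Split a base64 data URL into its MIME type and raw bytes.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/=]*)$/.exec(dataUrl);
  if (!match) throw new Error("Not a base64 data URL");
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

// Check the decoded bytes against the PNG signature.
function isPng(bytes) {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}

// Stand-in for a real dataUrl: only the PNG signature, no image data.
const sampleUrl = "data:image/png;base64," + PNG_SIGNATURE.toString("base64");
const decoded = decodeDataUrl(sampleUrl);
console.log(decoded.mimeType, isPng(decoded.bytes)); // prints "image/png true"
```

In an applet, the input would be the dataUrl field returned by rp.browser.screenshot({ browserId }).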

Navigation

browser.navigate

Navigate to a URL and wait for the load condition. waitUntil is one of "load", "domcontentloaded", or "networkidle". Params: { browserId, url, waitUntil?, timeoutMs? } | Returns: { url, title } | browser:write

browser.goBack / browser.goForward / browser.reload

Navigate history or reload; wait for load. Params: { browserId, waitUntil?, timeoutMs? } | Returns: { url, title } | browser:write

browser.waitForLoad

Wait until the browser reaches the given ready state. Params: { browserId, waitUntil?, timeoutMs? } | Returns: { url, title } | browser:read

browser.waitForElement

Wait until an element matching a CSS selector appears and, optionally, is visible. Params: { browserId, selector, visible?, timeoutMs? } | Returns: { elementId } | browser:read
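The navigation and wait hooks are typically chained: navigate, then block until the element you care about exists. The sketch below shows one way to combine them. The browser argument is expected to be rp.browser, openAndWaitFor is a hypothetical name, and the timeout values are illustrative rather than documented defaults.

```javascript
// Sketch: navigate, then wait for a specific element to become visible.
// `browser` is expected to be rp.browser (passed in so the helper is testable).
async function openAndWaitFor(browser, browserId, url, selector) {
  const page = await browser.navigate({
    browserId,
    url,
    waitUntil: "domcontentloaded", // do not wait for late network activity
    timeoutMs: 10000, // illustrative value
  });
  const { elementId } = await browser.waitForElement({
    browserId,
    selector,
    visible: true,
    timeoutMs: 5000, // illustrative value
  });
  return { url: page.url, title: page.title, elementId };
}

// In an applet (not executed here):
//   const { elementId } = await openAndWaitFor(
//     rp.browser, browserId, "https://example.com/pricing", "input[type=email]");
```

Waiting on "domcontentloaded" and then on a selector tends to finish sooner than waiting on "networkidle" for pages that keep polling in the background.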

Input

Every input hook supports returnSnapshot?: boolean - when true, a fresh snapshot is appended to the response.

browser.click

Click an element by elementId or at raw (x, y) coordinates. Passing both returns InvalidParams. Params: { browserId, elementId?, x?, y?, button?, clickCount?, modifiers?, returnSnapshot? } | Returns: { clicked, coords, snapshot? } | browser:write
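The either-or rule for click targets can be checked on the caller's side before the hook is invoked. This is a minimal sketch of such a guard. buildClickParams is a hypothetical helper, and rejecting a call with neither target is an assumption based on the "required arg missing" case of InvalidParams.

```javascript
// Hypothetical client-side guard: a click targets either an elementId
// or raw (x, y) coordinates, never both and never neither.
function buildClickParams(browserId, target, options = {}) {
  const hasElement = target.elementId !== undefined;
  const hasX = target.x !== undefined;
  const hasY = target.y !== undefined;
  if (hasElement && (hasX || hasY)) {
    throw new Error("InvalidParams: elementId and (x, y) are mutually exclusive");
  }
  if (!hasElement && !(hasX && hasY)) {
    throw new Error("InvalidParams: pass elementId or both x and y");
  }
  const where = hasElement
    ? { elementId: target.elementId }
    : { x: target.x, y: target.y };
  return { browserId, ...where, ...options };
}

const byId = buildClickParams("b1", { elementId: "a2" }, { returnSnapshot: true });
const byCoords = buildClickParams("b1", { x: 120, y: 48 });
console.log(byId.elementId, byCoords.x, byCoords.y); // prints "a2 120 48"
```

The resulting object can be passed straight to rp.browser.click.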

browser.type

Type text into an element or the focused control, sending per-character key events. Params: { browserId, elementId?, text, clearFirst?, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write

browser.fill

Set the value of an input, textarea, or select and dispatch input and change events. The value is written through the native setter, so React-controlled fields register the change. Params: { browserId, elementId, value, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write

Use fill for React/Vue form fields; use type for terminals / search-as-you-type.
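When several form fields are filled in a row, it is usually enough to request a snapshot on the last call only. The sketch below illustrates that pattern. fillForm is a hypothetical helper, browser is expected to be rp.browser, and the shape of the returned snapshot is assumed to match what browser.snapshot returns.

```javascript
// Sketch: fill several fields, asking for a fresh snapshot only once, at the end.
// `fields` maps elementId -> value, e.g. { a3: "user@example.com", a4: "Ada" }.
async function fillForm(browser, browserId, fields) {
  const entries = Object.entries(fields);
  let last = null;
  for (let i = 0; i < entries.length; i++) {
    const [elementId, value] = entries[i];
    last = await browser.fill({
      browserId,
      elementId,
      value,
      returnSnapshot: i === entries.length - 1, // only on the final field
    });
  }
  // Undefined when `fields` is empty or no snapshot was attached.
  return last ? last.snapshot : undefined;
}

// In an applet (not executed here):
//   const snapshot = await fillForm(rp.browser, browserId, { a3: "user@example.com" });
```

For a search-as-you-type box, the same loop would call browser.type instead, since the page needs to see the individual key events.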

browser.pressKey

Press a key combination. Keys use DOM KeyboardEvent names (Enter, Escape, ArrowDown, a, …). Modifiers are any subset of ["ctrl", "shift", "alt", "meta"]. Params: { browserId, elementId?, key, modifiers?, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write
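Shortcuts are often written as a single string such as "Ctrl+Shift+ArrowDown", which has to be split into the key and modifiers fields that pressKey takes. The sketch below shows one way to do that. parseShortcut is a hypothetical helper, and the "+"-separated notation is an assumption rather than something the hook accepts itself.

```javascript
// The four modifier names accepted by pressKey.
const KNOWN_MODIFIERS = ["ctrl", "shift", "alt", "meta"];

// Hypothetical helper: "Ctrl+Shift+ArrowDown" -> { key, modifiers }.
// The last segment is the key (case preserved); the rest are modifiers.
function parseShortcut(shortcut) {
  const parts = shortcut.split("+").map((part) => part.trim());
  const key = parts.pop();
  if (!key) throw new Error(`No key in shortcut: "${shortcut}"`);
  const modifiers = [];
  for (const part of parts) {
    const name = part.toLowerCase();
    if (!KNOWN_MODIFIERS.includes(name)) {
      throw new Error(`Unknown modifier: "${part}"`);
    }
    if (!modifiers.includes(name)) modifiers.push(name); // drop duplicates
  }
  return { key, modifiers };
}

const combo = parseShortcut("Ctrl+Shift+ArrowDown");
console.log(combo.key, combo.modifiers.join(",")); // prints "ArrowDown ctrl,shift"

// In an applet (not executed here):
//   await rp.browser.pressKey({ browserId, ...parseShortcut("Meta+a") });
```

This notation cannot express the "+" key itself; a caller that needs it would pass key and modifiers directly.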

browser.scroll

Scroll the page or an element by delta, or to top/bottom. Params: { browserId, elementId?, deltaX?, deltaY?, toTop?, toBottom?, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write

browser.hover

Move cursor over an element without clicking (triggers hover state). Params: { browserId, elementId, durationMs?, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write

browser.focus

Focus an element without clicking (useful before type when clicking would fire side effects). Params: { browserId, elementId, returnSnapshot? } | Returns: { ok, snapshot? } | browser:write

Discovery

browser.list

List browsers in the caller’s scope. With no filter, only the caller’s active place is searched. Params: { placeId?, projectId?, resourceId? } | Returns: { browsers: [{ id, url, title, placeId?, resourceId?, visible, focused }] } | browser:read

browser.getUrl / browser.getTitle

Cheap cached queries. Params: { browserId } | Returns: { url } / { title } | browser:read

Raw eval

browser.eval

Run a JavaScript expression in the browser’s top frame and return its JSON-serializable result. High-trust: requires browser:eval scope + per-origin user consent on first use.

Params: { browserId, script, timeoutMs? } (timeoutMs defaults to 5000 ms)
Returns: { value }
Permission: browser:eval

eval can reach cookies, localStorage, and the page DOM, so treat it as full-page access to that origin.
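Since script is a single expression string, passing arguments into it means serializing them into the source. One common way is to wrap a function in an immediately invoked expression, sketched below. toExpression is a hypothetical helper; it assumes the function is self-contained (no closed-over variables) and that every argument is JSON-serializable.

```javascript
// Hypothetical helper: turn a self-contained function plus JSON-serializable
// arguments into one expression string of the form (fn)(arg1, arg2, ...).
function toExpression(fn, ...args) {
  const serializedArgs = args.map((arg) => JSON.stringify(arg)).join(", ");
  return `(${fn.toString()})(${serializedArgs})`;
}

// Build a script that collects the text of the first few headings.
// `document` is only resolved when the script runs inside the page.
const script = toExpression(
  (selector, limit) =>
    Array.from(document.querySelectorAll(selector))
      .slice(0, limit)
      .map((el) => el.textContent.trim()),
  "h2",
  5
);

// In an applet (not executed here):
//   const { value } = await rp.browser.eval({ browserId, script, timeoutMs: 5000 });
```

JSON.stringify handles the quoting, so a selector containing quotes or backslashes does not break the generated expression. The function body still runs with full access to the origin, so the high-trust caveat applies unchanged.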

Permission scopes

Scope          Hooks
browser:read   list, snapshot, axTree, screenshot, getUrl, getTitle, waitForLoad, waitForElement
browser:write  navigate, goBack, goForward, reload, click, type, fill, pressKey, scroll, hover, focus
browser:eval   eval (isolated - rarely needed)
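The scope table can be turned into a lookup that answers which capabilities a given set of hooks needs. The sketch below is one way to do that. requiredCapabilities is a hypothetical helper, and it assumes the scopes are independent, meaning browser:write does not imply browser:read.

```javascript
// Scope -> hooks, copied from the table above.
const SCOPE_HOOKS = {
  "browser:read": ["list", "snapshot", "axTree", "screenshot", "getUrl",
    "getTitle", "waitForLoad", "waitForElement"],
  "browser:write": ["navigate", "goBack", "goForward", "reload", "click",
    "type", "fill", "pressKey", "scroll", "hover", "focus"],
  "browser:eval": ["eval"],
};

// Hypothetical helper: the capabilities to declare for a set of hook names.
function requiredCapabilities(hooks) {
  const needed = new Set();
  for (const hook of hooks) {
    const scope = Object.keys(SCOPE_HOOKS).find((s) => SCOPE_HOOKS[s].includes(hook));
    if (!scope) throw new Error(`Unknown hook: ${hook}`);
    needed.add(scope);
  }
  // Keep the table's order: read, write, eval.
  return Object.keys(SCOPE_HOOKS).filter((s) => needed.has(s));
}

const capabilities = requiredCapabilities(["fill", "snapshot", "click"]);
console.log(capabilities.join(", ")); // prints "browser:read, browser:write"
```

The returned list is what would go in the manifest's capabilities array for an applet that uses those hooks.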

Errors

Error              When
BrowserNotFound    browserId doesn’t resolve to a live browser
ElementNotFound    [data-rp-id="X"] not in DOM - re-snapshot and retry
ElementNotVisible  Element is offscreen / zero-size
ElementDisabled    Input is disabled or pointer-events: none
EvalTimeout        Eval exceeded its timeout
NavigateTimeout    Navigation didn’t finish in timeoutMs
InvalidParams      Required arg missing or mutually-exclusive args both set
EvalConsentDenied  User rejected the first-call eval prompt
CapabilityDenied   Caller manifest lacks the required scope
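The "re-snapshot and retry" advice for ElementNotFound can be wrapped in a small loop. The sketch below is one possible shape. It assumes that hooks reject with an error whose name property equals the name in the table, which this page does not specify; clickWithResnapshot and the findId callback are hypothetical.

```javascript
// Sketch: click an element, re-snapshotting once if its ID has gone stale.
// ASSUMPTION: failed hooks throw an error whose `name` matches the error table.
// `findId(markdown)` returns the target's element ID, or null if it is absent.
async function clickWithResnapshot(browser, browserId, findId, maxAttempts = 2) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Take a fresh snapshot on every attempt so IDs are current.
    const { markdown } = await browser.snapshot({ browserId });
    const elementId = findId(markdown);
    if (elementId === null) throw new Error("Target not present in snapshot");
    try {
      return await browser.click({ browserId, elementId });
    } catch (err) {
      const retryable = err && err.name === "ElementNotFound";
      if (!retryable || attempt === maxAttempts) throw err;
    }
  }
}

// Example locator for the "Start free trial" button from the quick start.
const findTrialButton = (markdown) => {
  const match = /\[(\w+) button\] Start free trial/.exec(markdown);
  return match ? match[1] : null;
};

// In an applet (not executed here):
//   await clickWithResnapshot(rp.browser, browserId, findTrialButton);
```

Only ElementNotFound is retried here. Errors such as ElementDisabled or CapabilityDenied would not be fixed by a new snapshot, so they are rethrown immediately.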

Manifest

{
  "capabilities": ["browser:read", "browser:write"]
}

Applets that need browser.eval must also declare "browser:eval" - users see a high-trust warning at install time and are prompted per-origin on first use.