gsd-browser mcp starts a Model Context Protocol server that exposes the entire daemon surface — navigation, interaction, snapshots, recordings, vault, network control, and more — as over 50 discoverable tools, live resources, and executable prompts. Any MCP-compatible client connects to it with a single configuration block and immediately gains access to the full browser automation platform.
Start the MCP server
- Local stdio (recommended)
- HTTP server (remote / cloud)
- OpenGSD cloud tokens
The server communicates over stdin/stdout using the JSON-RPC MCP protocol. The daemon starts automatically when the first tool call arrives.
Most MCP clients manage the server process for you. Point your client at gsd-browser mcp and it handles startup, shutdown, and restarts automatically.
Tool categories
The MCP server exposes 50+ tools grouped into logical categories. Call tools/list from any connected client to see the current full surface.
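As a minimal sketch of what a tools/list call looks like on the wire, the snippet below builds the JSON-RPC 2.0 request by hand. The helper name `jsonrpc_request` is illustrative, and the initialize handshake that an MCP client performs first is left out; in practice your MCP client library does all of this for you.

```python
import json
from typing import Any, Dict, Optional


def jsonrpc_request(request_id: int, method: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object of the kind MCP exchanges."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server for its tool catalogue (handshake omitted in this sketch).
list_tools = jsonrpc_request(1, "tools/list")

# On the stdio transport each message is written as a single line of JSON.
wire_line = json.dumps(list_tools) + "\n"
print(wire_line, end="")
```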
Navigation & page state
Snapshot & versioned refs
browser_snapshot, browser_get_ref: scan the page and assign versioned refs (@v1:e1), then inspect individual refs for bounding boxes, ARIA data, and structural signatures. The primary mechanism for reliable interaction. See Snapshots & Refs.
Interaction
browser_click_ref, browser_fill_ref, browser_hover_ref, browser_click, browser_type, browser_press, browser_scroll, browser_drag, browser_select_option, browser_set_checked, browser_upload_file, browser_set_viewport: precise element interaction using refs or CSS selectors. browser_click_ref accepts an optional double_click: true to dispatch a DOM dblclick instead of a single click, which some inline-edit and grid components require.
Semantic & intent-based tools
browser_act, browser_act_instruction, browser_find_best: natural language intent execution. browser_act covers 15 built-in patterns (fill email, fill password, submit form, accept cookies, click next, dismiss dialog, open menu, and more). browser_act_instruction accepts a free-form instruction like "click Continue" or "enter alice@example.com into Email" and plans concrete primitive steps against the live page; use it when the intent isn’t a built-in pattern. Both tools share the self-healing action cache. See Free-form instructions with browser_act_instruction.
Forms
browser_analyze_form, browser_fill_form: inspect a form’s structure and fill multiple fields in one call using labels, name attributes, or ARIA identifiers.
Capture & visual output
browser_screenshot, browser_zoom_region, browser_save_pdf, browser_visual_diff: capture screenshots, zoom into regions, export PDFs, and run visual regression comparisons against a stored baseline.
Live viewer & human collaboration
browser_view, browser_goal, browser_takeover, browser_release_control, browser_annotation_request, browser_step, browser_abort, browser_pause, browser_resume, browser_sensitive_on, browser_sensitive_off: open the authenticated viewer, set goal banners, let a human take over and annotate, then hand control back to the agent.
Recording & evidence bundles
browser_record_start, browser_record_stop, browser_recordings, browser_recording_export, browser_recording_validate, browser_generate_replayable_test: capture flows as rich, redacted evidence bundles and auto-convert them to commit-ready Playwright regression tests.
Session management
browser_session_list, browser_session_new, browser_session_close, browser_session_save, browser_session_restore: manage isolated browser contexts. See Sessions.
Network control
browser_mock_route, browser_block_urls, browser_clear_routes, browser_har_export, browser_trace_start, browser_trace_stop: intercept and mock requests, block URLs, export HAR files, and start CDP traces.
Auth vault & state persistence
browser_vault_save, browser_vault_login, browser_vault_list, browser_save_state, browser_restore_state: store encrypted credentials and persist full browser state across sessions for repeatable authenticated flows.
Diagnostics & debugging
browser_console, browser_network, browser_timeline, browser_debug_bundle, browser_session_summary, browser_check_injection, browser_evaluate (alias: browser_eval): inspect console logs, network traffic, and the action timeline, and get a full debug bundle (screenshot + console + network + a11y) when an agent gets stuck. browser_evaluate runs arbitrary JavaScript in the active page and safely returns the result; browser_eval is an identical short alias for agents that prefer the shorter name.
Multi-tab & frame management
browser_list_pages, browser_switch_page, browser_close_page, browser_list_frames, browser_select_frame: manage multiple tabs opened by navigation or JavaScript, and work inside iframes.
Batch execution
browser_batch: run a sequence of actions atomically in a single round-trip. Highly recommended for complex multi-step flows where partial state errors must be avoided. Supported step actions include navigate, reload, click, type, select_option, key_press, press, wait_for, assert, click_ref (supports double_click: true), fill_ref, hover, hover_ref, scroll, snapshot, diff, and eval.
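Because a batch runs as one unit, it can help to check step actions locally before sending it. The sketch below validates a step list against the action names listed above. The step shape (an "action" key plus fields such as "url" and "ref") is an assumption for illustration, not the documented schema, and `unsupported_steps` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Step actions named in the browser_batch description.
SUPPORTED_ACTIONS = {
    "navigate", "reload", "click", "type", "select_option", "key_press",
    "press", "wait_for", "assert", "click_ref", "fill_ref", "hover",
    "hover_ref", "scroll", "snapshot", "diff", "eval",
}


def unsupported_steps(steps: List[Dict[str, Any]]) -> List[int]:
    """Return the indexes of steps whose action is not in the supported set."""
    return [i for i, step in enumerate(steps)
            if step.get("action") not in SUPPORTED_ACTIONS]


# Assumed step shape: field names here are placeholders.
steps = [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "snapshot"},
    {"action": "click_ref", "ref": "@v1:e1", "double_click": True},
    {"action": "double_click"},  # not a step action; use click_ref + double_click
]

print(unsupported_steps(steps))
```

Rejecting the batch locally when the returned list is non-empty avoids spending a round-trip on a request that cannot succeed.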
Action cache
browser_action_cache (stats / get / put / clear): inspect, populate, and manage the self-healing intent-to-selector cache. See Snapshots & Refs.
Reloading the current page with browser_reload
browser_reload exposes the daemon’s native page reload as an MCP tool. Use it to refresh dynamic content (long-polled dashboards, “load more” lists that need to restart from a clean state) or to recover from a stale page after an error. It returns the same structured page state as browser_navigate, so agents can branch on the response in the same way.
Reload only takes an optional session argument:
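A minimal sketch of the corresponding tools/call request, built by hand. The helper `reload_call` and the session name "checkout" are made up for illustration; only the tool name and the optional session argument come from the text above.

```python
import json
from typing import Any, Dict, Optional


def reload_call(request_id: int, session: Optional[str] = None) -> Dict[str, Any]:
    """Build a tools/call request for browser_reload."""
    arguments: Dict[str, Any] = {}
    if session is not None:
        arguments["session"] = session  # the only argument the tool takes
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "browser_reload", "arguments": arguments},
    }


default_reload = reload_call(2)                    # reload the default session
named_reload = reload_call(3, session="checkout")  # hypothetical session name
print(json.dumps(named_reload["params"]))
```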
Follow browser_reload with browser_snapshot before interacting with elements; refs from the previous page version are no longer valid.
Inside browser_batch, use the reload step instead of a separate tool call so the reload stays in the same atomic round-trip:
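One way this might look, assuming browser_batch takes a "steps" list whose items carry an "action" key (that shape is an assumption, not the documented schema). The reload is followed by a snapshot step so that later steps work from fresh refs.

```python
import json

# Assumed argument shape: a "steps" list of {"action": ...} items.
batch_arguments = {
    "steps": [
        {"action": "reload"},    # refresh the page inside the batch
        {"action": "snapshot"},  # re-scan so later steps use fresh refs
    ]
}

batch_call = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {"name": "browser_batch", "arguments": batch_arguments},
}
print(json.dumps(batch_call, indent=2))
```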
Free-form instructions with browser_act_instruction
browser_act_instruction accepts a short natural-language instruction, plans concrete primitive steps against the live DOM, and dispatches them through the same engine that powers browser_click, browser_type, browser_select_option, browser_set_checked, browser_drag, and browser_scroll. Reach for it when the action isn’t one of the built-in browser_act semantic patterns — for example “choose California from State”, “drag the price slider to the right”, or “scroll the comments panel down”.
The tool accepts:
| Field | Type | Description |
|---|---|---|
| instruction | string (required) | Short action-oriented instruction, e.g. "click Continue", "enter 'alice@example.com' into Email", "choose California from State". |
| dry_run | boolean | When true, return the planned steps and confidence without executing them. Defaults to false. |
| scope | string | CSS selector that constrains planning to a form, dialog, panel, or other page region. Use this when a page has repeated controls (e.g. multiple “Save” buttons). |
| min_confidence | number | Block execution when the inferred plan confidence is below this threshold. Use this to fail closed on ambiguous instructions instead of guessing. |
| max_steps | integer | Cap on primitive steps the instruction may execute. Defaults to a small bounded sequence. |
| session | string | Named session for parallel browser instances. |
To preview the plan without executing it, pass dry_run: true:
The response lists each planned step (click, type, select_option, …), the target element, and a confidence score. If the plan looks correct, re-issue the call without dry_run to execute it. If confidence is low or the target is wrong, tighten scope or rewrite the instruction.
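The preview-then-execute loop can be sketched as follows. The argument names come from the table above; the dry-run response shape (a "steps" list and a "confidence" number), the selector "form#shipping", and both helper functions are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def instruction_arguments(instruction: str, dry_run: bool = False,
                          scope: Optional[str] = None,
                          min_confidence: Optional[float] = None) -> Dict[str, Any]:
    """Assemble browser_act_instruction arguments, omitting unset options."""
    arguments: Dict[str, Any] = {"instruction": instruction, "dry_run": dry_run}
    if scope is not None:
        arguments["scope"] = scope
    if min_confidence is not None:
        arguments["min_confidence"] = min_confidence
    return arguments


def plan_is_acceptable(plan: Dict[str, Any], threshold: float) -> bool:
    """Accept a dry-run plan only if it has steps and enough confidence.

    The "steps" and "confidence" keys are an assumed response shape.
    """
    return bool(plan.get("steps")) and plan.get("confidence", 0.0) >= threshold


preview = instruction_arguments("choose California from State",
                                dry_run=True, scope="form#shipping")

# A made-up dry-run response used only to exercise the check.
sample_plan = {"steps": [{"action": "select_option"}], "confidence": 0.91}

if plan_is_acceptable(sample_plan, threshold=0.8):
    # Re-issue without dry_run, failing closed if confidence drops.
    execute = instruction_arguments("choose California from State",
                                    scope="form#shipping", min_confidence=0.8)
    print(execute)
```

Passing min_confidence on the real call keeps the fail-closed behavior even if the page changes between the preview and the execution.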
Resources
Resources give your agent live context without issuing a full tool call. Read them in your agent loop after navigation to get up-to-date page state cheaply.
| Resource URI | What it returns |
|---|---|
| gsd-browser://latest-snapshot | Triggers a fresh snapshot and returns versioned refs + page structure |
| gsd-browser://current-state | Full debug bundle: screenshot, console, network, timeline, a11y |
| gsd-browser://active-recordings | List of in-progress recording bundles |
| gsd-browser://timeline | Recent action timeline |
| gsd-browser://current-refs | The refs from the most recent snapshot, without re-scanning |
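Resources are fetched with the MCP resources/read method. As a rough sketch, the snippet below builds such a request for one of the URIs in the table; the helper `read_resource` is illustrative, and a real client library would normally construct the message for you.

```python
import json

# Resource URIs from the table above.
RESOURCE_URIS = [
    "gsd-browser://latest-snapshot",
    "gsd-browser://current-state",
    "gsd-browser://active-recordings",
    "gsd-browser://timeline",
    "gsd-browser://current-refs",
]


def read_resource(request_id: int, uri: str) -> dict:
    """Build an MCP resources/read request for one resource URI."""
    if uri not in RESOURCE_URIS:
        raise ValueError(f"unknown resource: {uri}")
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "resources/read", "params": {"uri": uri}}


# current-refs returns the last snapshot's refs without re-scanning the page.
request = read_resource(5, "gsd-browser://current-refs")
print(json.dumps(request))
```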
Executable prompts
Built-in prompts are multi-step executable workflows that encode agent best practices directly. Ask your MCP client to run them by name.
| Prompt | What it does |
|---|---|
| robust_login_flow | Navigates to a login page, fills credentials, submits, asserts the logged-in state, and saves the session |
| full_page_audit | Runs snapshot, console, network, visual diff, and debug bundle in parallel and synthesizes the results |
| create_evidence_bundle | Records a flow with annotations and exports a redacted, replayable bundle |
| evidence_creation_workflow | Full record → export → generate Playwright test pipeline |
| autonomous_research_task | Open-ended research flow with structured extraction and evidence capture |
| debug_stuck_agent_flow | Collects debug bundle, console, network, and timeline to diagnose a stuck agent |
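Under the hood, running a prompt by name corresponds to the MCP prompts/get method. A minimal sketch of that request follows. The prompt names are taken from the table; which arguments each prompt accepts is not stated here, so the example passes none, and `get_prompt` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional

# Prompt names from the table above.
PROMPTS = (
    "robust_login_flow", "full_page_audit", "create_evidence_bundle",
    "evidence_creation_workflow", "autonomous_research_task",
    "debug_stuck_agent_flow",
)


def get_prompt(request_id: int, name: str,
               arguments: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build an MCP prompts/get request for a built-in prompt."""
    if name not in PROMPTS:
        raise ValueError(f"unknown prompt: {name}")
    params: Dict[str, Any] = {"name": name}
    if arguments:
        params["arguments"] = arguments  # per-prompt arguments, if it defines any
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "prompts/get", "params": params}


request = get_prompt(6, "debug_stuck_agent_flow")
print(json.dumps(request))
```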
Response envelopes
Every tools/call response from the MCP server includes a standardized envelope:
Read suggested_next_actions on every call; they contain high-signal hints that significantly reduce the number of round-trips an agent needs to complete a task.
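A small sketch of pulling those hints out of a parsed envelope. Only the suggested_next_actions field is named in the text; the other key in the sample envelope, the hint values, and the assumption that hints arrive as a list are placeholders.

```python
from typing import Any, Dict, List


def next_actions(envelope: Dict[str, Any]) -> List[str]:
    """Return suggested_next_actions from an envelope, or [] when absent."""
    hints = envelope.get("suggested_next_actions") or []
    return [str(hint) for hint in hints]


# A made-up envelope: keys other than suggested_next_actions are placeholders.
sample_envelope = {
    "ok": True,
    "suggested_next_actions": ["browser_snapshot", "browser_click_ref"],
}

print(next_actions(sample_envelope))
print(next_actions({"ok": False}))
```

Treating a missing field as an empty list lets the agent loop handle every response the same way.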
Client configuration
- Claude Desktop
- Cursor / VS Code
- Remote / cloud
Add the following to your Claude Desktop MCP configuration file (.mcp.json or claude_desktop_config.json):
To pre-configure the browser path and vault key:
How the MCP adapter works
The MCP server is a thin, high-fidelity adapter over the same daemon client that the CLI uses. When your agent calls a tool, the server translates the JSON-RPC request directly to the daemon’s internal API, attaches the standardized response envelope, and returns the result. You get automatic daemon lifecycle management, named session routing, and all the reliability guarantees of the CLI, plus the discoverability and structured envelopes that MCP provides. Daemons started on behalf of an MCP client follow the same idle-shutdown and version-mismatch auto-restart rules as the CLI. Tune GSD_BROWSER_IDLE_SHUTDOWN_SECONDS in your MCP client’s env block to control how long an inactive agent session holds a Chrome process open. See Daemon lifecycle and idle shutdown.
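As a sketch of where that env block lives, the snippet below generates a client configuration in the mcpServers layout that Claude Desktop reads. The server label "gsd-browser" and the value "300" are illustrative choices, and the unit is inferred from the variable name; check the daemon lifecycle documentation for the actual default.

```python
import json

config = {
    "mcpServers": {
        "gsd-browser": {  # label shown by the client; the name is arbitrary
            "command": "gsd-browser",
            "args": ["mcp"],
            "env": {
                # Illustrative value, presumably seconds of inactivity.
                "GSD_BROWSER_IDLE_SHUTDOWN_SECONDS": "300",
            },
        }
    }
}

print(json.dumps(config, indent=2))
```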
