Session MCP Server
The Session MCP Server exposes the Automation API over the Model Context Protocol (MCP). Connect any MCP-compatible AI agent to a live proxied browser session — your agent can open URLs, read the page, fill in forms, click buttons, select text, and take screenshots, all without modifying the underlying website.
Use Cases
- Third-party agents: connect Cognigy, ElevenLabs, or similar platforms to a live browser session without modifying the underlying website.
- Agent orchestration: use MCP-capable orchestrators (LangGraph, AutoGen, CrewAI) to build agentic workflows on top of any proxied web page.
- Agent development: iterate on prompts and tool calls interactively against a live session from Cursor, VS Code, or any MCP client.
Configuration
1. Create a Space
   Create a Space in Webfuse Studio. See the Quickstart for a step-by-step guide.
2. Generate a Space REST API Key
   In Webfuse Studio, open the newly created Space, navigate to Settings → API Keys, and generate a new Space REST API key (prefixed rk_). Keep it secret: it grants full access to the Space.
3. Enable the Automation App
   Open a Session, toggle the Session Editor bar, and open the Apps tab. Find the Automation app and install it. See Apps for more details.
4. Restart the Session
   The Automation app takes effect after a session restart. Close the current Session and start a new one.
Connect your MCP Client
Configure your MCP client to connect to the Session MCP Server endpoint for your domain:

    https://session-mcp.HOSTNAME/mcp

Authenticate with the Space REST API key as a Bearer token:

    Authorization: Bearer <your-space-rest-api-key>

For VS Code, add the server manually to .vscode/mcp.json in your workspace, or to your user settings:

    {
      "servers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer ${input:webfuse_api_key}"
          }
        }
      },
      "inputs": [
        {
          "type": "promptString",
          "id": "webfuse_api_key",
          "description": "Webfuse Space REST API Key",
          "password": true
        }
      ]
    }

For Cursor, add the server manually to .cursor/mcp.json:

    {
      "mcpServers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer <your-space-rest-api-key>"
          }
        }
      }
    }

For Claude Desktop, add the server to claude_desktop_config.json:

    {
      "mcpServers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer <your-space-rest-api-key>"
          }
        }
      }
    }

For a Python client:

    # To install the required library: pip install mcp
    import asyncio
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async def main():
        async with streamablehttp_client(
            "https://session-mcp.HOSTNAME/mcp",
            headers={"Authorization": "Bearer <your-space-rest-api-key>"},
        ) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "see_domSnapshot", {"session_id": "<session-id>"}
                )

                print(result)

    if __name__ == "__main__":
        asyncio.run(main())

Try it
Start a fresh session, then ask your agent:
“In Webfuse session, open https://webfuse.com and describe what you see.”
The agent will ask for a session ID, then use navigate to load the page and see_domSnapshot or see_guiSnapshot to read it.
From there you can ask it to interact with the page in natural language — click a link, fill in a form, or summarise the content.
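The same two steps can also be driven directly from client code. The following is a minimal sketch, assuming `session` is an initialized `ClientSession` as in the Python client example; the helper `open_and_read` and the `RecordingSession` stand-in are hypothetical names used only for illustration, and the stand-in lets the flow run offline.

```python
import asyncio


async def open_and_read(session, session_id: str, url: str):
    """Navigate the Webfuse session to `url`, then capture a DOM snapshot."""
    # Step 1: load the page in the current tab.
    await session.call_tool("navigate", {"session_id": session_id, "url": url})
    # Step 2: read the page back to confirm what actually loaded.
    return await session.call_tool("see_domSnapshot", {"session_id": session_id})


class RecordingSession:
    """Hypothetical offline stand-in for ClientSession; records each tool call."""

    def __init__(self):
        self.calls = []

    async def call_tool(self, name, arguments):
        self.calls.append((name, arguments))
        return f"<result of {name}>"


if __name__ == "__main__":
    stub = RecordingSession()
    asyncio.run(open_and_read(stub, "<session-id>", "https://webfuse.com"))
    for name, arguments in stub.calls:
        print(name, arguments)
```

With a real `ClientSession` in place of the stand-in, the returned value is the tool result of the snapshot call.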
Limits
| Limit | Value | Notes |
|---|---|---|
| Tool call timeout | 15 s | Maximum time allowed for a single tool call, including network transfer in both directions and tool execution in the browser. If the round-trip doesn’t complete within this window, the MCP server returns a timeout error to the agent. |
| Tool call input size | 16 KiB | Maximum decompressed size of the tool-call arguments sent to the server. |
| Tool call response size | 10 MiB | Maximum decompressed size of a tool result. Larger responses are rejected and the agent receives an error instead. |
| MCP Client connection duration | 3 min | MCP client connections are automatically closed after 3 minutes. This is a hard limit — clients must reconnect after this period to continue making tool calls. |
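A client can check the input size limit before sending a call. This is a rough sketch: the server's exact measurement is not specified here, so the helper assumes the size is that of the UTF-8 encoded JSON arguments. The names `arguments_size` and `fits_input_limit` are hypothetical.

```python
import json

MAX_INPUT_BYTES = 16 * 1024  # 16 KiB tool call input limit from the table above


def arguments_size(arguments: dict) -> int:
    """Return the size in bytes of the JSON-encoded, UTF-8 arguments (assumed measure)."""
    return len(json.dumps(arguments, ensure_ascii=False).encode("utf-8"))


def fits_input_limit(arguments: dict, limit: int = MAX_INPUT_BYTES) -> bool:
    """Check the arguments against the limit before sending the tool call."""
    return arguments_size(arguments) <= limit
```

A long `text` argument for a typing call is the most likely way to approach this limit; splitting it across several calls keeps each one small.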
All tools require a session_id identifying the target session.
Finding your session ID: It appears in the session URL as the path segment after the hostname, for example:
https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw
For programmatic access, you can also retrieve it from a REST API response or a webhook payload.
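Extracting the ID from a session URL takes only the standard library. A small sketch, assuming the ID is always the first path segment as in the example URL; `session_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def session_id_from_url(session_url: str) -> str:
    """Return the first path segment of a session URL."""
    path = urlparse(session_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no session ID found in {session_url!r}")
    return segments[0]
```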
Most actuation tools also accept a target — a CSS selector, Webfuse ID, or [x,y] coordinates.
act_click
Click a target element with the specified mouse button. Use for buttons, links, checkboxes, and any other interactive element. Defaults to a left-button click.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID, or coordinates in the format ‘[x,y]’) |
| options.button | string | | Mouse button to use: ‘left’ (default), ‘middle’, or ‘right’. |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before clicking. When false, the click is sent directly without moving the pointer (default: true). |
| options.scrollIntoView | boolean | | Scroll the target element into the viewport before clicking. Disable only if the element is already visible (default: true). |
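The three target forms and the optional button can be assembled as follows. This is a sketch under one assumption: that the dotted names in the table (such as options.button) map to a nested `options` object in the tool arguments. `format_target` and `click_arguments` are hypothetical helper names.

```python
def format_target(target) -> str:
    """Return a target string from a selector or Webfuse ID, or from an (x, y) pair."""
    if isinstance(target, str):
        return target
    x, y = target
    return f"[{x},{y}]"


def click_arguments(session_id: str, target, button: str = "left") -> dict:
    """Build act_click arguments; `options` nesting is an assumption."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported mouse button {button!r}")
    arguments = {"session_id": session_id, "target": format_target(target)}
    if button != "left":
        # Only send options that differ from the documented defaults.
        arguments["options"] = {"button": button}
    return arguments
```

The resulting dictionary would be passed as the second argument of `session.call_tool("act_click", ...)`.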
act_keyPress
Press a single key on a target element, with optional modifier keys. Key events are dispatched directly to the page, not to the operating system. OS-level shortcuts such as Ctrl+C (copy) or Ctrl+V (paste) will NOT work unless the page has explicitly implemented them. Standard editing keys (Enter, Backspace, Delete) and page-handled shortcuts work as expected.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| key | string | ✓ | Key to press using the KeyboardEvent.key name (e.g. ‘Enter’, ‘ArrowUp’, ‘a’, ‘B’, ‘F5’). |
| options.altKey | boolean | | Hold the Alt key while pressing the key. Only effective if the page handles the resulting combination (default: false). |
| options.ctrlKey | boolean | | Hold the Control key while pressing the key. Only effective if the page handles the resulting combination; OS shortcuts like Ctrl+C will not work (default: false). |
| options.metaKey | boolean | | Hold the Meta (Cmd on macOS, Win on Windows) key while pressing the key (default: false). |
| options.shiftKey | boolean | | Hold the Shift key while pressing the key (default: false). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before pressing the key. When false, the action is sent directly without moving the pointer (default: true). |
| options.scrollIntoView | boolean | | Scroll the target element into the viewport before pressing the key. Disable only if the element is already visible (default: true). |
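A combination such as Shift+Enter translates into one key plus modifier flags. The sketch below shows that mapping; it assumes the dotted option names map to a nested `options` object, and `key_press_arguments` is a hypothetical helper, not part of the server.

```python
# Maps modifier names in a combo string to the option names from the table above.
MODIFIER_OPTIONS = {
    "alt": "altKey",
    "ctrl": "ctrlKey",
    "control": "ctrlKey",
    "meta": "metaKey",
    "cmd": "metaKey",
    "shift": "shiftKey",
}


def key_press_arguments(session_id: str, target: str, combo: str) -> dict:
    """Build act_keyPress arguments from a combo such as 'Shift+Enter'.

    The final part is the KeyboardEvent.key name; earlier parts are modifiers.
    A literal '+' key is not supported by this simple parser.
    """
    *modifiers, key = combo.split("+")
    if not key:
        raise ValueError(f"missing key in combo {combo!r}")
    options = {}
    for modifier in modifiers:
        name = modifier.strip().lower()
        if name not in MODIFIER_OPTIONS:
            raise ValueError(f"unknown modifier {modifier!r}")
        options[MODIFIER_OPTIONS[name]] = True
    arguments = {"session_id": session_id, "target": target, "key": key}
    if options:
        arguments["options"] = options
    return arguments
```

Whether a given combination has any effect still depends on the page handling it, as described above.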
act_mouseMove
Move the virtual mouse pointer to a target element or coordinates without clicking. Use to trigger hover states, tooltips, or drop-down menus that require mouse proximity. Can optionally keep the pointer visible on screen after the move.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| options.persistent | boolean | | Keep the pointer visible on screen indefinitely after the move. When false (default), the pointer fades out automatically after a short delay. |
act_scroll
Scroll a target element or the page by a given number of pixels. Use to bring off-screen content into view or to navigate long pages. Positive amounts scroll down or right; negative amounts scroll up or left. When scrolling the full page rather than a specific element, use ‘html’ as the target, because ‘body’ often does not respond to scrolling.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| amount | number | ✓ | Number of pixels to scroll. Positive scrolls down or right; negative scrolls up or left. |
| options.direction | string | | Axis to scroll along: ‘vertical’ (up/down, default) or ‘horizontal’ (left/right). |
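The sign and axis conventions are easy to mix up, so a thin wrapper can translate a plain direction word into the right pair. A sketch only: `scroll_arguments` is a hypothetical helper, and the nested `options` object is an assumption about how the dotted names are passed.

```python
def scroll_arguments(session_id: str, pixels: int, direction: str = "down",
                     target: str = "html") -> dict:
    """Build act_scroll arguments from a direction word and a pixel count."""
    if pixels < 0:
        raise ValueError("pixels must be non-negative; use direction for the sign")
    if direction not in ("up", "down", "left", "right"):
        raise ValueError(f"unknown direction {direction!r}")
    axis = "horizontal" if direction in ("left", "right") else "vertical"
    # Negative amounts scroll up or left; positive amounts scroll down or right.
    amount = -pixels if direction in ("up", "left") else pixels
    return {
        "session_id": session_id,
        "target": target,  # 'html' scrolls the full page
        "amount": amount,
        "options": {"direction": axis},
    }
```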
act_select
Select an option in a <select> dropdown element by matching its value attribute. Use this instead of act_click when interacting with native HTML dropdowns.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| value | string | ✓ | The value attribute of the option to select, not the visible display text. |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.scrollIntoView | boolean | | Scroll the target element into the viewport before selecting. Disable only if the element is already visible (default: true). |
act_textSelect
Select a continuous run of text within a target element by matching its content. Use to highlight text before copying, replacing, or applying formatting. Pass an empty string to clear any existing selection.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| text | string | ✓ | Text to select by matching its content. Pass an empty string to clear any existing selection. |
| options.occurrence | number | | Which occurrence to select when the text appears more than once in the element. 0 selects the first match (default: 0). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.scrollIntoView | boolean | | Scroll the target element into the viewport before selecting. Disable only if the element is already visible (default: true). |
act_type
Type text into a target input element, simulating natural key-by-key human input. Use for text fields, search boxes, and any editable element. By default, overwrites existing content.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target (CSS selector, Webfuse ID (e.g. ‘234-56’), or coordinates in the format ‘[x,y]’) |
| text | string | ✓ | Text to type into the target element. |
| options.followFocus | boolean | | Continue typing into whichever element holds focus, even if focus moved away from the original target. Set to false to strictly type into the target (default: true). |
| options.overwrite | boolean | | Replace the existing content of the input before typing. Set to false to append or insert at the current cursor position (default: true). |
| options.timePerChar | number | | Mean delay between key presses in milliseconds, controlling typing speed. Lower values type faster (default: 100). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before typing. When false, the action is sent directly without moving the pointer (default: true). |
| options.scrollIntoView | boolean | | Scroll the target element into the viewport before typing. Disable only if the element is already visible (default: true). |
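Because typing is simulated key by key, a long text at the default 100 ms per character may approach the 15 s tool call timeout. One possible workaround is to split the text across several calls. This sketch assumes typing time counts toward the timeout and that options are passed as a nested object; the 10 s budget and the helper names `split_for_typing` and `type_calls` are illustrative choices. Since timePerChar is a mean, the estimate is approximate.

```python
def split_for_typing(text: str, time_per_char_ms: int = 100,
                     budget_ms: int = 10_000) -> list[str]:
    """Split text into chunks whose estimated typing time fits the budget."""
    if time_per_char_ms <= 0 or budget_ms <= 0:
        raise ValueError("time_per_char_ms and budget_ms must be positive")
    chars_per_call = max(1, budget_ms // time_per_char_ms)
    return [text[i:i + chars_per_call] for i in range(0, len(text), chars_per_call)]


def type_calls(session_id: str, target: str, text: str,
               time_per_char_ms: int = 100, budget_ms: int = 10_000) -> list[dict]:
    """Build one act_type argument set per chunk, in the order they should be sent."""
    calls = []
    for index, chunk in enumerate(split_for_typing(text, time_per_char_ms, budget_ms)):
        calls.append({
            "session_id": session_id,
            "target": target,
            "text": chunk,
            "options": {
                "timePerChar": time_per_char_ms,
                # Only the first chunk replaces existing content; later chunks append.
                "overwrite": index == 0,
            },
        })
    return calls
```

Lowering timePerChar is the simpler option when natural typing speed does not matter.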
navigate
Navigate the current browser tab to a new URL. Use to open a page before interacting with it. After navigation completes, take a snapshot to confirm the page loaded as expected.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| url | string | ✓ | URL to navigate to. Supports absolute URLs (e.g. ‘https://example.com/page’) and relative URLs (e.g. ‘/page’), which are resolved against the current tab’s URL. |
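To preview where a relative URL would land, the standard library's `urljoin` applies the usual URL resolution rules. This sketch assumes the server resolves relative URLs the same way a browser does; `resolve_navigation` is a hypothetical helper for local checks only.

```python
from urllib.parse import urljoin


def resolve_navigation(current_url: str, url: str) -> str:
    """Resolve `url` against the current tab's URL using standard URL rules."""
    return urljoin(current_url, url)
```

For example, with the current tab at https://example.com/docs/intro, the relative URL /page resolves to https://example.com/page, while an absolute URL is returned unchanged.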
see_accessibilityTree
Capture the accessibility tree of the current page as a structured JSON object. Use to understand page semantics, namely roles, names, and ARIA states (checked, expanded, disabled, …), without parsing raw HTML. Each node’s ‘source’ field is a CSS selector that can be passed directly to actuation tools for targeting. Prefer see_domSnapshot when you need Webfuse ID (wf-id) targeting or want the full HTML structure.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| options.root | string | | CSS selector scoping the accessibility tree to a specific subtree instead of the full page. Use to reduce output size when the area of interest is known (default: body). |
| options.quality | number | | Snapshot completeness as a float between 0 (lowest) and 1 (highest, default). Values below 1 downsample the underlying DOM before computing the tree, reducing output size at the cost of some fidelity. |
see_domSnapshot
Capture a structured text representation of the current page’s DOM. Use to read element text, attributes, and hierarchy before deciding which element to interact with. Prefer this over see_guiSnapshot when you need precise element targeting or the page is mostly text-based.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| options.crossFrame | boolean | | Include content inside <iframe> elements. Enable when the target element is inside a frame (default: false). |
| options.crossShadow | boolean | | Include content inside shadow DOM roots. Disable only if shadow DOM content is not needed (default: true). |
| options.interactiveOnly | boolean | | Omit non-interactive elements and return only buttons, inputs, links, and similar controls. Reduces snapshot size when you only need to find actionable elements (default: false). |
| options.quality | number | | DOM snapshot completeness as a float between 0 (lowest) and 1 (highest, default). At 1, the DOM is returned as-is, with all elements present and full structure and context. Below 1, the snapshot is downsampled after capture: output is smaller but the DOM is structurally altered, so elements may be merged, reordered, or dropped, causing loss of context and unreliable element targeting. Exception: if webfuseIDs=true, wf-id attributes survive downsampling and element targeting remains precise. Always prefer 1; only lower if the snapshot is too large to process and you accept degraded accuracy. |
| options.revealMaskedElements | boolean | | Include elements that have been masked by the Webfuse Masking App. Masked elements are hidden from the snapshot by default to protect sensitive content. Enable only when you explicitly need to interact with masked elements (default: false). |
| options.root | string | | CSS selector scoping the snapshot to a specific subtree instead of the full page. Use to reduce snapshot size when the area of interest is known (default: body). |
| options.webfuseIDs | boolean | | Annotate each element with a truly unique wf-id attribute. Especially useful when targeting elements inside iframes or shadow DOM (where CSS selectors may not reach), or when multiple elements share the same id attribute. Pass the wf-id value as the target to other tools for unambiguous element targeting (default: false). |
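When an element is missing from a default snapshot, a second snapshot with crossFrame and webfuseIDs enabled often finds it. The sketch below shows that escalation. It assumes the snapshot arrives as text content items in the tool result and that options are passed as a nested object; `result_text`, `snapshot_containing`, and the stub classes are hypothetical, and the stub's markup is invented purely so the example runs offline.

```python
import asyncio


def result_text(result) -> str:
    """Join the text items of a tool result into one string."""
    return "\n".join(
        item.text for item in result.content if getattr(item, "type", None) == "text"
    )


async def snapshot_containing(session, session_id: str, needle: str):
    """Return the first snapshot text containing `needle`, or None."""
    attempts = [
        {},  # documented defaults: crossFrame off, crossShadow on
        {"crossFrame": True, "crossShadow": True, "webfuseIDs": True},
    ]
    for options in attempts:
        arguments = {"session_id": session_id}
        if options:
            arguments["options"] = options
        result = await session.call_tool("see_domSnapshot", arguments)
        text = result_text(result)
        if needle in text:
            return text
    return None


class StubText:
    """Hypothetical stand-in for a text content item."""
    type = "text"

    def __init__(self, text):
        self.text = text


class StubResult:
    def __init__(self, text):
        self.content = [StubText(text)]


class StubSession:
    """Offline stand-in: the button only appears once iframes are included."""

    async def call_tool(self, name, arguments):
        cross_frame = arguments.get("options", {}).get("crossFrame", False)
        markup = '<button wf-id="234-56">Pay</button>' if cross_frame else "<body></body>"
        return StubResult(markup)


if __name__ == "__main__":
    print(asyncio.run(snapshot_containing(StubSession(), "<session-id>", "Pay")))
```

Enabling webfuseIDs on the second attempt means any element found there can be targeted by its wf-id value straight away.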
see_guiSnapshot
Capture a screenshot of the current page as an image. Use when the page is visually complex, contains images, charts, or canvas elements that cannot be understood from the DOM alone. Prefer see_domSnapshot when you need precise element targeting or text extraction.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| quality | number | | Image compression level as a float between 0 (lowest quality, smallest size) and 1 (highest quality, largest size). Lower values reduce image detail. Default is 0.6. |
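MCP tool results carry images as content items with a base64 `data` field and a `mimeType`. The sketch below decodes them, assuming see_guiSnapshot returns the screenshot as such an image item, which is not confirmed here. `decode_images` is a hypothetical helper name.

```python
import base64


def decode_images(result) -> list[tuple[str, bytes]]:
    """Return (mimeType, raw bytes) for every image item in a tool result."""
    images = []
    for item in result.content:
        if getattr(item, "type", None) == "image":
            images.append((item.mimeType, base64.b64decode(item.data)))
    return images
```

The decoded bytes can then be written to a file or handed to an image library for inspection.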
see_textSelection
Read the text that is currently selected (highlighted) on the page. Use to verify the result of act_textSelect, or to capture text the user has already selected before acting on it.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
wait
Pause execution for a fixed duration. Use sparingly, only when a page transition, animation, or async operation has no observable DOM signal to wait on. Prefer taking a snapshot to verify state rather than assuming a fixed delay is sufficient.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| durationMs | number | ✓ | Duration to wait in milliseconds. |
System Instructions
These instructions are sent to the model automatically when it connects. They describe the available tools and guide the agent’s behaviour. Use this as a starting point and customise it for your use case.

    You are an intelligent browser agent which helps users to perform various tasks on the web.
    You have access to a set of tools that allow you to interact with web pages, extract information, and perform actions.
    Use these tools to accomplish the user's goals effectively.

    ## Session
    - Every tool requires a "session_id" to identify the Webfuse session. Always include the correct "session_id". Ask the user to provide it if you don't have it.

    ## Observing the page
    - Prefer "see_domSnapshot" to read page content and identify elements before acting. It returns a structured text representation of the DOM that is precise and reliable for element targeting.
    - Use "see_guiSnapshot" only when the page is visually complex (images, charts, canvas) and the DOM snapshot is insufficient.
    - After navigating to a new URL or performing an action that changes the page, always take a fresh snapshot to confirm the result before proceeding.
    - If an element is not found in a DOM snapshot, re-take the snapshot with "crossFrame: true" and "crossShadow: true" to include content inside iframes and shadow DOM roots.

    ## Targeting elements
    - You can target elements by CSS selector, Webfuse ID (wf-id), or coordinates in the format [x,y].
    - Prefer Webfuse IDs — they are truly unique, pierce through iframes and shadow DOM, and remain reliable even when multiple elements share the same HTML id attribute.
    - To obtain wf-ids, take a DOM snapshot with "webfuseIDs: true". When re-taking a snapshot to find elements inside iframes or shadow DOM, always enable "webfuseIDs: true" at the same time.

    ## Performing actions
    - Before interacting with the page, check for and dismiss any overlays that may block interaction — such as cookie consent banners, GDPR notices, newsletter popups, or other modals. Close or accept them first, then proceed with the intended action.
    - "moveMouse" defaults to true in all action tools — do not explicitly disable it unless the element is already focused and mouse movement is undesirable.
    - For native HTML <select> dropdowns, use "act_select" (not "act_click"). Pass the option's value attribute, not its display text.
    - For keyboard shortcuts: key events are dispatched to the page, not the OS. Standard editing keys (Enter, Backspace, Delete) work as expected. OS-level shortcuts such as Ctrl+C or Ctrl+V will NOT work unless the page has explicitly implemented them.

    ## Waiting
    - Avoid using "wait" unless there is no observable DOM change to verify against. Always prefer taking a snapshot to confirm state over assuming a fixed delay is sufficient.

    ## Error handling
    - Tool results contain an "isError" field. If true, read the "content" field for error details, analyze the cause, and adjust your strategy before retrying.

    If unsure — ask the user for clarification or additional information.
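Client code that calls tools directly can apply the same "isError" check described in the error handling instructions. A minimal sketch, assuming a result object shaped like the `mcp` package's tool result, with an `isError` flag and a `content` list of text items; `error_text` is a hypothetical helper name.

```python
def error_text(result):
    """Return the error details of a failed tool result, or None on success."""
    if not getattr(result, "isError", False):
        return None
    parts = [
        item.text for item in result.content if getattr(item, "type", None) == "text"
    ]
    # Fall back to a generic message when the server sent no text details.
    return "\n".join(parts) or "tool call failed without details"
```

Checking the returned text before retrying, for example to tell a timeout apart from a missing element, avoids repeating a call that cannot succeed.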