Session MCP Server
The Session MCP Server exposes the Automation API over the Model Context Protocol (MCP). Connect any MCP-compatible AI agent to a live proxied browser session — your agent can open URLs, read the page, fill in forms, click buttons, select text, and take screenshots, all without modifying the underlying website.
Use Cases
- Third-party agents: connect Cognigy, ElevenLabs, or similar platforms to a live browser session without modifying the underlying website.
- Agent orchestration: use MCP-capable orchestrators (LangGraph, AutoGen, CrewAI) to build agentic workflows on top of any proxied web page.
- Agent development: iterate on prompts and tool calls interactively against a live session from Cursor, VS Code, or any MCP client.
Configuration
- Create a Space
  Create a Space in Webfuse Studio. See the Getting Started guide for step-by-step instructions.
- Generate a Space Automation API Key
  In Webfuse Studio, open the newly created Space, navigate to Settings → API Keys, and generate a new Space Automation API key (prefixed ak_). This token grants full remote control over the session. Treat it as a secret: do not expose it in client-side code, logs, or URLs.
- Enable the Automation App
  Open a Session, toggle the Session Editor bar, and open the Apps tab. Find the Automation app and install it. See Apps for more details.
- Configure Available Tools (optional)
  By default, all automation tools are available. To restrict which tools agents can use, open the Automation app settings and click Configure tools. Uncheck any tools you want to disable for this Space; the change applies to all sessions in the Space. To disable automation entirely, uninstall the Automation app instead.
- Restart the Session
  The Automation app takes effect after a session restart. Close the current Session and start a new one.
Connect your MCP Client
Configure your MCP client to connect to the Session MCP Server endpoint for your domain:
    https://session-mcp.HOSTNAME/mcp
Authenticate with the Space Automation API key as a Bearer token:
    Authorization: Bearer <your-space-automation-key>
Dynamic tool discovery
By default, the server returns all tools on the first tools/list request, and every tool requires a session_id parameter.
If you append ?dynamic=true to the endpoint URL, the server starts with only the connectToSession tool. Call it with a session_id to bind the connection to a session. The server then registers the full tool set and sends a notifications/tools/list_changed notification so the client can re-fetch the tool list. Calling connectToSession with a different session_id rebinds the connection without reconnecting.
    https://session-mcp.HOSTNAME/mcp?dynamic=true
For VS Code, add the server manually to .vscode/mcp.json in your workspace, or to your user settings:
    {
      "servers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer ${input:automation_key}"
          }
        }
      },
      "inputs": [
        {
          "type": "promptString",
          "id": "automation_key",
          "description": "Space Automation API Key",
          "password": true
        }
      ]
    }
Or add manually to .cursor/mcp.json:
    {
      "mcpServers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer <your-space-automation-key>"
          }
        }
      }
    }
Add to claude_desktop_config.json:
    {
      "mcpServers": {
        "webfuse-session": {
          "type": "http",
          "url": "https://session-mcp.HOSTNAME/mcp",
          "headers": {
            "Authorization": "Bearer <your-space-automation-key>"
          }
        }
      }
    }
Or use the Claude CLI:
    claude mcp add-json webfuse-session '{"type":"http","url":"https://session-mcp.HOSTNAME/mcp","headers":{"Authorization":"Bearer <your-space-automation-key>"}}'
Or connect from Python using the MCP SDK:
    # To install the required library: pip install mcp
    import asyncio

    import httpx
    from mcp import ClientSession
    from mcp.client.streamable_http import streamable_http_client


    async def main():
        # Open a streamable HTTP connection authenticated with the Space Automation API key.
        async with streamable_http_client(
            "https://session-mcp.HOSTNAME/mcp",
            http_client=httpx.AsyncClient(
                headers={"Authorization": "Bearer <your-space-automation-key>"},
                timeout=httpx.Timeout(timeout=None, connect=10.0),
            ),
        ) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                # Read the DOM of the active tab in the given session.
                result = await session.call_tool(
                    "see.domSnapshot", {"session_id": "<session-id>"}
                )
                print(result)


    if __name__ == "__main__":
        asyncio.run(main())
Try it
Start a fresh session, then ask your agent:
“In Webfuse session, open https://webfuse.com and describe what you see.”
The agent will ask for a session ID, then use navigate to load the page and see.domSnapshot or see.guiSnapshot to read it.
From there you can ask it to interact with the page in natural language — click a link, fill in a form, or summarise the content.
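Under the hood, the agent's two steps are ordinary MCP tools/call requests. The sketch below builds the JSON-RPC 2.0 messages an MCP client would send for navigate followed by see.domSnapshot. The tool_call helper and the SESSION_ID placeholder are hypothetical names used only for illustration; in practice your MCP client library constructs these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


SESSION_ID = "<session-id>"  # placeholder: replace with a real session ID

# First load the page, then read it back as a DOM snapshot.
requests = [
    tool_call("navigate", {"session_id": SESSION_ID, "url": "https://webfuse.com"}),
    tool_call("see.domSnapshot", {"session_id": SESSION_ID}),
]

for request in requests:
    print(json.dumps(request))
```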
Limits
| Limit | Value | Notes |
|---|---|---|
| Tool call timeout | 15 s | Maximum time allowed for a single tool call, including network transfer in both directions and tool execution in the browser. If the round trip doesn’t complete within this window, the MCP server returns a timeout error to the agent. |
| Tool call input size | 16 KiB | Maximum decompressed size of the tool-call arguments sent to the server. |
| Tool call response size | 10 MiB | Maximum decompressed size of a tool result. Larger responses are rejected and the agent receives an error instead. |
| MCP client connection duration | 3 min | MCP client connections are automatically closed after 3 minutes. This is a hard limit: clients must reconnect after this period to continue making tool calls. |
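A client can guard against two of these limits before it sends anything. The following is a minimal sketch, not part of the Webfuse API: it assumes the input limit can be approximated by the UTF-8 size of the JSON-encoded arguments (how the server actually measures it is not specified here), and the 10-second reconnect margin is an arbitrary illustrative value.

```python
import json

MAX_INPUT_BYTES = 16 * 1024      # 16 KiB tool-call input limit
MAX_CONNECTION_SECONDS = 3 * 60  # 3 min connection duration limit


def arguments_fit(arguments: dict) -> bool:
    """Return True if the JSON-encoded arguments stay within the input limit.

    Assumption: the encoded JSON size approximates what the server measures.
    """
    encoded = json.dumps(arguments).encode("utf-8")
    return len(encoded) <= MAX_INPUT_BYTES


def should_reconnect(connected_at: float, now: float, margin: float = 10.0) -> bool:
    """Return True once the connection is within `margin` seconds of the cutoff."""
    return now - connected_at >= MAX_CONNECTION_SECONDS - margin
```

Pass timestamps from a monotonic clock (for example time.monotonic()) so that wall-clock adjustments do not skew the reconnect check.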
All tools require a session_id identifying the target session.
Finding your session ID: It appears in the session URL as the path segment after the hostname, for example:
    https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw
For programmatic access, you can also retrieve it from a REST API response or a webhook payload.
Most actuation tools also accept a target — a CSS selector, Webfuse ID, or [x,y] coordinates.
Execution context: All commands are executed on the active tab of the session, on the tab owner’s browser. If the active tab or its tab owner is not present when a tool call arrives, the call fails with an error.
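If you only have the session URL, the session ID can be extracted from its path. This is a small sketch using the standard library; session_id_from_url is a hypothetical helper name, and it assumes the ID is always the first path segment, as in the example URL above.

```python
from urllib.parse import urlparse


def session_id_from_url(session_url: str) -> str:
    """Return the first path segment of a session URL as the session ID."""
    path = urlparse(session_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no session ID in URL: {session_url!r}")
    return segments[0]


print(session_id_from_url("https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw"))
# prints sGpUNaFXihCSxCUfb3zezgaCw
```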
act.click
Click a target element with the specified mouse button. Use for buttons, links, checkboxes, and any other interactive element. Defaults to a left-button click.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| options.button | string | | Mouse button to use: ‘left’ (default), ‘middle’, or ‘right’. |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before clicking. When false, the click is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
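As an illustration, the arguments for an act.click call might be assembled like this. The sketch assumes the dotted names in the table (options.button, options.waitForTarget) map to a nested options object; check the tool's input schema from tools/list to confirm. Both helper names are hypothetical.

```python
def coords_target(x: int, y: int) -> str:
    """Format viewport coordinates as the '[x,y]' target string."""
    return f"[{x},{y}]"


def click_arguments(
    session_id: str,
    target: str,
    *,
    button: str = "left",
    wait_for_target: bool = False,
) -> dict:
    """Build the arguments object for act.click (nested options is an assumption)."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported mouse button: {button!r}")
    return {
        "session_id": session_id,
        "target": target,
        "options": {"button": button, "waitForTarget": wait_for_target},
    }


# Click by CSS selector, then right-click by coordinates.
by_selector = click_arguments("<session-id>", "button[type=submit]", wait_for_target=True)
by_coords = click_arguments("<session-id>", coords_target(120, 48), button="right")
```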
act.keyPress
Press a single key on a target element, with optional modifier keys. Key events are dispatched directly to the page, not to the operating system. OS-level shortcuts such as Ctrl+C (copy) or Ctrl+V (paste) will NOT work unless the page has explicitly implemented them. Standard editing keys (Enter, Backspace, Delete) and page-handled shortcuts work as expected.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| key | string | ✓ | Key to press using the KeyboardEvent.key name (e.g. ‘Enter’, ‘ArrowUp’, ‘a’, ‘B’, ‘F5’). |
| options.altKey | boolean | | Hold the Alt key while pressing the key. Only effective if the page handles the resulting combination (default: false). |
| options.ctrlKey | boolean | | Hold the Control key while pressing the key. Only effective if the page handles the resulting combination; OS shortcuts like Ctrl+C will not work (default: false). |
| options.metaKey | boolean | | Hold the Meta (Cmd on macOS, Win on Windows) key while pressing the key (default: false). |
| options.shiftKey | boolean | | Hold the Shift key while pressing the key (default: false). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before pressing the key. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
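Modifier flags are easy to get wrong by hand, so a small translation layer can help. The sketch below turns a shortcut string such as "Ctrl+Shift+K" into act.keyPress arguments. The combo syntax, the MODIFIER_OPTIONS mapping, and the nested options object are illustrative assumptions, not part of the tool definition.

```python
# Map human-readable modifier names to the option flags from the table above.
MODIFIER_OPTIONS = {
    "Alt": "altKey",
    "Ctrl": "ctrlKey",
    "Control": "ctrlKey",
    "Meta": "metaKey",
    "Shift": "shiftKey",
}


def key_press_arguments(session_id: str, target: str, combo: str) -> dict:
    """Translate a combo like 'Ctrl+Shift+K' into act.keyPress arguments."""
    *modifiers, key = combo.split("+")
    if not key:
        raise ValueError(f"missing key in combo: {combo!r}")
    options = {}
    for modifier in modifiers:
        if modifier not in MODIFIER_OPTIONS:
            raise ValueError(f"unknown modifier: {modifier!r}")
        options[MODIFIER_OPTIONS[modifier]] = True
    arguments = {"session_id": session_id, "target": target, "key": key}
    if options:
        arguments["options"] = options
    return arguments
```

Remember that the combination only has an effect if the page itself handles it.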
act.mouseMove
Move the virtual mouse pointer to a target element or coordinates without clicking. Use to trigger hover states, tooltips, or drop-down menus that require mouse proximity. Can optionally keep the pointer visible on screen after the move.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| options.persistent | boolean | | Keep the pointer visible on screen indefinitely after the move. When false (default), the pointer fades out automatically after a short delay. |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
act.scroll
Scroll a target element or the page by a given number of pixels. Use to bring off-screen content into view or to navigate long pages. Positive amounts scroll down or right; negative amounts scroll up or left. When scrolling the full page rather than a specific element, use ‘html’ as the target, because ‘body’ often does not respond to scrolling.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| amount | number | ✓ | Number of pixels to scroll. Positive scrolls down or right; negative scrolls up or left. |
| options.direction | string | | Axis to scroll along: ‘vertical’ (up/down, default) or ‘horizontal’ (left/right). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before scrolling. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
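The sign and axis conventions can be wrapped in a helper so callers think in terms of "up", "down", "left", and "right". This is a sketch under the assumption that options is a nested object; scroll_arguments is a hypothetical name, and the default target follows the ‘html’ recommendation above.

```python
# Each named direction maps to an axis and the sign of the pixel amount.
AXIS_AND_SIGN = {
    "down": ("vertical", 1),
    "up": ("vertical", -1),
    "right": ("horizontal", 1),
    "left": ("horizontal", -1),
}


def scroll_arguments(
    session_id: str, direction: str, pixels: int, target: str = "html"
) -> dict:
    """Build act.scroll arguments from a named direction and a pixel distance."""
    if pixels < 0:
        raise ValueError("pixels must be non-negative; use direction for the sign")
    if direction not in AXIS_AND_SIGN:
        raise ValueError(f"unknown direction: {direction!r}")
    axis, sign = AXIS_AND_SIGN[direction]
    return {
        "session_id": session_id,
        "target": target,
        "amount": sign * pixels,
        "options": {"direction": axis},
    }
```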
act.select
Select an option in a <select> dropdown element by matching its value attribute. Use this instead of act.click when interacting with native HTML dropdowns.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| value | string | ✓ | The value attribute of the option to select, not the visible display text. |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
act.textSelect
Select a continuous run of text within a container element by matching its content. Use to highlight text before copying, replacing, or applying formatting. Pass an empty string as text to clear the current selection.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| text | string | ✓ | Exact text string to find and highlight within the target element. Pass an empty string to clear any existing selection. |
| options.occurrence | number | | Which occurrence to select when the text appears more than once in the element. 1 selects the first match (1-based index, default: 1). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
act.type
Type text into a target input element. Short inputs are typed character by character; longer inputs are pasted directly. Use for text fields, search boxes, and any editable element. By default, overwrites existing content. If the target resolves to a non-editable wrapper (error: ‘Target must resolve to editable element’), re-take a DOM snapshot with webfuseIDs: true and target the inner input element directly using its wf-id.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| target | string | ✓ | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| text | string | ✓ | Text to type into the target element. |
| options.followFocus | boolean | | Continue typing into whichever element holds focus, even if focus moved away from the original target. Set to false to strictly type into the target (default: true). |
| options.overwrite | boolean | | Replace the existing content of the input before typing. Set to false to append or insert at the current cursor position (default: true). |
| options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before typing. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | | If the target does not exist yet, wait up to 5 s for it to appear (default: false). |
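The recovery path for the non-editable wrapper error can be encoded as a simple decision. The sketch below inspects an error message and, if it matches, returns the follow-up snapshot call to make. retry_plan is a hypothetical helper, matching on the message text is an assumption about how the error is surfaced, and the nested options object is likewise assumed.

```python
EDITABLE_ERROR = "Target must resolve to editable element"


def retry_plan(session_id: str, error_message: str):
    """Return the follow-up (tool name, arguments) pair for the editable-element
    error, or None when the error is something else."""
    if EDITABLE_ERROR not in error_message:
        return None
    # Re-take the snapshot with wf-ids so the inner input can be targeted directly.
    return (
        "see.domSnapshot",
        {"session_id": session_id, "options": {"webfuseIDs": True}},
    )
```

After the snapshot returns, pick the wf-id of the inner input element and call act.type again with that wf-id as the target.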
connectToSession
Bind this MCP connection to a Webfuse session. When the endpoint is opened with ?dynamic=true, this tool must be called before any other tool; until then, it is the only tool exposed. After binding, the server registers the session’s tool set and notifies the client via notifications/tools/list_changed; re-fetch the tool list with tools/list. Call again with a different session_id to rebind without reconnecting.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID to bind this connection to |
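The dynamic flow described above comes down to three client-side pieces: the endpoint URL with the dynamic flag, the connectToSession call, and a check for the list-changed notification. The sketch below shows them as plain functions over JSON-RPC messages. The function names are hypothetical, and most MCP client libraries handle the notification for you.

```python
from urllib.parse import urlencode


def endpoint(hostname: str, dynamic: bool = False) -> str:
    """Build the Session MCP Server URL, optionally in dynamic-discovery mode."""
    base = f"https://session-mcp.{hostname}/mcp"
    return f"{base}?{urlencode({'dynamic': 'true'})}" if dynamic else base


def connect_request(request_id: int, session_id: str) -> dict:
    """JSON-RPC tools/call request that binds the connection to a session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "connectToSession", "arguments": {"session_id": session_id}},
    }


def needs_tool_refresh(message: dict) -> bool:
    """True when the server signals that tools/list should be fetched again."""
    return message.get("method") == "notifications/tools/list_changed"
```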
navigate
Navigate the current browser tab to a new URL. Use to open a page before interacting with it. After navigation completes, take a snapshot to confirm the page loaded as expected.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| url | string | ✓ | URL to navigate to. Supports absolute URLs (e.g. ‘https://example.com/page’) and relative URLs (e.g. ‘/page’), which are resolved against the current tab’s URL. |
pageInfo
Retrieve information about the currently active web page, including URL and title.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
see.accessibilityTree
Capture the accessibility tree of the current page as a structured JSON object. Use to understand page semantics (roles, names, and ARIA states such as checked, expanded, or disabled) without parsing raw HTML. Each node includes a wf-id by default (see the webfuseIDs option) that can be passed as a string directly as the target to actuation tools. Prefer see.domSnapshot when you need the full HTML structure.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| options.root | string | | CSS selector scoping the accessibility tree to a specific subtree instead of the full page. Use to reduce output size when the area of interest is known (default: body). |
| options.quality | number | | Snapshot completeness as a float between 0 (lowest) and 1 (highest, default). Values below 1 downsample the underlying DOM before computing the tree, reducing output size at the cost of some fidelity. |
| options.webfuseIDs | boolean | | Associate each node with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable: iframes, duplicate ids, or generated markup (default: true). |
see.domSnapshot
Capture a structured text representation of the current page’s DOM. Use to read element text, attributes, and hierarchy before deciding which element to interact with. Prefer this over see.guiSnapshot when you need precise element targeting or the page is mostly text-based.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| options.crossFrame | boolean | | Include content inside <iframe> elements. Enable when the target element is inside a frame (default: false). |
| options.crossShadow | boolean | | Include content inside shadow DOM roots. Disable only if shadow DOM content is not needed (default: true). |
| options.interactiveOnly | boolean | | Omit non-interactive elements and return only buttons, inputs, links, and similar controls. Reduces snapshot size when you only need to find actionable elements (default: false). |
| options.quality | number | | DOM snapshot completeness as a float between 0 (lowest) and 1 (highest, default). At 1, the DOM is returned as-is, with all elements present and full structure and context. Below 1, the snapshot is downsampled after capture: output is smaller but the DOM is structurally altered. Elements may be merged, reordered, or dropped, causing loss of context and unreliable element targeting. Exception: if webfuseIDs=true, wf-id attributes survive downsampling and element targeting remains precise. Always prefer 1; only lower it if the snapshot is too large to process and you accept degraded accuracy. |
| options.revealMaskedElements | boolean | | Include elements that have been masked by the Webfuse Masking App. Masked elements are hidden from the snapshot by default to protect sensitive content. Enable only when you explicitly need to interact with masked elements (default: false). |
| options.root | string | | CSS selector scoping the snapshot to a specific subtree instead of the full page. Use to reduce snapshot size when the area of interest is known (default: body). |
| options.webfuseIDs | boolean | | Annotate each element with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable: iframes, duplicate ids, or generated markup (default: false). |
see.guiSnapshot
Capture a screenshot of the current page as an image. Use when rendered visual appearance matters: images, charts, canvas, or verifying layout. Coordinates visible in the screenshot can be passed as [x,y] to action tools, but prefer see.domSnapshot for reliable element targeting or text extraction.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| quality | number | | Image compression level as a float between 0 (lowest quality, smallest size) and 1 (highest quality, largest size). Lower values reduce image detail. Default is 0.6. |
see.textSelection
Read the text that is currently selected (highlighted) on the page. Use to verify the result of act.textSelect, or to capture text the user has already selected before acting on it.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
wait
Pause execution for a fixed duration. Use sparingly, only when a page transition, animation, or async operation has no observable DOM signal to wait on. Prefer taking a snapshot to verify state rather than assuming a fixed delay is sufficient.
| Parameter | Type | Required | Description |
|---|---|---|---|
| session_id | string | ✓ | Webfuse Session ID |
| durationMs | number | ✓ | Duration to wait in milliseconds. Typical values: 1000-3000 ms. Avoid values above 5000 ms. |
System Instructions
These instructions are sent to the model automatically when it connects. They describe the available tools and guide the agent’s behaviour. Use this as a starting point and customise it for your use case.
    You are an intelligent browser agent which helps users to perform various tasks on the web.
    You have access to a set of tools that allow you to interact with web pages, extract information, and perform actions.
    Use these tools to accomplish the user's goals effectively.

    This server controls an active Webfuse browser session - use it to interact with web pages (clicking, typing, navigating, observing). For creating or configuring sessions and spaces, or searching documentation, use the Webfuse API & Docs MCP server instead.

    ## Session
    - Every tool requires a "session_id" to identify the Webfuse session. Always include the correct "session_id". Ask the user to provide it if you don't have it.

    ## Observing the page
    - Prefer "see.domSnapshot" to read page content and identify elements before acting. It returns a structured text representation of the DOM that is precise and reliable for element targeting.
    - Use "see.guiSnapshot" only when the page is visually complex (images, charts, canvas) and the DOM snapshot is insufficient.
    - Use "see.accessibilityTree" to understand page semantics - roles, ARIA states (checked, expanded, disabled), and element names - without parsing raw HTML. Unlike see.domSnapshot, it includes wf-ids by default - no need to enable webfuseIDs explicitly.
    - After navigating to a new URL or performing an action that changes the page, always take a fresh snapshot to confirm the result before proceeding.
    - If an element is not found in a DOM snapshot, re-take the snapshot with "crossFrame: true" to include content inside iframes.
    - Use "pageInfo" to retrieve information about the currently opened web page, such as the respective URL.

    ## Targeting elements
    - You can target elements by CSS selector, wf-id string from a snapshot, or coordinates in the format [x,y].
    - When a CSS selector is unreliable - elements inside iframes, duplicate HTML ids on the page, or deeply generated markup - use wf-ids instead. Enable "webfuseIDs: true" in the snapshot to annotate elements with a wf-id, then pass it as a string directly as the target.
    - To include content inside iframes, enable "crossFrame: true" - iframe content is excluded by default.

    ## Performing actions
    - Before interacting with the page, check for and dismiss any overlays that may block interaction - such as cookie consent banners, GDPR notices, newsletter popups, or other modals. Close or accept them first, then proceed with the intended action.
    - "moveMouse" defaults to true in all action tools - do not explicitly disable it unless the element is already focused and mouse movement is undesirable.
    - For native HTML <select> dropdowns, use "act.select" (not "act.click"). Pass the option's value attribute, not its display text.
    - For keyboard shortcuts: key events are dispatched to the page, not the OS. Standard editing keys (Enter, Backspace, Delete) work as expected. OS-level shortcuts such as Ctrl+C or Ctrl+V will NOT work unless the page has explicitly implemented them.

    ## Waiting
    - Avoid using "wait" unless there is no observable DOM change to verify against. Always prefer taking a snapshot to confirm state over assuming a fixed delay is sufficient.

    ## Error handling
    - Tool results contain an "isError" field. If true, read the "content" field for error details, analyze the cause, and adjust your strategy before retrying.

    If unsure - ask the user for clarification or additional information.