
Session MCP Server

The Session MCP Server exposes the Automation API over the Model Context Protocol (MCP). Connect any MCP-compatible AI agent to a live proxied browser session — your agent can open URLs, read the page, fill in forms, click buttons, select text, and take screenshots, all without modifying the underlying website.

  • Third-party agents — connect Cognigy, ElevenLabs, or similar platforms to a live browser session without modifying the underlying website.
  • Agent orchestration — use MCP-capable orchestrators (LangGraph, AutoGen, CrewAI) to build agentic workflows on top of any proxied web page.
  • Agent development — iterate on prompts and tool calls interactively against a live session from Cursor, VS Code, or any MCP client.
  1. Create a Space in Webfuse Studio. See Getting Started for a step-by-step guide.

  2. In Webfuse Studio, open the newly created Space, navigate to Settings → API Keys, and generate a new Space Automation API key (prefixed ak_). This token grants full remote control over the session. Treat it as a secret: do not expose it in client-side code, logs, or URLs.

  3. Open a Session, toggle the Session Editor bar, and open the Apps tab. Find the Automation app and install it. See Apps for more details.

  4. By default, all automation tools are available. To restrict which tools agents can use, open the Automation app settings and click Configure tools. Uncheck any tools you want to disable for this Space; the change applies to all sessions in the Space. To disable automation entirely, uninstall the Automation app instead.

  5. The Automation app takes effect after a session restart. Close the current Session and start a new one.

Configure your MCP client to connect to the Session MCP Server endpoint for your domain:

https://session-mcp.HOSTNAME/mcp

Authenticate with the Space Automation API key as a Bearer token:

Authorization: Bearer <your-space-automation-key>
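
As a rough illustration of what an MCP client sends, the sketch below builds (but does not send) a JSON-RPC tools/list request carrying the Bearer header. The Content-Type and Accept headers follow the MCP Streamable HTTP transport as generally specified. HOSTNAME and the key value are placeholders, the function name is hypothetical, and a real client would first perform the MCP initialize handshake, which is omitted here.

```python
import json
import urllib.request


def build_tools_list_request(hostname: str, automation_key: str) -> urllib.request.Request:
    """Build (without sending) a JSON-RPC tools/list request for the Session MCP Server."""
    body = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return urllib.request.Request(
        url=f"https://session-mcp.{hostname}/mcp",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # The Space Automation API key is passed as a Bearer token.
            "Authorization": f"Bearer {automation_key}",
            "Content-Type": "application/json",
            # Assumption: the server speaks MCP Streamable HTTP, which may
            # answer with plain JSON or with a server-sent event stream.
            "Accept": "application/json, text/event-stream",
        },
    )


# Placeholder values; never hard-code a real key in source files.
req = build_tools_list_request("HOSTNAME", "ak_example")
```

In practice an MCP client library handles this for you; the point is only that the key travels in the Authorization header and never in the URL.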

By default, the server returns all tools on the first tools/list request and every tool requires a session_id parameter.

If you append ?dynamic=true to the endpoint URL, the server starts with only the connectToSession tool. Call it with a session_id to bind the connection to a session — the server then registers the full tool set and sends a notifications/tools/list_changed notification so the client can re-fetch the tool list. Calling connectToSession with a different session_id rebinds the connection without reconnecting.

https://session-mcp.HOSTNAME/mcp?dynamic=true
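
The dynamic flow described above can be sketched as three small helpers: one builds the endpoint URL, one builds the connectToSession call, and one recognises the list-changed notification. The helper names are hypothetical, and the payload shape assumes the standard MCP tools/call JSON-RPC method.

```python
from urllib.parse import urlencode


def endpoint_url(hostname: str, dynamic: bool = False) -> str:
    """Return the Session MCP Server endpoint, optionally in dynamic mode."""
    base = f"https://session-mcp.{hostname}/mcp"
    return f"{base}?{urlencode({'dynamic': 'true'})}" if dynamic else base


def connect_to_session_call(session_id: str, request_id: int = 1) -> dict:
    """Build the JSON-RPC body that binds the connection to a session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "connectToSession",
            "arguments": {"session_id": session_id},
        },
    }


def is_tool_list_changed(message: dict) -> bool:
    """True if the server signals that the client should call tools/list again."""
    return message.get("method") == "notifications/tools/list_changed"
```

After is_tool_list_changed returns True for an incoming message, the client re-issues tools/list to pick up the full tool set.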

In VS Code, add the server manually to .vscode/mcp.json in your workspace, or to your user settings:

{
  "servers": {
    "webfuse-session": {
      "type": "http",
      "url": "https://session-mcp.HOSTNAME/mcp",
      "headers": {
        "Authorization": "Bearer ${input:automation_key}"
      }
    }
  },
  "inputs": [
    {
      "type": "promptString",
      "id": "automation_key",
      "description": "Space Automation API Key",
      "password": true
    }
  ]
}

Start a fresh session, then ask your agent:

“In Webfuse session, open https://webfuse.com and describe what you see.”

The agent will ask for a session ID, then use navigate to load the page and see.domSnapshot or see.guiSnapshot to read it. From there you can ask it to interact with the page in natural language — click a link, fill in a form, or summarise the content.

| Limit | Value | Notes |
| --- | --- | --- |
| Tool call timeout | 15 s | Maximum time allowed for a single tool call, including network transfer in both directions and tool execution in the browser. If the round-trip doesn’t complete within this window, the MCP server returns a timeout error to the agent. |
| Tool call input size | 16 KiB | Maximum decompressed size of the tool-call arguments sent to the server. |
| Tool call response size | 10 MiB | Maximum decompressed size of a tool result. Larger responses are rejected and the agent receives an error instead. |
| MCP client connection duration | 3 min | MCP client connections are automatically closed after 3 minutes. This is a hard limit: clients must reconnect after this period to continue making tool calls. |
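
A client can guard against these limits before calling the server. The sketch below is one possible approach, not part of the product: it assumes the input size is measured on the UTF-8 JSON encoding of the arguments (the source does not say how the server measures it), and the 10-second reconnect margin is an arbitrary illustrative value.

```python
import json

# Limits taken from the table above.
TOOL_CALL_TIMEOUT_S = 15
MAX_INPUT_BYTES = 16 * 1024            # 16 KiB
MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # 10 MiB
MAX_CONNECTION_S = 3 * 60              # 3 minutes


def arguments_fit(arguments: dict) -> bool:
    """Check the serialized arguments against the input size limit.

    Assumption: size is measured on the UTF-8 encoded JSON text.
    """
    size = len(json.dumps(arguments).encode("utf-8"))
    return size <= MAX_INPUT_BYTES


def should_reconnect(connected_at: float, now: float, margin_s: float = 10.0) -> bool:
    """True once the connection is within margin_s seconds of the hard limit."""
    return now - connected_at >= MAX_CONNECTION_S - margin_s
```

Reconnecting slightly early avoids having a tool call cut off when the connection is closed at the three-minute mark.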

All tools require a session_id identifying the target session.

Finding your session ID: It appears in the session URL as the path segment after the hostname, for example:

https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw

For programmatic access, you can also retrieve it from a REST API response or a webhook payload.
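
If you only have the session URL, the ID can be pulled out with standard URL parsing. This is a minimal sketch that assumes the ID is always the first path segment, as in the example above; the function name is hypothetical.

```python
from urllib.parse import urlparse


def session_id_from_url(session_url: str) -> str:
    """Return the first path segment of a session URL as the session ID."""
    segments = [part for part in urlparse(session_url).path.split("/") if part]
    if not segments:
        raise ValueError(f"no session ID found in {session_url!r}")
    return segments[0]


session_id = session_id_from_url("https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw")
```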

Most actuation tools also accept a target — a CSS selector, Webfuse ID, or [x,y] coordinates.

Execution context: All commands are executed on the active tab of the session, on the tab owner’s browser. If the active tab or its tab owner is not present when a tool call arrives, the call fails with an error.

Click a target element with the specified mouse button. Use for buttons, links, checkboxes, and any other interactive element. Defaults to a left-button click.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| options.button | string | No | Mouse button to use: ‘left’ (default), ‘middle’, or ‘right’. |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before clicking. When false, the click is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |
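
As an illustration, the arguments for a right-click at screen coordinates might be assembled as follows. The tool name act.click appears elsewhere in this document. The nesting of options.* parameters under an "options" object is an assumption based on the dotted names in the table, and both helper names are hypothetical.

```python
def coords_target(x: int, y: int) -> str:
    """Format coordinates as the '[x,y]' target string."""
    return f"[{x},{y}]"


def click_arguments(session_id: str, target: str, button: str = "left",
                    wait_for_target: bool = False) -> dict:
    """Build the arguments object for an act.click tool call."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported button: {button!r}")
    return {
        "session_id": session_id,
        "target": target,
        # Assumption: "options.button" in the table means a nested object.
        "options": {"button": button, "waitForTarget": wait_for_target},
    }


args = click_arguments("SESSION_ID", coords_target(120, 340), button="right")
```

The same target string could instead be a CSS selector such as "button[type=submit]" or a wf-id taken from a snapshot.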

Press a single key on a target element, with optional modifier keys. Key events are dispatched directly to the page, not to the operating system. OS-level shortcuts such as Ctrl+C (copy) or Ctrl+V (paste) will NOT work unless the page has explicitly implemented them. Standard editing keys (Enter, Backspace, Delete) and page-handled shortcuts work as expected.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| key | string | | Key to press using the KeyboardEvent.key name (e.g. ‘Enter’, ‘ArrowUp’, ‘a’, ‘B’, ‘F5’). |
| options.altKey | boolean | No | Hold the Alt key while pressing the key. Only effective if the page handles the resulting combination (default: false). |
| options.ctrlKey | boolean | No | Hold the Control key while pressing the key. Only effective if the page handles the resulting combination; OS shortcuts like Ctrl+C will not work (default: false). |
| options.metaKey | boolean | No | Hold the Meta (Cmd on macOS, Win on Windows) key while pressing the key (default: false). |
| options.shiftKey | boolean | No | Hold the Shift key while pressing the key (default: false). |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before pressing the key. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |

Move the virtual mouse pointer to a target element or coordinates without clicking. Use to trigger hover states, tooltips, or drop-down menus that require mouse proximity. Can optionally keep the pointer visible on screen after the move.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| options.persistent | boolean | No | Keep the pointer visible on screen indefinitely after the move. When false (default), the pointer fades out automatically after a short delay. |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |

Scroll a target element or the page by a given number of pixels. Use to bring off-screen content into view or to navigate long pages. Positive amounts scroll down or right; negative amounts scroll up or left. When scrolling the full page rather than a specific element, use ‘html’ as the target - ‘body’ often does not respond to scrolling.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| amount | number | | Number of pixels to scroll. Positive scrolls down or right; negative scrolls up or left. |
| options.direction | string | No | Axis to scroll along: ‘vertical’ (up/down, default) or ‘horizontal’ (left/right). |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before scrolling. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |

Select an option in a <select> dropdown element by matching its value attribute. Use this instead of act.click when interacting with native HTML dropdowns.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| value | string | | The value attribute of the option to select, not the visible display text. |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |
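
Because the tool matches on the value attribute rather than the visible label, an agent or client usually has to map one to the other first. The sketch below assumes you have already extracted (value, label) pairs for the dropdown's options from a DOM snapshot; the helper names and the example data are hypothetical.

```python
def option_value_for_label(options: list[tuple[str, str]], label: str) -> str:
    """Return the value attribute of the option whose visible text matches label."""
    for value, text in options:
        if text.strip() == label:
            return value
    raise KeyError(f"no option labelled {label!r}")


def select_arguments(session_id: str, target: str, value: str) -> dict:
    """Build the arguments object for an act.select tool call."""
    return {"session_id": session_id, "target": target, "value": value}


# Hypothetical options read from a snapshot of <select id="country">.
country_options = [("de", "Germany"), ("fr", "France"), ("nl", "Netherlands")]
args = select_arguments(
    "SESSION_ID", "#country", option_value_for_label(country_options, "France")
)
```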

Select a continuous run of text within a container element by matching its content. Use to highlight text before copying, replacing, or applying formatting. Pass an empty string as text to clear the current selection.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| text | string | | Exact text string to find and highlight within the target element. Pass an empty string to clear any existing selection. |
| options.occurrence | number | No | Which occurrence to select when the text appears more than once in the element. 1 selects the first match (1-based index, default: 1). |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |

Type text into a target input element. Short inputs are typed character by character; longer inputs are pasted directly. Use for text fields, search boxes, and any editable element. By default, overwrites existing content. If the target resolves to a non-editable wrapper (error: ‘Target must resolve to editable element’), re-take a DOM snapshot with webfuseIDs: true and target the inner input element directly using its wf-id.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’ |
| text | string | | Text to type into the target element. |
| options.followFocus | boolean | No | Continue typing into whichever element holds focus, even if focus moved away from the original target. Set to false to strictly type into the target (default: true). |
| options.overwrite | boolean | No | Replace the existing content of the input before typing. Set to false to append or insert at the current cursor position (default: true). |
| options.moveMouse | boolean | No | Move the virtual mouse pointer to the target center before typing. When false, the action is sent directly without moving the pointer (default: true). |
| options.waitForTarget | boolean | No | Wait up to 5 s for the target if it does not exist yet (default: false). |

Bind this MCP connection to a Webfuse session. Must be called before any other tool — until then, only this tool is exposed. After binding, the server registers the session’s tool set and notifies the client via notifications/tools/list_changed; re-fetch the tool list with tools/list. Call again with a different session_id to rebind without reconnecting.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID to bind this connection to |

Navigate the current browser tab to a new URL. Use to open a page before interacting with it. After navigation completes, take a snapshot to confirm the page loaded as expected.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| url | string | | URL to navigate to. Supports absolute URLs (e.g. ‘https://example.com/page’) and relative URLs (e.g. ‘/page’), which are resolved against the current tab’s URL. |
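
To see how a relative URL would resolve, the sketch below uses Python's standard urljoin. It assumes the server follows ordinary URL resolution rules, which the source implies but does not spell out; the current tab URL is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical URL of the session's current tab.
current_tab_url = "https://example.com/docs/intro"

# A root-relative URL replaces the whole path.
root_relative = urljoin(current_tab_url, "/page")

# A path-relative URL replaces only the last path segment.
path_relative = urljoin(current_tab_url, "page")

# An absolute URL ignores the current tab entirely.
absolute = urljoin(current_tab_url, "https://webfuse.com")
```

Here root_relative is "https://example.com/page", path_relative is "https://example.com/docs/page", and absolute is "https://webfuse.com". When in doubt, passing an absolute URL avoids any dependence on the current tab.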

Retrieve information about the currently active web page, including URL and title.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |

Capture the accessibility tree of the current page as a structured JSON object. Use to understand page semantics - roles, names, ARIA states (checked, expanded, disabled, …) - without parsing raw HTML. Each node includes a wf-id by default (see webfuseIDs option) that can be passed as a string directly as the target to actuation tools. Prefer see.domSnapshot when you need the full HTML structure.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| options.root | string | No | CSS selector scoping the accessibility tree to a specific subtree instead of the full page. Use to reduce output size when the area of interest is known (default: body). |
| options.quality | number | No | Snapshot completeness as a float between 0 (lowest) and 1 (highest, default). Values below 1 downsample the underlying DOM before computing the tree, reducing output size at the cost of some fidelity. |
| options.webfuseIDs | boolean | No | Associate each node with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable: iframes, duplicate ids, or generated markup (default: true). |

Capture a structured text representation of the current page’s DOM. Use to read element text, attributes, and hierarchy before deciding which element to interact with. Prefer this over see.guiSnapshot when you need precise element targeting or the page is mostly text-based.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| options.crossFrame | boolean | No | Include content inside <iframe> elements. Enable when the target element is inside a frame (default: false). |
| options.crossShadow | boolean | No | Include content inside shadow DOM roots. Disable only if shadow DOM content is not needed (default: true). |
| options.interactiveOnly | boolean | No | Omit non-interactive elements and return only buttons, inputs, links, and similar controls. Reduces snapshot size when you only need to find actionable elements (default: false). |
| options.quality | number | No | DOM snapshot completeness as a float between 0 (lowest) and 1 (highest, default). At 1, the DOM is returned as-is, with all elements present in full structure and context. Below 1, the snapshot is downsampled after capture: output is smaller but the DOM is structurally altered (elements may be merged, reordered, or dropped), causing loss of context and unreliable element targeting. Exception: if webfuseIDs=true, wf-id attributes survive downsampling and element targeting remains precise. Always prefer 1; only lower it if the snapshot is too large to process and you accept degraded accuracy. |
| options.revealMaskedElements | boolean | No | Include elements that have been masked by the Webfuse Masking App. Masked elements are hidden from the snapshot by default to protect sensitive content. Enable only when you explicitly need to interact with masked elements (default: false). |
| options.root | string | No | CSS selector scoping the snapshot to a specific subtree instead of the full page. Use to reduce snapshot size when the area of interest is known (default: body). |
| options.webfuseIDs | boolean | No | Annotate each element with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable: iframes, duplicate ids, or generated markup (default: false). |
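
The options combine naturally when hunting for an element that a plain snapshot missed. The sketch below builds arguments for see.domSnapshot with the defaults listed in the table and validates the quality range. The nested "options" object is an assumption based on the dotted parameter names, and the function name is hypothetical.

```python
def dom_snapshot_arguments(session_id: str, *, root: str = "body",
                           cross_frame: bool = False,
                           interactive_only: bool = False,
                           webfuse_ids: bool = False,
                           quality: float = 1.0) -> dict:
    """Build the arguments object for a see.domSnapshot tool call."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0 and 1")
    return {
        "session_id": session_id,
        # Assumption: dotted names in the table map to a nested object.
        "options": {
            "root": root,
            "crossFrame": cross_frame,
            "interactiveOnly": interactive_only,
            "webfuseIDs": webfuse_ids,
            "quality": quality,
        },
    }


# Look for actionable controls inside iframes and tag them with wf-ids.
args = dom_snapshot_arguments(
    "SESSION_ID", cross_frame=True, interactive_only=True, webfuse_ids=True
)
```

Keeping quality at 1 and narrowing root or enabling interactiveOnly is usually a safer way to shrink a snapshot than downsampling, given the targeting caveats described for options.quality.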

Capture a screenshot of the current page as an image. Use when rendered visual appearance matters - images, charts, canvas, or verifying layout. Coordinates visible in the screenshot can be passed as [x,y] to action tools, but prefer see.domSnapshot for reliable element targeting or text extraction.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| quality | number | No | Image compression level as a float between 0 (lowest quality, smallest size) and 1 (highest quality, largest size). Lower values reduce image detail. Default is 0.6. |

Read the text that is currently selected (highlighted) on the page. Use to verify the result of act.textSelect, or to capture text the user has already selected before acting on it.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |

Pause execution for a fixed duration. Use sparingly - only when a page transition, animation, or async operation has no observable DOM signal to wait on. Prefer taking a snapshot to verify state rather than assuming a fixed delay is sufficient.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Webfuse Session ID |
| durationMs | number | | Duration to wait in milliseconds. Typical values: 1000-3000 ms. Avoid values above 5000 ms. |

These instructions are sent to the model automatically when it connects. They describe the available tools and guide the agent’s behaviour. Use this as a starting point and customise it for your use case.

You are an intelligent browser agent which helps users to perform various tasks on the web.
You have access to a set of tools that allow you to interact with web pages, extract information, and perform actions.
Use these tools to accomplish the user's goals effectively.
This server controls an active Webfuse browser session - use it to interact with web pages (clicking, typing, navigating, observing). For creating or configuring sessions and spaces, or searching documentation, use the Webfuse API & Docs MCP server instead.
## Session
- Every tool requires a "session_id" to identify the Webfuse session. Always include the correct "session_id". Ask the user to provide it if you don't have it.
## Observing the page
- Prefer "see.domSnapshot" to read page content and identify elements before acting. It returns a structured text representation of the DOM that is precise and reliable for element targeting.
- Use "see.guiSnapshot" only when the page is visually complex (images, charts, canvas) and the DOM snapshot is insufficient.
- Use "see.accessibilityTree" to understand page semantics - roles, ARIA states (checked, expanded, disabled), and element names - without parsing raw HTML. Unlike see.domSnapshot, it includes wf-ids by default - no need to enable webfuseIDs explicitly.
- After navigating to a new URL or performing an action that changes the page, always take a fresh snapshot to confirm the result before proceeding.
- If an element is not found in a DOM snapshot, re-take the snapshot with "crossFrame: true" to include content inside iframes.
- Use "pageInfo" to retrieve information about the currently opened web page, such as the respective URL.
## Targeting elements
- You can target elements by CSS selector, wf-id string from a snapshot, or coordinates in the format [x,y].
- When a CSS selector is unreliable - elements inside iframes, duplicate HTML ids on the page, or deeply generated markup - use wf-ids instead. Enable "webfuseIDs: true" in the snapshot to annotate elements with a wf-id, then pass it as a string directly as the target.
- To include content inside iframes, enable "crossFrame: true" - iframe content is excluded by default.
## Performing actions
- Before interacting with the page, check for and dismiss any overlays that may block interaction - such as cookie consent banners, GDPR notices, newsletter popups, or other modals. Close or accept them first, then proceed with the intended action.
- "moveMouse" defaults to true in all action tools - do not explicitly disable it unless the element is already focused and mouse movement is undesirable.
- For native HTML <select> dropdowns, use "act.select" (not "act.click"). Pass the option's value attribute, not its display text.
- For keyboard shortcuts: key events are dispatched to the page, not the OS. Standard editing keys (Enter, Backspace, Delete) work as expected. OS-level shortcuts such as Ctrl+C or Ctrl+V will NOT work unless the page has explicitly implemented them.
## Waiting
- Avoid using "wait" unless there is no observable DOM change to verify against. Always prefer taking a snapshot to confirm state over assuming a fixed delay is sufficient.
## Error handling
- Tool results contain an "isError" field. If true, read the "content" field for error details, analyze the cause, and adjust your strategy before retrying.
If unsure - ask the user for clarification or additional information.
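
On the client side, the error-handling rule in the prompt above can be sketched as a small helper that inspects a tool result. It assumes the standard MCP result shape, where "content" is a list of typed items and error details arrive as text items; the function name is hypothetical.

```python
from typing import Optional


def tool_error_text(result: dict) -> Optional[str]:
    """Return the error message of a tool result, or None if the call succeeded."""
    if not result.get("isError"):
        return None
    # Assumption: error details are carried in content items of type "text".
    parts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts)


failed = {
    "isError": True,
    "content": [{"type": "text", "text": "Target must resolve to editable element"}],
}
message = tool_error_text(failed)
```

An orchestrator could branch on the returned message, for example by re-taking a DOM snapshot with webfuseIDs enabled when it sees the editable-element error mentioned earlier.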