Browser Login

Let users log into websites through a live browser in the sandbox.

Status: Not yet implemented. You can declare browser_login connections in grexal.json and the CLI will accept them, but the platform currently refuses to run any agent that declares one, because the desktop sandbox and VNC takeover infrastructure it needs has not shipped. The reference below is kept for contract stability. For now, use secret_input and have the user paste session cookies.

Browser login is for services that have no API and no OAuth: the user logs in through a browser, and the agent operates the browser with the resulting session. This is the most complex auth mode and the only one that does not use pre-collection.

How it's different

The other three auth modes extract a credential (keys, tokens, files), store it in the vault, and inject it into the sandbox. Browser login doesn't extract anything. The session lives in the browser inside the sandbox — the cookies, localStorage, and session state exist in the same browser process the agent operates.

This means the sandbox must be running before the user logs in. The agent starts, encounters a login wall, asks the user to take over the browser, and resumes once the user has logged in.

Manifest declaration

{
  "connections": [
    {
      "id": "google_account",
      "display_name": "Google Account",
      "auth_mode": "browser_login",
      "login_url": "https://accounts.google.com"
    }
  ]
}

The login_url is informational — shown to the user in the consent prompt so they know what site they'll be logging into. The agent's code decides when and where to navigate.

Sandbox requirement

Browser login agents must use the desktop sandbox template, which provides a full Linux desktop with a browser:

{
  "runtime": {
    "language": "python",
    "sandbox_template": "desktop",
    "memory_mb": 4096,
    "timeout_seconds": 600,
    "cpu": 2
  }
}

Desktop sandboxes consume more resources than standard sandboxes, and the agent's pricing reflects this.

The SDK call: `ctx.request_takeover()`

The developer's only coordination point with the platform is a single SDK call:

async def run(ctx: AgentContext):
    task = await ctx.task()
    browser = ctx.browser()

    await browser.goto("https://maps.google.com")

    # Developer's own logic to detect login is needed
    if await browser.query_selector(".sign-in-button"):
        await ctx.request_takeover(
            connection_id="google_account",
            reason="Please log into your Google account to create a Maps list"
        )
        # This call blocks until the user finishes
        # When it returns, the browser is logged in

    # Agent continues with the authenticated browser
    await browser.click(".create-list-button")
    # ...

ctx.request_takeover() is an async call that does not resolve until the user hands back control. The agent's process stays alive but is suspended on this await. When the user finishes, the call resolves and the agent continues with the browser in whatever state the user left it.
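
The suspension behaviour can be illustrated without the platform. The following is a minimal sketch, assuming a stand-in context object: FakeContext and its finish_takeover method are hypothetical and are not part of the SDK. Only the request_takeover(connection_id, reason) call shape comes from the example above.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in for AgentContext; not the real SDK."""

    def __init__(self) -> None:
        self._finished = asyncio.Event()
        self.log: list[str] = []

    async def request_takeover(self, connection_id: str, reason: str) -> None:
        # Suspend the calling coroutine until the user hands back control.
        self.log.append(f"takeover requested: {connection_id}")
        await self._finished.wait()
        self.log.append("takeover finished")

    def finish_takeover(self) -> None:
        # Simulates the user clicking "Finish".
        self._finished.set()


async def agent(ctx: FakeContext) -> None:
    await ctx.request_takeover(
        connection_id="google_account",
        reason="Please log into your Google account",
    )
    # Runs only after the takeover has resolved.
    ctx.log.append("agent resumed")


async def simulate() -> list[str]:
    ctx = FakeContext()
    agent_task = asyncio.create_task(agent(ctx))
    await asyncio.sleep(0)  # let the agent run until it suspends on the await
    ctx.log.append("user is logging in")
    ctx.finish_takeover()
    await agent_task
    return ctx.log


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

In this sketch the line after the await never runs while the simulated user is in control, which mirrors the suspended state described above.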

What happens under the hood

Agent calls ctx.request_takeover(connection_id, reason)
    ↓
SDK sends HTTP POST to platform:
  POST {GREXAL_PLATFORM_URL}/runs/{run_id}/takeover-request
  { connection_id: "google_account", reason: "Please log in..." }
    ↓
Platform updates run status to blocked_on_browser_takeover
    ↓
User sees in the chat or workflow dashboard:

  ┌───────────────────────────────────────────────────────┐
  │  Maps List Creator needs you to log in.                │
  │  "Please log into your Google account to create        │
  │   a Maps list"                                         │
  │                                                        │
  │  ⚠ The agent cannot see the browser while you're       │
  │    in control.                                         │
  │                                                        │
  │  [Take over browser]                                   │
  └───────────────────────────────────────────────────────┘

    ↓
User clicks "Take over browser"
    ↓
Grexal UI connects to the desktop sandbox's VNC stream
    ↓
User sees the live browser, logs in (password, 2FA, CAPTCHA — all handled)
    ↓
User clicks "Finish" in the Grexal UI
    ↓
ctx.request_takeover() resolves → agent continues
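
The first step of the flow, the POST to the platform, might look roughly like the sketch below. build_takeover_request is a hypothetical helper, the JSON content type is an assumption, and the source does not describe authentication, so none is shown. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_takeover_request(
    platform_url: str, run_id: str, connection_id: str, reason: str
) -> urllib.request.Request:
    """Build (but do not send) the takeover-request POST described above."""
    url = f"{platform_url.rstrip('/')}/runs/{run_id}/takeover-request"
    body = json.dumps({"connection_id": connection_id, "reason": reason})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Assumption: a JSON body; the source does not state the content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_takeover_request(
        "https://platform.example",  # placeholder for GREXAL_PLATFORM_URL
        "run_123",  # placeholder run ID
        "google_account",
        "Please log into your Google account to create a Maps list",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```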

Privacy during takeover

While the user is controlling the browser:

The agent's process is suspended. It cannot execute code, take screenshots, or observe the browser.
The platform does not record the screen. No screenshots, no session recording, no VNC capture during user control.
Only the user's browser tab sees the VNC stream. It is not stored, logged, or accessible to anyone else.

Takeover timeout

The user has 5 minutes to complete the takeover. If the user doesn't click "Take over browser" or "Finish" within 5 minutes:

The sandbox is killed
The run is marked as cancelled with reason takeover_timeout
The user is notified: "The agent timed out waiting for you to log in."

The sandbox remains alive and billing continues during the takeover. The 5-minute timeout prevents zombie sandboxes.
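
The timeout rule can be sketched with asyncio.wait_for. This is an illustration under assumptions, not the platform's implementation: wait_for_user is a hypothetical name, the "running" status is a placeholder (the source names only blocked_on_browser_takeover and cancelled), and the sketch models a single deadline rather than separate ones for "Take over browser" and "Finish".

```python
import asyncio

TAKEOVER_TIMEOUT_SECONDS = 5 * 60  # the 5-minute limit described above


async def wait_for_user(finished: asyncio.Event, timeout: float) -> dict:
    """Wait for the user to finish; report a timeout as a cancelled run."""
    try:
        await asyncio.wait_for(finished.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        # Here the platform would also kill the sandbox and notify the user.
        return {"status": "cancelled", "reason": "takeover_timeout"}
    return {"status": "running"}  # placeholder status name


async def demo() -> tuple:
    # Short timeouts so the demo finishes quickly.
    never_finished = asyncio.Event()
    timed_out = await wait_for_user(never_finished, timeout=0.05)

    finished = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, finished.set)
    completed = await wait_for_user(finished, timeout=1.0)
    return timed_out, completed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In production the timeout argument would be TAKEOVER_TIMEOUT_SECONDS; the demo uses fractions of a second only to keep it fast.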

Takeover in workflows

When a browser login agent is part of a scheduled workflow and the agent requests takeover:

The workflow node enters blocked_on_browser_takeover
The user receives a notification (email + in-app): "Your workflow needs you to log in"
The user opens the notification, sees the takeover prompt, and completes the login
The workflow resumes

If the user doesn't respond within 5 minutes, the sandbox is killed and the workflow's error handling applies.

Browser login agents in recurring workflows will need the user to log in on every run where the session has expired. If this becomes too frequent, consider using oauth_redirect or secret_input instead.

Multiple takeover requests

An agent may request takeover multiple times in a single run — for example, if it needs the user to log in to two different services:

await ctx.request_takeover(
    connection_id="google_account",
    reason="Log into Google Maps"
)
# User logs into Google, clicks Finish

await ctx.request_takeover(
    connection_id="yelp_account",
    reason="Log into Yelp to cross-reference reviews"
)
# User logs into Yelp, clicks Finish
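
When several services are involved, the detect-then-request pattern can be factored into a helper so that a takeover is requested only when a login wall is actually present. A minimal sketch: ensure_logged_in is a hypothetical helper, and StubBrowser and StubContext are stand-ins so the example runs without the platform. Only query_selector and request_takeover come from the examples above.

```python
import asyncio


async def ensure_logged_in(ctx, browser, login_selector, connection_id, reason):
    """Request a takeover only if the login wall is present.

    Returns True if a takeover was requested, False otherwise.
    """
    if await browser.query_selector(login_selector):
        await ctx.request_takeover(connection_id=connection_id, reason=reason)
        return True
    return False


# Hypothetical stubs so the sketch runs without the platform.
class StubBrowser:
    def __init__(self, visible_selectors):
        self.visible = set(visible_selectors)

    async def query_selector(self, selector):
        return selector if selector in self.visible else None


class StubContext:
    def __init__(self, browser):
        self.browser = browser
        self.requests = []

    async def request_takeover(self, connection_id, reason):
        self.requests.append(connection_id)
        # Pretend the user logged in: the login wall disappears.
        self.browser.visible.clear()


async def demo():
    browser = StubBrowser({".sign-in-button"})
    ctx = StubContext(browser)
    first = await ensure_logged_in(
        ctx, browser, ".sign-in-button", "google_account", "Log into Google Maps"
    )
    second = await ensure_logged_in(
        ctx, browser, ".sign-in-button", "google_account", "Log into Google Maps"
    )
    return first, second, ctx.requests


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second call requests nothing because the session is already in place, which keeps the number of interruptions for the user to a minimum.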

Trust and safety

Agents that use browser_login have direct access to the user's logged-in session. Mitigations include:

Mandatory AI security review on every deployment, with additional scrutiny for browser login agents
Verified developer accounts — identity verification required for all developers
Marketplace reputation — reviews, success rate, and failure rate give users a trust signal

Field reference

Field	Type	Required	Description
`login_url`	`string`	No	The URL the user will log into (HTTPS). Informational — shown in consent prompt.

runtime.sandbox_template must be "desktop" when using browser_login.
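
The two rules in this reference can be checked locally before deploying. The following is a minimal sketch, assuming that the connections and runtime blocks live in the same manifest object; check_manifest is a hypothetical helper and does not represent the validation the CLI or platform actually performs.

```python
from urllib.parse import urlparse


def check_manifest(manifest: dict) -> list:
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    connections = manifest.get("connections", [])
    browser_logins = [c for c in connections if c.get("auth_mode") == "browser_login"]

    for conn in browser_logins:
        login_url = conn.get("login_url")  # optional field
        if login_url is not None and urlparse(login_url).scheme != "https":
            problems.append(f"{conn.get('id')}: login_url must be HTTPS")

    template = manifest.get("runtime", {}).get("sandbox_template")
    if browser_logins and template != "desktop":
        problems.append('runtime.sandbox_template must be "desktop"')
    return problems


if __name__ == "__main__":
    manifest = {
        "connections": [
            {
                "id": "google_account",
                "display_name": "Google Account",
                "auth_mode": "browser_login",
                "login_url": "https://accounts.google.com",
            }
        ],
        "runtime": {"language": "python", "sandbox_template": "desktop"},
    }
    print(check_manifest(manifest))
```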