Delegated Access vs. Service Accounts: The Right Way to Credential AI Agents

As AI agents graduate from demos to production systems, one architectural decision quietly determines whether your security posture scales or collapses: how the agent authenticates to downstream services. The path of least resistance — a single, all-powerful service account — is also one of the most dangerous antipatterns you can introduce into a modern stack.


1. The Problem: Why a Single All-Access Service Account Is an AI Agent Antipattern

Service accounts are a legitimate tool for machine-to-machine workloads where the caller’s identity is fixed and well-understood. A nightly ETL job that reads from a data warehouse? Fine. An AI agent that dynamically acts on behalf of any user across any workflow? Deeply problematic.

The core issue is privilege aggregation. When you issue a service account with broad, persistent credentials to an agent, every action the agent takes runs with the union of all permissions it might ever need — regardless of what a specific user is allowed to do at that moment. The agent becomes a privilege-escalation vector by design.

Consider the blast radius: if that credential is exfiltrated via a prompt-injection attack, a compromised tool-call response, or a misconfigured logging pipeline, an attacker inherits application-level access across every tenant the agent serves. The 2025 Verizon DBIR noted that AI workload credentials are now explicitly targeted in lateral-movement campaigns precisely because they tend to be over-permissioned and long-lived.

The principle of least privilege isn’t a checkbox — it’s a runtime property. Your agent’s effective permissions at any instant should reflect the invoking user’s real-time authorization state, not a static grant made at deployment time.


2. How Delegated Access Works: OAuth Token Exchange and M2M Token Issuance

The OAuth 2.0 Token Exchange specification (RFC 8693) gives us the primitives to solve this correctly. Rather than handing the agent a long-lived service-account credential, you issue it a short-lived, scoped token derived from the user’s own access token.

Here’s the flow in practice:

User → Authorization Server → access_token (user-scoped)
Agent receives: subject_token = user's access_token

Agent → Authorization Server (Token Exchange)
  grant_type=urn:ietf:params:oauth:grant-type:token-exchange
  subject_token=<user_access_token>
  subject_token_type=urn:ietf:params:oauth:token-type:access_token
  requested_token_type=urn:ietf:params:oauth:token-type:access_token
  scope=documents:read calendar:write

Authorization Server responds:
  access_token=<delegated_token>
  expires_in=300
  issued_token_type=urn:ietf:params:oauth:token-type:access_token

A concrete Python example using the token exchange grant:

import os

import httpx

# Client secret supplied via environment, never hard-coded
AGENT_CLIENT_SECRET = os.environ["AGENT_CLIENT_SECRET"]

def exchange_token(user_token: str, requested_scopes: list[str]) -> str:
    """Exchange a user's access token for a short-lived, scoped delegated token."""
    response = httpx.post(
        "https://auth.example.com/oauth/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": user_token,
            "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "scope": " ".join(requested_scopes),
            "client_id": "agent-service",
            "client_secret": AGENT_CLIENT_SECRET,
        },
        timeout=10.0,
    )
    response.raise_for_status()
    return response.json()["access_token"]

The authorization server enforces that the delegated token’s scopes are a subset of the subject token’s grants. An agent running on behalf of a read-only user cannot exchange its way into write access. The user’s real-time policy — including any revocations, role changes, or session terminations — propagates naturally because the delegation chain is re-evaluated at exchange time.
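RFC 8693 leaves the narrowing policy to the authorization server. A minimal sketch of the subset check such a server might apply (the function name and strict-rejection behavior are assumptions; a real server may instead silently narrow the grant to the intersection):

```python
def validate_exchange(subject_scopes: set[str], requested_scopes: set[str]) -> set[str]:
    """Grant only scopes already held by the subject token.

    Rejects any request that escalates beyond the subject token's grants.
    """
    escalated = requested_scopes - subject_scopes
    if escalated:
        raise PermissionError(f"Scope escalation denied: {sorted(escalated)}")
    # Per RFC 8693, omitting 'scope' lets the server default to the subject token's scopes
    return requested_scopes or subject_scopes
```

Because the check runs on every exchange, a user whose role was just downgraded can no longer mint delegated tokens carrying the old scopes.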


3. MCP Scope Challenges in Practice

The Model Context Protocol (MCP) introduces its own authorization surface that’s often overlooked. MCP defines tool-level scopes that gate not just execution but discovery — an agent that lacks mcp:tools:read on a given server cannot even enumerate available tools, let alone invoke them.

This creates a meaningful security boundary, but it also creates operational complexity:

  • mcp:tools:read — allows the agent to call tools/list and inspect tool schemas. Without this, the tool namespace is invisible.
  • mcp:tools:write — allows the agent to call tools/call and execute tools. A read-only scope lets the agent reason about available capabilities without acting on them.
  • mcp:resources:read / mcp:resources:write — mirror the same pattern for resource endpoints.
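The discovery-gating half of this model can be sketched at the tools/list handler. The `registry` shape and response format here are hypothetical, not part of the MCP specification:

```python
def handle_tools_list(token_claims: dict, registry: dict) -> list[dict]:
    """Enumerate tools only for callers holding the read scope."""
    granted = set(token_claims.get("scope", "").split())
    if "mcp:tools:read" not in granted:
        # Without the read scope, the tool namespace stays invisible
        raise PermissionError("Caller lacks mcp:tools:read scope")
    return [
        {"name": name, "description": tool["description"]}
        for name, tool in registry.items()
    ]
```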

The practical challenge is that most MCP server implementations today issue a single bearer token covering all scopes for a given client. You need to push your MCP server to honor a scope claim in the incoming JWT and enforce tool-level authorization at the handler layer:

def handle_tool_call(request: ToolCallRequest, token_claims: dict) -> ToolCallResponse:
    granted_scopes = set(token_claims.get("scope", "").split())
    if "mcp:tools:write" not in granted_scopes:
        raise PermissionError("Caller lacks mcp:tools:write scope")
    # proceed with tool execution

Without this enforcement, scope differentiation exists only on paper. Treat MCP scope validation as a mandatory control, not a roadmap item.
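One way to keep that check from being copy-pasted into every handler is a small decorator. A sketch, assuming handlers share a `(request, token_claims)` signature (the decorator name is an invention, not an MCP SDK API):

```python
from functools import wraps

def requires_scope(scope: str):
    """Enforce a space-delimited 'scope' claim before the handler runs."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request, token_claims: dict):
            granted = set(token_claims.get("scope", "").split())
            if scope not in granted:
                raise PermissionError(f"Caller lacks {scope} scope")
            return handler(request, token_claims)
        return wrapper
    return decorator

@requires_scope("mcp:tools:write")
def handle_tool_call(request, token_claims: dict) -> dict:
    # Tool execution would happen here
    return {"status": "executed"}
```

Centralizing the check also gives you one place to add audit logging for every denied call.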


4. Short-Lived Tokens and the 92% Credential-Theft Reduction

The expires_in=300 in the token exchange response isn’t arbitrary — a 300-second (5-minute) TTL is the security inflection point identified in Okta’s 2025 Identity Security Benchmark. Organizations that moved AI workload tokens to sub-5-minute TTLs reported a 92% reduction in successful credential-theft-based lateral movement compared to those using hour-or-longer-lived credentials.

The mechanism is straightforward: a stolen short-lived token has a very narrow exploitation window. By the time an attacker extracts it from a compromised prompt response or a verbose log, it may already be invalid.

Implementing this requires your agent to treat token refresh as a first-class concern:

from datetime import datetime, timedelta

class DelegatedTokenCache:
    def __init__(self):
        # Keyed by (user_token, sorted scope tuple) -> (delegated_token, expiry)
        self._tokens: dict[tuple[str, tuple[str, ...]], tuple[str, datetime]] = {}

    def get_or_refresh(self, user_token: str, scopes: list[str]) -> str:
        key = (user_token, tuple(sorted(scopes)))
        cached_token, expires_at = self._tokens.get(key, (None, datetime.min))
        # Refresh 30 seconds before expiry
        if cached_token is None or datetime.utcnow() >= expires_at - timedelta(seconds=30):
            cached_token = exchange_token(user_token, scopes)
            self._tokens[key] = (cached_token, datetime.utcnow() + timedelta(seconds=300))
        return cached_token

The 30-second refresh buffer prevents race conditions in multi-step workflows without meaningfully extending the exposure window.
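The buffer decision can be isolated into a pure function, which makes the timing easy to verify; a sketch with explicit timestamps:

```python
from datetime import datetime, timedelta

def needs_refresh(expires_at: datetime, now: datetime, buffer_seconds: int = 30) -> bool:
    """True once 'now' falls inside the refresh buffer before expiry."""
    return now >= expires_at - timedelta(seconds=buffer_seconds)

# A token issued at t0 with a 300-second TTL is refreshed from t0+270 onward,
# so no multi-step workflow starts an action with fewer than ~30 seconds of validity.
```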


5. Human-in-the-Loop Checkpoints for Elevated-Privilege Workflows

Delegated access solves the steady-state authorization problem. But multi-step agentic workflows often encounter operations that require elevated privileges not present in the user’s baseline token — bulk deletions, cross-tenant data access, financial transactions above a threshold.

The correct pattern here is not to pre-provision the agent with elevated credentials. Instead, design explicit step-up authorization checkpoints:

  • Detect the privilege gap — the agent determines that the next action requires a scope (billing:write, admin:impersonate) absent from its current delegated token.
  • Pause and surface the request — rather than failing or silently skipping the step, the agent emits a structured interrupt with a human-readable description of what elevated access is being requested and why.
  • User consents via a step-up flow — the authorization server issues a time-boxed, single-use elevated token (often backed by an additional MFA prompt).
  • Execute and immediately discard — the elevated token is used for that specific action and not cached or reused.
A minimal sketch of the checkpoint:

class AgentWorkflow:
    def request_step_up(self, required_scope: str, justification: str) -> str:
        # Emit checkpoint — implementation varies by UI layer
        approval = self.human_approval_channel.request(
            scope=required_scope,
            reason=justification,
            timeout_seconds=120,
        )
        if not approval.granted:
            raise PermissionDenied(f"User declined elevation to {required_scope}")
        return exchange_token(self.user_token, [required_scope])

This pattern keeps humans in the loop for consequential decisions without requiring agents to operate in a permanently elevated state. It also creates an auditable trail: every step-up is a discrete, attributable event.


The Bottom Line

The service-account model made sense for static, predictable automation. AI agents are neither. They are dynamic, user-contextual, and operate across a surface area that expands with every new tool integration. Matching that dynamism with static, broad credentials is a category error.

Delegated access via token exchange, enforced MCP scope boundaries, sub-5-minute TTLs, and human-in-the-loop step-up flows aren’t security theater — they’re the architectural primitives that make agentic systems safe to deploy at scale. Build them in from day one, not as a retrofit after your first incident.
