Delegated Access vs. Service Accounts: The Right Way to Credential AI Agents
As AI agents graduate from demos to production systems, one architectural decision quietly determines whether your security posture scales or collapses: how the agent authenticates to downstream services. The path of least resistance — a single, all-powerful service account — is also one of the most dangerous antipatterns you can introduce into a modern stack.
1. The Problem: Why a Single All-Access Service Account Is an AI Agent Antipattern
Service accounts are a legitimate tool for machine-to-machine workloads where the caller’s identity is fixed and well-understood. A nightly ETL job that reads from a data warehouse? Fine. An AI agent that dynamically acts on behalf of any user across any workflow? Deeply problematic.
The core issue is privilege aggregation. When you issue a service account with broad, persistent credentials to an agent, every action the agent takes runs with the union of all permissions it might ever need — regardless of what a specific user is allowed to do at that moment. The agent becomes a privilege-escalation vector by design.
Consider the blast radius: if that credential is exfiltrated via a prompt-injection attack, a compromised tool-call response, or a misconfigured logging pipeline, an attacker inherits application-level access across every tenant the agent serves. The 2025 Verizon DBIR noted that AI workload credentials are now explicitly targeted in lateral-movement campaigns precisely because they tend to be over-permissioned and long-lived.
The principle of least privilege isn’t a checkbox — it’s a runtime property. Your agent’s effective permissions at any instant should reflect the invoking user’s real-time authorization state, not a static grant made at deployment time.
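To make the privilege-aggregation problem concrete, here is a minimal sketch that compares the two models. The user names, grants, and helper functions are hypothetical and exist only for illustration; a real system would read grants from its authorization service.

```python
# Hypothetical per-user grants; names and scopes are illustrative only.
USER_GRANTS = {
    "alice": {"documents:read"},
    "bob": {"documents:read", "documents:write", "calendar:write"},
}


def service_account_permissions(grants: dict[str, set[str]]) -> set[str]:
    """A shared service account must hold the union of everything any user may do."""
    return set().union(*grants.values())


def delegated_permissions(user: str, requested: set[str],
                          grants: dict[str, set[str]]) -> set[str]:
    """A delegated token can carry at most what the invoking user holds."""
    return requested & grants.get(user, set())


requested = {"documents:write"}

# Shared account: the write succeeds no matter who invoked the agent.
shared = requested & service_account_permissions(USER_GRANTS)

# Delegation: the same request made on behalf of read-only Alice yields nothing.
scoped = delegated_permissions("alice", requested, USER_GRANTS)
```

With the shared account, Alice's read-only status never enters the calculation. With delegation, the invoking user's grants are the ceiling for every call.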
2. How Delegated Access Works: OAuth Token Exchange and M2M Token Issuance
The OAuth 2.0 Token Exchange specification (RFC 8693) gives us the primitives to solve this correctly. Rather than handing the agent a long-lived service-account credential, you issue it a short-lived, scoped token derived from the user’s own access token.
Here’s the flow in practice:
User → Authorization Server → access_token (user-scoped)
Agent receives: subject_token = user's access_token
Agent → Authorization Server (Token Exchange)
    grant_type=urn:ietf:params:oauth:grant-type:token-exchange
    subject_token=<user_access_token>
    subject_token_type=urn:ietf:params:oauth:token-type:access_token
    requested_token_type=urn:ietf:params:oauth:token-type:access_token
    scope=documents:read calendar:write
Authorization Server responds:
    access_token=<delegated_token>
    expires_in=300
    issued_token_type=urn:ietf:params:oauth:token-type:access_token
A concrete Python example using the token exchange grant:
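import os

import httpx

# Assumption: the agent's client secret is supplied through the environment.
AGENT_CLIENT_SECRET = os.environ["AGENT_CLIENT_SECRET"]


def exchange_token(user_token: str, requested_scopes: list[str]) -> str:
    """Exchange the user's access token for a short-lived, narrower token."""
    response = httpx.post(
        "https://auth.example.com/oauth/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": user_token,
            "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "scope": " ".join(requested_scopes),  # space-delimited, per OAuth 2.0
            "client_id": "agent-service",
            "client_secret": AGENT_CLIENT_SECRET,
        },
        timeout=10.0,  # fail fast rather than stall the agent
    )
    response.raise_for_status()
    return response.json()["access_token"]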
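A correctly configured authorization server enforces that the delegated token’s scopes are a subset of the subject token’s grants. RFC 8693 leaves that policy to the server, so verify it rather than assume it. With the check in place, an agent running on behalf of a read-only user cannot exchange its way into write access. The user’s real-time policy, including any revocations, role changes, or session terminations, propagates because the delegation is re-evaluated at every exchange; with a 300-second token lifetime, a change takes effect within five minutes at most.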
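The subset rule can be sketched from the authorization server's side. This is an illustration under assumptions, not the behavior of any particular server: `narrow_scopes` and `ScopeEscalationError` are hypothetical names, and a real server would apply further policy (client identity, audience, consent) before issuing a token.

```python
class ScopeEscalationError(Exception):
    """Raised when an exchange asks for more than the subject token holds."""


def narrow_scopes(subject_scope: str, requested_scope: str) -> str:
    """Return the scope string to issue, or refuse if it exceeds the subject's.

    Both arguments are space-delimited OAuth scope strings.
    """
    subject = set(subject_scope.split())
    requested = set(requested_scope.split())
    extra = requested - subject
    if extra:
        raise ScopeEscalationError(
            f"not granted to subject: {' '.join(sorted(extra))}"
        )
    return " ".join(sorted(requested))
```

For a subject token carrying `documents:read calendar:write`, a request for `documents:read` is issued as asked, while a request for `documents:write` is refused outright rather than silently trimmed.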
3. MCP Scope Challenges in Practice
The Model Context Protocol (MCP) introduces its own authorization surface that’s often overlooked. An MCP server can define tool-level scopes that gate not just execution but discovery: an agent that lacks mcp:tools:read on a given server cannot even enumerate available tools, let alone invoke them. The scope names used in this section are a naming convention for illustration, not names fixed by the MCP specification itself.
This creates a meaningful security boundary, but it also creates operational complexity:
mcp:tools:read allows the agent to call tools/list and inspect tool schemas. Without this, the tool namespace is invisible.
mcp:tools:write allows the agent to call tools/call and execute tools. A read-only scope lets the agent reason about available capabilities without acting on them.
mcp:resources:read / mcp:resources:write mirror the same pattern for resource endpoints.
The practical challenge is that most MCP server implementations today issue a single bearer token covering all scopes for a given client. You need to push your MCP server to honor a scope claim in the incoming JWT and enforce tool-level authorization at the handler layer:
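# ToolCallRequest and ToolCallResponse stand for the server's own types,
# so the annotations are quoted to keep this snippet importable on its own.
def handle_tool_call(request: "ToolCallRequest", token_claims: dict) -> "ToolCallResponse":
    # The OAuth "scope" claim is a space-delimited string.
    granted_scopes = set(token_claims.get("scope", "").split())
    if "mcp:tools:write" not in granted_scopes:
        raise PermissionError("Caller lacks mcp:tools:write scope")
    # proceed with tool execution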
Without this enforcement, scope differentiation exists only on paper. Treat MCP scope validation as a mandatory control, not a roadmap item.
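The same check can be applied once at the dispatch layer instead of inside each handler. The sketch below assumes the scope names used in this section and maps them onto MCP method names; `REQUIRED_SCOPE` and `authorize_method` are hypothetical helpers, and methods without a policy entry are denied by default.

```python
# Assumed mapping from MCP method name to the scope this article proposes.
REQUIRED_SCOPE = {
    "tools/list": "mcp:tools:read",
    "tools/call": "mcp:tools:write",
    "resources/list": "mcp:resources:read",
    "resources/read": "mcp:resources:read",
}


def authorize_method(method: str, token_claims: dict) -> None:
    """Raise PermissionError unless the token's scopes cover the method."""
    required = REQUIRED_SCOPE.get(method)
    if required is None:
        # Deny by default: an unmapped method is a policy gap, not a free pass.
        raise PermissionError(f"No scope policy for method {method!r}")
    granted = set(token_claims.get("scope", "").split())
    if required not in granted:
        raise PermissionError(f"{method} requires {required}")
```

Calling `authorize_method` before routing a request means a newly added handler cannot forget the check; it only has to be registered in the table.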
4. Short-Lived Tokens and the 92% Credential-Theft Reduction
The expires_in=300 in the token exchange response isn’t arbitrary — a 300-second (5-minute) TTL is the security inflection point identified in Okta’s 2025 Identity Security Benchmark. Organizations that moved AI workload tokens to sub-5-minute TTLs reported a 92% reduction in successful credential-theft-based lateral movement compared to those using hour-or-longer-lived credentials.
The mechanism is straightforward: a stolen short-lived token has a very narrow exploitation window. By the time an attacker extracts it from a compromised prompt response or a verbose log, it may already be invalid.
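A short TTL only helps if every resource server actually rejects expired tokens. The sketch below shows that check on already-verified JWT claims; `is_token_live` is a hypothetical helper, and signature, issuer, and audience validation are assumed to happen before it runs.

```python
import time
from typing import Optional


def is_token_live(claims: dict, now: Optional[float] = None,
                  leeway_seconds: int = 0) -> bool:
    """Return True only while the token's `exp` claim lies in the future.

    `exp` is seconds since the Unix epoch. A token without a numeric `exp`
    is treated as expired rather than as valid forever.
    """
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        return False
    current = time.time() if now is None else now
    return current < exp + leeway_seconds
```

Keep `leeway_seconds` small. It exists to absorb clock skew between servers, and every second of leeway is a second added to the window a stolen token remains usable.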
Implementing this requires your agent to treat token refresh as a first-class concern:
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(seconds=300)      # matches expires_in from the exchange
REFRESH_BUFFER = timedelta(seconds=30)  # stop using a token this long before expiry

CacheKey = tuple[str, tuple[str, ...]]


class DelegatedTokenCache:
    def __init__(self) -> None:
        self._tokens: dict[CacheKey, tuple[str, datetime]] = {}

    def get_or_refresh(self, user_token: str, scopes: list[str]) -> str:
        key = (user_token, tuple(sorted(scopes)))
        now = datetime.now(timezone.utc)  # timezone-aware; utcnow() is deprecated
        cached = self._tokens.get(key)
        # Refresh 30 seconds before expiry
        if cached is None or now >= cached[1] - REFRESH_BUFFER:
            token = exchange_token(user_token, scopes)
            self._tokens[key] = (token, now + TOKEN_TTL)
            return token
        return cached[0]
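The 30-second refresh buffer keeps a token from expiring partway through a multi-step workflow. It does not extend the exposure window: each token still expires 300 seconds after issuance, and the agent simply stops using it slightly earlier.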
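Rather than assuming a fixed lifetime, a cache can derive expiry from the `expires_in` field of the token response. This is a sketch under assumptions: `expiry_from_response` and `DEFAULT_TTL_SECONDS` are hypothetical, and the fallback is deliberately short because a response may omit `expires_in`.

```python
from datetime import datetime, timedelta, timezone

# Assumed conservative fallback for responses that omit expires_in.
DEFAULT_TTL_SECONDS = 60


def expiry_from_response(payload: dict, issued_at: datetime) -> datetime:
    """Compute the absolute expiry time from a token response payload."""
    ttl = payload.get("expires_in", DEFAULT_TTL_SECONDS)
    return issued_at + timedelta(seconds=int(ttl))


issued_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires_at = expiry_from_response({"expires_in": 300}, issued_at)
```

This keeps the cache correct if the authorization server later shortens or lengthens its token lifetime.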
5. Human-in-the-Loop Checkpoints for Elevated-Privilege Workflows
Delegated access solves the steady-state authorization problem. But multi-step agentic workflows often encounter operations that require elevated privileges not present in the user’s baseline token — bulk deletions, cross-tenant data access, financial transactions above a threshold.
The correct pattern here is not to pre-provision the agent with elevated credentials. Instead, design explicit step-up authorization checkpoints:
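The agent reaches a step that requires a scope (e.g. billing:write, admin:impersonate) absent from its current delegated token.

class PermissionDenied(Exception):
    """Raised when the user declines a step-up request."""

class AgentWorkflow:
    def __init__(self, user_token: str, human_approval_channel) -> None:
        self.user_token = user_token
        # Any object exposing request(scope=..., reason=..., timeout_seconds=...)
        self.human_approval_channel = human_approval_channel

    def request_step_up(self, required_scope: str, justification: str) -> str:
        # Emit checkpoint; implementation varies by UI layer
        approval = self.human_approval_channel.request(
            scope=required_scope,
            reason=justification,
            timeout_seconds=120,
        )
        if not approval.granted:
            raise PermissionDenied(f"User declined elevation to {required_scope}")
        # Assumes server policy permits this scope once approval is recorded
        return exchange_token(self.user_token, [required_scope])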
This pattern keeps humans in the loop for consequential decisions without requiring agents to operate in a permanently elevated state. It also creates an auditable trail: every step-up is a discrete, attributable event.
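One way to make each step-up attributable is to emit a structured record for every decision, approved or declined. The field names and the `step_up_audit_record` helper below are assumptions for illustration; adapt them to whatever schema your log pipeline expects.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def step_up_audit_record(user_id: str, agent_id: str, scope: str,
                         justification: str, granted: bool,
                         at: Optional[datetime] = None) -> str:
    """Serialize one step-up decision as a JSON line for an audit log."""
    timestamp = at if at is not None else datetime.now(timezone.utc)
    return json.dumps(
        {
            "event": "step_up_authorization",
            "user_id": user_id,
            "agent_id": agent_id,
            "scope": scope,
            "justification": justification,
            "granted": granted,
            "timestamp": timestamp.isoformat(),
        },
        sort_keys=True,  # stable key order keeps log lines easy to diff
    )
```

The record carries identifiers and the requested scope but never the token itself, so the audit log does not become another place for credentials to leak.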
The Bottom Line
The service-account model made sense for static, predictable automation. AI agents are neither. They are dynamic, user-contextual, and operate across a surface area that expands with every new tool integration. Matching that dynamism with static, broad credentials is a category error.
Delegated access via token exchange, enforced MCP scope boundaries, sub-5-minute TTLs, and human-in-the-loop step-up flows aren’t security theater — they’re the architectural primitives that make agentic systems safe to deploy at scale. Build them in from day one, not as a retrofit after your first incident.