4 Context Engineering Patterns for Reliable AI Agents

Your agent is well-prompted. The system message is clean, the instructions are specific, and in testing it performs perfectly. Then you ship it to production — and it starts hallucinating, forgetting earlier steps, and racking up token costs that make your finance team nervous.

The prompt isn’t the problem. Context engineering for AI agents is.

Andrej Karpathy put it precisely: context engineering is “the delicate art and science of filling the context window with just the right information for the next step.” Gartner named it the breakout AI skill of 2026, one month after Karpathy effectively declared prompt engineering dead for industrial-strength applications. With 57% of organizations already running agents in production — and 32% citing output quality as their top barrier (LangChain State of AI Agent Engineering) — getting context management right isn’t optional anymore.

This post breaks down the four core patterns (Write, Select, Compress, and Isolate), maps each one to the specific failure mode it solves, and gives you a decision framework you can apply to your workflows today.

Why Prompt Engineering Can’t Save Your Agent in Production

A prompt is static. A multi-step agent workflow is anything but.

When your agent runs for 20+ turns, calls external tools, processes retrieved documents, and hands off state to sub-agents, the context window becomes a live artifact — accumulating, shifting, and eventually overflowing. No matter how carefully you’ve written your system prompt, you can’t prompt-engineer your way out of a context that’s grown too large, too noisy, or too stale.

A 2025 study published in npj Digital Medicine found that prompt-based context mitigation cut GPT-4o’s hallucination rate from 53% to 23% — meaningful, but still leaving nearly a quarter of outputs wrong. The floor you can reach with prompting alone is higher than most teams want.

Context engineering is the systematic practice of deciding what information goes into the context window, when, and in what form. It shifts the question from “what should I tell the model?” to “what should the model be able to see at each step of execution?” That shift is what separates agents that hold up in production from ones that don’t.

The Four Context Engineering Patterns — Write, Select, Compress, Isolate

Every context engineering strategy in production maps to one of four patterns:

  • Write — save context externally so the agent doesn’t have to carry it
  • Select — pull only the relevant context into the window at each step
  • Compress — summarize accumulated context at checkpoints to reclaim space
  • Isolate — split complex tasks across multiple agents with separate, scoped windows

Each pattern solves a specific failure mode. Using the wrong one for your workflow doesn’t just fail to help — it actively makes things worse. The sections below walk through each pattern in depth, then the decision framework shows you how to choose.

Write — Offloading State Before Your Context Window Overflows

The failure mode Write solves: truncation. When context grows beyond the window ceiling, the model silently drops the oldest content — often including the instructions and early-task decisions your agent most needs.

The Write pattern externalizes state. Instead of accumulating everything in the running context, the agent writes checkpoints, plans, and intermediate results to an external store — a database, file system, or memory service — and reads back only what it currently needs.

Anthropic’s own multi-agent research system uses this pattern explicitly: it saves its plan to an external memory store at the start of execution specifically to avoid truncation if the context exceeds 200,000 tokens mid-task (LangChain). Claude Code’s auto-compact feature takes a similar approach, triggering automatic summarization when an agent hits 95% of its context capacity.

What this looks like in practice

```python
# At the start of a long-running task
memory_store.write("task_plan", agent.generate_plan(user_request))

# At each major step
memory_store.write(f"step_{step_num}_result", tool_result)

# When context from earlier steps is needed
relevant_history = memory_store.read(f"step_{step_num - 1}_result")
```

The key discipline: write proactively, not reactively. If you only write to external storage after you’ve already hit the limit, you’re already losing data. Build write checkpoints into your workflow architecture from the start.
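To make the proactive-checkpoint idea concrete, here is a minimal sketch. The `MemoryStore` class, the token budget, and the 4-characters-per-token estimate are all illustrative assumptions, not a specific library or a tuned threshold:

```python
# Sketch: proactive checkpointing with a rough token-budget trigger.
# MemoryStore and the 4-chars-per-token estimate are illustrative stand-ins.

CONTEXT_BUDGET_TOKENS = 8_000  # hypothetical window budget for this agent

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4

class MemoryStore:
    """Stand-in for a database, file system, or memory service."""
    def __init__(self):
        self._data = {}

    def write(self, key: str, value: str) -> None:
        self._data[key] = value

    def read(self, key: str) -> str:
        return self._data[key]

def maybe_checkpoint(store: MemoryStore, step: int, context: list[str]) -> list[str]:
    """Write older context out before the window fills, keeping recent turns live."""
    used = sum(estimate_tokens(chunk) for chunk in context)
    if used > CONTEXT_BUDGET_TOKENS * 0.8:  # act well before the ceiling
        store.write(f"step_{step}_archive", "\n".join(context[:-3]))
        return context[-3:]  # carry only the most recent turns forward
    return context

store = MemoryStore()
long_context = ["x" * 3000] * 10  # ~7,500 estimated tokens, over the 80% trigger
trimmed = maybe_checkpoint(store, step=4, context=long_context)
```

The point is the trigger placement: the write fires at 80% of budget, not at the ceiling, so nothing is ever lost to truncation.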

Write is your default first choice for any workflow longer than 10–15 turns.

Select — Pulling Only What the Agent Needs, Right When It Needs It

The failure mode Select solves: distraction and noise. An agent with 50 tool descriptions, a full conversation history, and three retrieved documents in its context doesn’t just have a large window — it has a noisy one. Irrelevant signals bury relevant ones, and the model’s attention diffuses.

The Select pattern uses retrieval — typically semantic search — to inject only the most relevant context at each step. Rather than statically listing everything the agent might need, you dynamically surface what it actually needs right now.

The impact is larger than most developers expect. Tool selection accuracy improves 3x when agent tool descriptions are surfaced via semantic search (RAG) rather than injected exhaustively into the context window (LangChain). The model isn’t smarter — it’s less distracted.

Implementing selective tool loading

Instead of passing all tools on every call:

```python
# Avoid this for large tool sets
agent.run(task, tools=ALL_TOOLS)  # 50+ tools polluting context

# Do this instead
relevant_tools = tool_index.search(task_description, top_k=5)
agent.run(task, tools=relevant_tools)
```

The same principle applies to memory retrieval, document context, and conversation history. Every token in the window competes for the model’s attention. Select ruthlessly.

A useful mental model: think of Select as giving your agent a focused desk instead of an entire filing room. The right files are already out. Everything else is in the drawer.
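As a concrete sketch of the `tool_index` idea: the example below ranks tool descriptions by keyword overlap with the task, a deliberately simple stand-in for real embedding-based semantic search so the code stays self-contained. The `ToolIndex` class and tool names are illustrative, not from any framework:

```python
# Sketch: a tiny tool index. Production systems would use embeddings;
# keyword overlap stands in for semantic similarity here.

class ToolIndex:
    def __init__(self, tools: dict[str, str]):
        # tools maps tool name -> natural-language description
        self.tools = tools

    @staticmethod
    def _score(query: str, description: str) -> int:
        # Overlap count between query and description words: a crude
        # relevance signal standing in for cosine similarity.
        return len(set(query.lower().split()) & set(description.lower().split()))

    def search(self, task_description: str, top_k: int = 5) -> list[str]:
        ranked = sorted(
            self.tools,
            key=lambda name: self._score(task_description, self.tools[name]),
            reverse=True,
        )
        return ranked[:top_k]

index = ToolIndex({
    "web_search": "search the web for current information and news",
    "sql_query": "run a sql query against the analytics database",
    "send_email": "send an email message to a recipient",
    "read_file": "read a file from the local file system",
})
relevant = index.search("find recent news about the company", top_k=2)
```

Swapping the `_score` function for an embedding comparison changes nothing structurally: the agent still receives only the top-k tools, which is where the attention win comes from.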

Compress — Trimming the Fat Without Losing What Matters

The failure mode Compress solves: token cost bloat. Even with Write and Select in place, context accumulates. Tool outputs, retrieved documents, and multi-turn reasoning chains fill windows quickly — and sending a large context on every call directly multiplies your API costs.

The Compress pattern summarizes or trims accumulated context at key checkpoints. ACON (Adaptive Context Optimization Network) approaches can reduce agent memory usage by 26–54% while preserving 95%+ task accuracy, according to Zylos AI research. That’s not a minor optimization — it’s potentially halving your inference costs on long-running workflows.

When to trigger compression

Compression works best at natural workflow boundaries:

  • After a tool call returns a long document or API response
  • At agent handoff points between sub-tasks
  • When conversation history grows beyond a defined threshold (e.g., 10 turns)
  • Before starting a new major task phase

```python
def compress_history(messages, llm, threshold=10):
    if len(messages) > threshold:
        summary = llm.summarize(messages[:-3])  # preserve last 3 turns verbatim
        return [{"role": "system", "content": f"Prior context: {summary}"}] + messages[-3:]
    return messages
```

The risks you can’t ignore

Compress is the pattern with the most hidden failure modes:

  • Brevity bias: the summarizing model drops domain-specific details it doesn’t recognize as important — technical variable names, edge-case conditions, specific numeric values
  • Context collapse: iterative recompression over many cycles erodes fidelity; the 10th summary of a summary bears little resemblance to the original event

Mitigate these by summarizing with domain-aware prompts, testing compression against the specific task types you run, and verifying after each compression pass that critical details survived before you compress again.
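One lightweight survival check is sketched below: extract numeric values and code-like identifiers from the original text and assert they still appear in the summary. The two regexes are illustrative heuristics for one domain, not a complete fidelity test:

```python
import re

# Sketch: verify that domain-critical tokens survived a compression step.
# The token classes checked (numbers, snake_case identifiers) are
# illustrative; extend them for your own domain.

def critical_tokens(text: str) -> set[str]:
    numbers = set(re.findall(r"\b\d+(?:\.\d+)?\b", text))
    identifiers = set(re.findall(r"\b[a-z]+_[a-z_]+\b", text))
    return numbers | identifiers

def verify_compression(original: str, summary: str) -> set[str]:
    """Return the critical tokens the summary dropped (empty set = pass)."""
    return critical_tokens(original) - critical_tokens(summary)

original = "Set retry_limit to 5 and timeout_seconds to 2.5 before deploy."
good_summary = "Config: retry_limit=5, timeout_seconds=2.5."
bad_summary = "Configure retries and a timeout before deploying."

dropped_good = verify_compression(original, good_summary)
dropped_bad = verify_compression(original, bad_summary)
```

A non-empty result is your signal to re-summarize with a stronger prompt or skip compression for that span entirely.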

Isolate — Splitting Context Across Agents (and the Token Cost You Need to Know)

The failure mode Isolate solves: context explosion in complex multi-step workflows. When one agent handles planning, web research, code generation, and output formatting, its context window becomes a dumping ground for every intermediate state from every subtask.

The Isolate pattern splits complex tasks across multiple specialized agents, each with its own scoped context window. An orchestrator delegates subtasks; specialist agents execute them with only the context relevant to their scope. No single agent carries the full workflow history.

This is a genuine reliability gain — specialized agents make fewer errors because their window is tuned to their specific task. They don’t confuse web research context with code generation context.

The cost tradeoff you must surface before adopting this pattern

Multi-agent systems with isolated contexts can consume up to 15x more tokens than single-agent chat-style workflows. (LangChain)

That’s not a typo. When each sub-agent initializes with a system prompt, task description, and relevant tools — multiplied across five or ten specialized agents per workflow run — token consumption compounds fast. If you adopt Isolate without accounting for this, you will optimize for reliability at ruinous infrastructure cost.

The mitigation is to combine Isolate with Select and Compress at handoff boundaries. When the orchestrator hands off to a sub-agent, pass only the scoped context that sub-agent needs — not the full parent history.

```python
# Context explosion: don't do this
sub_agent.run(task=subtask, context=full_parent_history)

# Scoped handoff: do this
scoped_context = select_relevant(full_parent_history, subtask)
sub_agent.run(task=subtask, context=scoped_context)
```

Isolate is the right choice for workflows where tasks genuinely require separate mental models. If you can solve the problem with a single agent using Write and Select, you almost certainly should.

Choosing the Right Pattern: A Decision Framework for Real Workflows

Here’s a practical decision tree for context engineering pattern selection:

1. Is your workflow longer than 10–15 turns, or does it risk exceeding the context ceiling?

→ Yes: use Write to externalize intermediate state. This is table stakes for any non-trivial agent.

2. Does your agent have a large tool set (10+), a large document corpus, or a broad conversation history?

→ Yes: add Select to dynamically retrieve only what’s needed at each step.

3. Are your token costs too high, or is the context window filling up despite Write and Select?

→ Yes: add Compress at natural workflow boundaries. Monitor for brevity bias.

4. Does your workflow involve genuinely distinct subtasks that would pollute each other’s context?

→ Yes: consider Isolate — but only after estimating the token cost multiplication and enforcing scoped handoffs.

A quick symptom-to-pattern reference:

| Symptom | Pattern to Apply |
|---|---|
| Agent “forgets” early decisions | Write |
| Agent selects wrong tools frequently | Select |
| Token costs scaling with conversation length | Compress |
| Sub-agent outputs contaminated by parent history | Isolate + scoped handoff |
| Long-running pipeline with all of the above | Write + Select + Compress at handoffs |

Most production workflows end up combining at least two patterns. Write + Select covers the majority of use cases. Add Compress for cost-sensitive workflows and Isolate only when task separation genuinely requires it.

The Anti-Patterns That Will Break Your Agent Even With Good Context Engineering

Knowing the patterns isn’t enough. These failure modes catch developers after they’ve already implemented a context strategy.

Context poisoning from stale tool outputs. If your agent caches tool results and replays them in later steps, outdated data can corrupt reasoning silently. Timestamp tool outputs and implement TTL-based invalidation for any live data source.

“Lost in the middle” degradation. Research consistently shows LLMs perform worse on information buried in the middle of a long context versus information at the beginning or end. If your Write or Select strategy routinely places critical context in the middle of a large window, accuracy will degrade invisibly. Structure your context so the most important information appears near the top.
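One way to sketch that ordering rule: assemble the window so instructions and critical facts sit at the edges and bulk retrieved material sits in the middle, where degradation costs the least. The helper below and its argument names are illustrative, not from any framework:

```python
# Sketch: assemble a context that keeps high-priority content at the
# edges, where models attend best, and bulk retrieval in the middle.

def assemble_context(instructions: str, critical_facts: list[str],
                     bulk_documents: list[str], current_task: str) -> str:
    parts = (
        [instructions]      # top of window: attended well
        + critical_facts    # key decisions stay near the top
        + bulk_documents    # long retrieved material sits in the middle
        + [current_task]    # end of window: also attended well
    )
    return "\n\n".join(parts)

window = assemble_context(
    instructions="You are a billing support agent.",
    critical_facts=["Customer plan: enterprise", "Refund already issued"],
    bulk_documents=["...long policy document...", "...long ticket history..."],
    current_task="Draft a reply declining a second refund.",
)
```

The ordering is the whole trick: if a later retrieval step appends a large document, it should be spliced into the middle section, never appended after the current task.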

Compression without verification. Compressing context and assuming the summary is faithful is one of the fastest ways to introduce silent data loss. After any compression step, run a lightweight validation — a small LLM call that confirms key facts survived, or a structured check against your ground truth for that task type.

Multi-agent context explosion at scale. A root agent that passes its full context to every sub-agent — which in turn passes its full context to its sub-agents — triggers cascading token bloat. The 15x token multiplication compounds at every level of the hierarchy. Enforce scoped handoffs as a hard architectural rule, not a best-effort guideline.

Build Context Engineering In from the Start

Context engineering for AI agents isn’t a refinement of prompt engineering — it’s a different discipline entirely. Prompts set intent. Context determines whether the model can execute on that intent reliably across a complex, multi-turn workflow.

The four patterns give you a complete toolkit: Write to prevent truncation, Select to eliminate noise, Compress to control cost, and Isolate to scope complex subtasks. The decision framework maps your specific workflow shape to the right pattern — or combination — before you run into production failures.

Start by auditing one agent you have running today. Where is context accumulating unchecked? That’s where you apply your first pattern.
