Multi-Agent AI Coding Pipeline: Spec, Implement, Review

Every developer using AI coding assistants has hit the same wall. You’re deep in a complex feature, the single agent starts hallucinating, the context window fills up, and you’re back to square one — re-explaining the same architecture from scratch. There’s a better way.

A multi-agent AI coding pipeline separates your work into three specialized roles — spec, implement, and review — each running in its own isolated context, each handing off a versioned artifact to the next. Teams using this structure report 30–50% faster delivery and 20–40% fewer failed deployments (aiautomationglobal.com, 2026). The tooling to run this pipeline ships in VS Code and Claude Code today. Here’s exactly how to set it up.

Why One AI Agent Is No Longer Enough (The Context Ceiling Problem)

The average Claude Code session in Q1 2026 involves multi-file edits and runs 23 minutes — up from 4 minutes a year ago, according to Anthropic. That’s nearly a sixfold increase in session length. Single agents were not designed for this.

The fundamental problem is context accumulation. As a single-agent conversation grows, every tool call, file read, and code edit gets appended to the running context. A monolithic handoff chain — one agent doing spec, implementation, and review — balloons to 14K+ tokens. By contrast, stateless subagents that each inherit only what they need use roughly 9K tokens per call: a 35–40% efficiency advantage (Google Developers Blog).

More tokens don’t merely cost more — they degrade output quality. The longer the context, the more the model must attend to irrelevant history.

There’s also the trust problem. Research cited by byteiota.com found that 48% of AI-generated code contains potential security vulnerabilities. When a single agent writes and reviews its own code, you get the worst possible outcome: a model checking its own blind spots.

These aren’t hypothetical risks. Gartner recorded a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. In February 2026, every major AI coding tool shipped multi-agent capabilities in the same two-week window — Grok Build, Windsurf, Claude Code Agent Teams, Codex CLI, and Devin.

The infrastructure is here. The question is whether you know how to use it.

The 3-Agent Mental Model: Spec, Implement, and Review as Separate Concerns

Think of the pipeline as three people on a small engineering team:

  • The spec agent is your technical product manager. It reads requirements, asks clarifying questions, and produces a versioned, machine-readable spec document — the ground truth every downstream agent inherits.
  • The implement agent is your engineer. It reads only the spec artifact, makes multi-file edits in an isolated git worktree, commits to a feature branch, and signals readiness.
  • The review agent is your senior reviewer. It reads the spec and the implementation, writes its findings to `feedback.md`, and cannot touch source files. It either signals a merge-ready state or triggers a fix loop.

No agent re-reads the full conversation history. No agent has write access to another agent’s domain. This is context isolation by design.

The separation matters beyond token efficiency. Each agent’s role file defines exactly what it knows, what it can do, and what success looks like. When the implement agent produces incorrect output, you don’t restart everything — you fix the spec and re-run implement. The pipeline is debuggable in a way monolithic sessions never are.

Project Setup — AGENTS.md Files, Directory Structure, and Git Worktrees

Before writing a single line of agent logic, you need the right scaffolding. Here’s the directory structure:

```
project-root/
├── AGENTS.md                        # top-level coordinator instructions
├── .agents/
│   ├── spec.agent.md
│   ├── implement.agent.md
│   └── review.agent.md
├── spec/
│   └── feature-<name>.spec.md      # spec agent outputs here
├── feedback/
│   └── feature-<name>.feedback.md  # review agent outputs here
└── src/
```

Writing AGENTS.md by hand — not by AI

This is one of the most counterintuitive rules in the whole setup: write your AGENTS.md files yourself. ETH Zurich researchers found that LLM-generated AGENTS.md files offer no benefit and can marginally reduce task success rates (~3% on average) while increasing inference costs by over 20% (Gloaguen et al., 2026). Developer-written context files, by contrast, provide a modest ~4% improvement in success rate.

Each role file should answer three questions: What is this agent’s single responsibility? What inputs does it receive? What outputs is it expected to produce?

Keep them short. A wall of instructions produces worse results than a focused half-page.
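For instance, a hand-written top-level AGENTS.md answering those three questions can fit in half a page. The content below is purely illustrative, not a prescribed format:

```markdown
# AGENTS.md (hand-written; illustrative content)

## Responsibility
Coordinate the spec → implement → review pipeline. Never edit source files directly.

## Inputs
- Feature requests from the developer
- Status files under spec/ and verdicts under feedback/

## Outputs
- Agent runs in pipeline order
- A squash merge once the review agent signals MERGE_READY
```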

Git worktrees: isolation that actually works

Each agent gets its own working directory via git worktrees. Here are the exact commands:

```bash
# Create isolated worktrees for the implement and review agents
# (-b creates the branch; drop it if the branch already exists)
git worktree add -b feature/your-feature-name ../project-implement
git worktree add -b review/your-feature-name ../project-review

# Sibling worktrees live outside the repo root, so no .gitignore entry is needed
```

Name branches consistently: `feature/` for implement, `review/` for review. When the pipeline completes, merge back with a squash commit to keep history clean:

```bash
git merge --squash feature/your-feature-name
git commit -m "feat: <summary> [pipeline: spec→implement→review]"
```

Worktrees eliminate the most common parallel-agent failure mode: two agents writing to the same file simultaneously.
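The full worktree lifecycle, including the cleanup step most setups forget, can be exercised in a throwaway repo. Branch and directory names below are placeholders, not part of any required convention:

```shell
# Throwaway demo of the per-agent worktree lifecycle
tmp=$(mktemp -d)
cd "$tmp"
git init -q project
cd project
git config user.email demo@example.com
git config user.name demo
git commit --allow-empty -q -m "init"

# One worktree per agent, each on its own branch
git worktree add -b feature/demo ../project-implement
git worktree add -b review/demo ../project-review
git worktree list

# After the squash merge, retire the worktrees and their branches
git worktree remove ../project-implement
git worktree remove ../project-review
git branch -D feature/demo review/demo
```

Removing worktrees promptly matters: a branch checked out in a live worktree cannot be checked out or deleted anywhere else.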

Configuring the Spec Agent — Writing the Role File and Defining the Handoff Artifact

The spec agent’s only job is to produce a document that another agent can consume without re-reading the conversation. That document is your handoff artifact — and the schema matters.

The spec.agent.md file

```markdown
---
name: spec
description: Produces a versioned spec document from natural-language requirements
tools: read, write
output_path: spec/
---

You are a technical spec writer. Given a feature request, produce a spec document
following the schema defined in spec/SCHEMA.md. Do not implement anything.
Do not suggest implementation approaches unless they affect the spec invariants.
```

The spec document schema

A spec artifact that downstream agents can reliably consume needs these fields:

```markdown
# Spec: <feature> v<version>

## Summary
One sentence describing the feature.

## Acceptance Criteria
- [ ] Criterion 1 (testable, binary pass/fail)
- [ ] Criterion 2

## Edge Cases
- Input X produces behavior Y
- Input Z is explicitly out of scope

## Invariants
- Property that must never be violated

## Files Likely Affected
- src/module/file.ts

## Out of Scope
- What this spec explicitly does not cover
```

The implement agent inherits only this document — not the conversation that produced it. That’s the entire architecture in one sentence.

Configuring the Implement Agent — Consuming the Spec and Working in an Isolated Worktree

The implement agent reads one file — the spec artifact — and works exclusively in its assigned worktree. Its role file should reflect that narrow scope.

The implement.agent.md file

```markdown
---
name: implement
description: Implements a feature from a spec artifact in an isolated worktree
tools: read, write, edit, bash
working_directory: ../project-implement
---

You are a software engineer. Read the spec artifact at spec/<feature>.spec.md.
Implement exactly what the spec describes. Do not modify the spec document.
When complete, commit all changes to the feature branch and write DONE to
spec/<feature>.status.
```

What the implement agent must not do

Constraints are as important as instructions:

  • Do not modify spec files. The spec is read-only input.
  • Do not run tests outside the worktree. Keep state fully isolated.
  • Do not signal completion until all acceptance criteria are addressed. The status file is the handoff trigger — premature DONE signals break the review stage.

The commit to `feature/` plus the status file write is the entire handoff protocol. Simple, observable, and auditable.
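The orchestrator's side of that protocol can be a simple polling helper. `wait_for_status` below is an illustrative sketch, not part of any tool; the timeout default is an assumption:

```shell
# Block until the implement agent writes DONE or ESCALATE to the
# status file; give up after a timeout (in seconds) and return non-zero.
wait_for_status() {
  local status_file=$1 timeout=${2:-600} waited=0
  until [ -f "$status_file" ] && grep -qE 'DONE|ESCALATE' "$status_file"; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then return 1; fi
  done
  cat "$status_file"   # surface the signal to the caller
}
```

Because the handoff is just a file write, the same helper works whether the agents run in one terminal or across several.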

Configuring the Review Agent — The Judge Pattern, feedback.md, and the Fix Loop

This is where most pipeline setups fail. Developers give the review agent write access to source files — and immediately break the isolation that makes the whole pipeline trustworthy.

The judge pattern is a hard constraint: the review agent has read-only access to source files and write access only to `feedback/`. It cannot approve its own output. It cannot fix what it finds.

The review.agent.md file

```markdown
---
name: review
description: Reviews an implementation against its spec; writes findings to feedback.md
tools: read, write
write_paths: ["feedback/"]
---

You are a senior code reviewer. Read the spec at spec/<feature>.spec.md and
the implementation diff on the feature branch. For each acceptance criterion,
write PASS or FAIL with evidence. Write your full findings to
feedback/<feature>.feedback.md. Do not modify source files.
End your report with either MERGE_READY or NEEDS_FIXES.
```
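A review run under this role file might leave a feedback artifact along these lines (feature name and findings are hypothetical):

```markdown
# Feedback: feature-dark-mode v1

- PASS (Criterion 1): toggle state persists across reloads; verified against the diff
- FAIL (Criterion 2): no fallback when the system preference is unavailable

NEEDS_FIXES
```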

The fix loop and termination condition

When the review agent writes `NEEDS_FIXES`, the orchestrator re-runs the implement agent with the feedback document appended to its context. To prevent infinite loops, enforce a hard iteration ceiling in implement.agent.md:

```markdown
If feedback/<feature>.feedback.md exists, read it before implementing.
This is iteration <N> of a maximum of 3 fix cycles.
If you cannot resolve all FAIL items within 3 iterations, write ESCALATE
to the status file instead of DONE.
```

Three iterations are the right ceiling. Beyond that, the feedback loop is oscillating — which means the spec is ambiguous, not the implementation. Fix the spec, not the agent.
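The loop and its ceiling can be sketched in shell. `run_implement` and `run_review` are hypothetical stand-ins for however you invoke each agent; only the control flow is the point:

```shell
# Fix loop with a hard ceiling of 3 iterations.
# Assumes run_implement and run_review are defined by the caller and
# that the reviewer writes its verdict as the last line of the feedback file.
fix_loop() {
  local feedback_file=$1 iteration=1
  while [ "$iteration" -le 3 ]; do
    run_implement "$iteration"
    run_review
    verdict=$(tail -n 1 "$feedback_file")
    if [ "$verdict" = "MERGE_READY" ]; then
      echo "merge-ready after $iteration iteration(s)"
      return 0
    fi
    iteration=$((iteration + 1))
  done
  echo "ESCALATE: spec is likely ambiguous"   # ceiling hit; hand back to the developer
  return 1
}
```

The escalation path is the important part: hitting the ceiling is a signal about the spec, so the loop stops rather than burning more iterations.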

HubSpot’s Sidekick review agent, following this judge pattern, reduced time-to-first-feedback on pull requests by approximately 90% and achieved an 80% engineer approval rate (InfoQ, March 2026). The pattern works at scale.

Wiring the Orchestrator in VS Code (coordinator.agent.md + Subagent Settings)

VS Code 1.109 introduced the `chat.customAgentInSubagent.enabled` setting, which allows a coordinator agent to spawn subagents directly from the chat panel. Here’s the full configuration.

Enable subagent mode

In your VS Code `settings.json`:

```json
{
  "chat.customAgentInSubagent.enabled": true,
  "chat.agent.maxRequests": 30
}
```

The coordinator.agent.md pattern

```markdown
---
name: coordinator
description: Orchestrates the spec→implement→review pipeline
---

Pipeline steps:

1. Run spec agent with the feature request. Wait for spec/<feature>.spec.md to exist.
2. Run implement agent. Wait for spec/<feature>.status = DONE.
3. Run review agent. Read feedback/<feature>.feedback.md.
4. If MERGE_READY: run `git merge --squash feature/<name>` and close the pipeline.
5. If NEEDS_FIXES and iteration < 3: re-run implement with feedback context.
6. If ESCALATE or iteration = 3: halt and surface the issue to the developer.
```

The coordinator never writes code. It reads status files, triggers agents, and handles the merge. Keep its logic simple — branching complexity belongs in the individual role files, not the orchestrator.

When Not to Use Three Agents — WIP Limits, Cost Trade-offs, and the Solo-Agent Fallback

More agents are not always better. The practical ceiling for concurrent agents on a single developer laptop is 5–7 before rate limits, merge conflicts, and review bottlenecks consume the velocity gains. The recommended sweet spot is 3–5 (vibecoding.app, 2026).

Use a single agent when:

  • The feature touches fewer than three files
  • You can write the entire spec as a code comment in under 5 minutes
  • You’re prototyping throwaway code where the review signal has low value
  • Your API budget is tight — on small features, the pipeline's three agent runs cost roughly 3× a single short session

The cost math on a moderate feature

  • Spec agent: ~9K tokens
  • Implement agent: ~9K tokens
  • Review agent: ~9K tokens
  • Pipeline total: ~27K tokens
  • Equivalent monolithic session: ~35–45K tokens

The pipeline is cheaper and higher quality on complex features. On simple ones, the overhead isn’t worth it.
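Under the article's own estimates, the saving on a moderate feature works out like this (a back-of-envelope sketch, taking 40K as the midpoint of the monolithic range):

```shell
# Rough check of the token figures above (all numbers are estimates)
pipeline=$((9 + 9 + 9))   # spec + implement + review, in K tokens
monolithic=40             # midpoint of the 35-45K monolithic range
savings=$(( (monolithic - pipeline) * 100 / monolithic ))
echo "pipeline: ${pipeline}K tokens (~${savings}% below a monolithic session)"
```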

When to add a fourth agent

If your pipeline consistently surfaces security issues at the review stage, consider a dedicated security audit agent that runs static analysis checks in its own worktree before the judge review. That overhead is only justified for features touching authentication, data access, or external APIs — exactly the 48% of AI-generated code that carries potential vulnerabilities.

Build the Multi-Agent AI Coding Pipeline — Ship Better Code Every Time

The multi-agent AI coding pipeline is not a future concept. It’s a working setup you can configure in an afternoon. Three agents, each with a single responsibility, each isolated in its own worktree, each handing off a versioned artifact to the next: spec writes the ground truth, implement executes against it, and review judges without touching source. Teams running this structure ship 30–50% faster and deploy far more reliably than those still relying on one overloaded AI session to do everything.

Start with the directory structure and the AGENTS.md files. Get the spec-to-implement handoff working before you add the review agent. Once you trust the artifact format, wire in the judge pattern and the orchestrator. The full setup takes one afternoon — and the velocity gain compounds with every feature you ship after that.

Pick one feature on your current project and run it through this pipeline this week. You’ll have enough real feedback to tune your role files by Friday — and you’ll never go back to the single-agent wall.
