Sixty-six percent of developers say they’re spending more time fixing AI-generated code in 2025 than before they adopted the tools. That’s not a prompt problem. It’s an architecture problem.
The way your codebase is structured determines what context an AI agent receives, how much of it fills the context window with noise, and whether the agent can infer intent from your code alone. Get the codebase structure for AI coding agents right, and you cut correction time dramatically. Get it wrong, and you’ll keep doing what most teams are doing: babysitting output that’s 80% correct but 100% unmergeable.
This post walks through seven concrete architecture patterns — with before/after examples — that change how effectively your agents work. No greenfield assumptions. These apply to legacy codebases too.
## The Hidden Reason Your AI Coding Agent Keeps Getting It Wrong
It’s tempting to blame the model. The real culprit is almost always context — specifically, the wrong context arriving in the wrong shape.
When an AI agent opens a task, it scans your codebase to understand what exists and how things connect. If your codebase organizes files by technical layer (`controllers/`, `services/`, `models/`), the agent must gather fragments from four or five directories to understand a single feature. Each fragment consumes tokens. Irrelevant fragments crowd out relevant ones.
LLM output quality degrades measurably past roughly 40% of context window fill — the informal threshold observed by practitioners working with large codebases. Poorly scoped or architecturally scattered codebases hit that ceiling fast.
The result is hallucinated function names, incorrect import paths, and changes that break seemingly unrelated modules — exactly the pain points that have driven developer trust in AI accuracy down from 40% to 29% in 2025, according to the Stack Overflow Developer Survey.
The fix isn’t a better prompt. It’s a leaner, more intention-revealing structure.
## Pattern 1 — Vertical Slices: Organize by Feature, Not by Layer
This is the highest-leverage change you can make to improve AI coding agent productivity.
A layered (MVC) architecture splits a feature across directories by type:
```
src/
  controllers/
    cart.controller.ts
    user.controller.ts
  services/
    cart.service.ts
    user.service.ts
  models/
    cart.model.ts
    user.model.ts
```
An agent working on the cart checkout flow must pull from three directories. It also receives controller, service, and model files for other features — the user files add noise, consume tokens, and increase the chance of wrong inferences.
A vertical slice architecture collapses those layers into feature-owned directories:
```
src/
  features/
    cart/
      cart.controller.ts
      cart.service.ts
      cart.model.ts
      cart.test.ts
    user/
      user.controller.ts
      user.service.ts
      user.model.ts
      user.test.ts
```
Every file the agent needs for a cart task is co-located. Context becomes scoped to the feature. Teams that have made this shift report a 60–80% reduction in AI context errors and 3x faster code generation for complex features, according to Cloudurable’s analysis of codebase architectures for AI tools.
Vertical slices also enforce domain boundaries. A change to the cart feature cannot silently cascade into user logic because the two are structurally separated — eliminating the class of bug where an agent’s change in one layer breaks another entirely.
If you take one thing from this post, make it this: co-locate everything an AI agent needs for a task, and make it structurally impossible to need more.
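To make the co-location concrete, here is a minimal sketch of what a vertical slice looks like in code. The types, names, and prices are entirely hypothetical; in a real repo the model and service would live in separate co-located files under `src/features/cart/`, shown inline here for brevity.

```typescript
// cart.model.ts — the feature owns its own types (hypothetical example)
interface CartItem {
  sku: string;
  unitPrice: number; // price in cents
  quantity: number;
}

// cart.service.ts — business logic imports only from within the slice,
// so an agent reading this slice never needs files from other features
function cartTotal(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPrice * item.quantity, 0);
}

const total = cartTotal([
  { sku: "mug-01", unitPrice: 1200, quantity: 2 },
  { sku: "tee-04", unitPrice: 2500, quantity: 1 },
]);
console.log(total); // 4900
```

The point isn't the code itself but the import graph: every dependency of the cart slice resolves inside the slice, so the agent's context for a cart task has a natural, enforceable boundary.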
## Pattern 2 — The AGENTS.md Briefing File (and Why It Must Be Human-Written)
Every major AI coding tool now supports a project-level instruction file. GitHub Copilot and OpenAI’s agents use `AGENTS.md`. Claude Code uses `CLAUDE.md`. Cursor reads `.cursorrules`.
The names differ; the purpose is identical: tell the agent the conventions, constraints, and context it cannot infer from code alone.
These files are not optional README additions. They’re first-class code artifacts that belong in the same PR as any architectural change they describe.
Here’s where a critical finding gets buried in most advice: LLM-generated AGENTS.md files actively hurt performance. ETH Zurich research published in early 2026 found that AI-generated context files reduced task success rates by an average of 3% compared to having no file at all, and inflated inference costs by more than 20%. Human-written files showed a marginal +4% success rate improvement — but only when they contained genuinely non-inferable information the agent could not derive from the code.
What belongs in a well-written `AGENTS.md`:
- What the agent must never touch — specific directories, files, or patterns
- Test patterns — which runner, what coverage expectations, how to name test files
- Import conventions — path aliases, barrel file expectations, forbidden patterns
- Architectural decisions — why a pattern exists, not just what it is
- Domain vocabulary — terms with specific meanings in your codebase
What doesn’t belong:
- Boilerplate descriptions of your frameworks (React is component-based — the agent knows this)
- Redundant restatements of what’s already obvious in the directory structure
- Anything the agent can infer by reading the code itself
The ETH Zurich finding is stark: padding your context file with inferable information doesn’t help the agent — it costs you money and degrades output quality. Write less. Write precisely.
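Putting the do's and don'ts together, a tight, human-written file might look like the following. Every rule, path, and tool name here is illustrative — the value comes from each line being something the agent could not infer from the code:

```markdown
# AGENTS.md

## Never touch
- /migrations — ask first, always
- /src/legacy/billing — frozen pending rewrite; do not refactor

## Tests
- Runner: vitest. Name tests <module>.test.ts, co-located with the module.
- Every bug fix needs a regression test in the same PR.

## Imports
- Use the @features/* path alias; never deep-relative (../../..) imports.
- No barrel files except features/<name>/index.ts.

## Domain vocabulary
- "Order" means paid; "Cart" means unpaid. Never use them interchangeably.
```

Note what's absent: no framework descriptions, no restated directory structure, nothing the agent can read off the code itself.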
## Pattern 3 — The 500-Line File Rule Is a Token Budget Decision
You’ve heard “keep files under 500 lines” framed as a readability preference. Flip the mental model: it is a token budget constraint.
A typical 500-line TypeScript file consumes roughly 3,000–4,000 tokens. Most modern coding agents operate with context windows of 100k–200k tokens. That sounds generous until you factor in that poor data serialization and formatting overhead can consume 40–70% of available tokens unnecessarily, according to analysis from The New Stack.
A file that exceeds 500 lines creates three compounding problems for AI agents:
- It may be truncated when included in context, leaving the agent with an incomplete view of the module
- It contains multiple concerns, forcing the agent to parse the irrelevant half to find the relevant parts
- It signals poorly decomposed logic — which produces poorly decomposed agent output
Any file over 500 lines is a decomposition problem wearing a “this is complex” disguise. Split it. Each piece becomes independently includable in context without dragging in unrelated logic. File size discipline is context engineering for developers — the cheapest form of it.
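The budget math is worth doing explicitly. A common rough heuristic is about 4 characters per token; the average line length below is an assumption, chosen to match the 3,000–4,000 token figure above:

```typescript
// Back-of-envelope token budgeting, using the rough heuristic of
// ~4 characters per token (an assumed average for code and English).
function estimateTokens(lines: number, avgCharsPerLine = 30): number {
  return Math.round((lines * avgCharsPerLine) / 4);
}

// A 500-line file at ~30 chars/line lands in the 3,000–4,000 token range.
console.log(estimateTokens(500)); // 3750

// Ten such files already claim ~37,500 tokens, nearly the entire
// usable budget if you respect the ~40% fill threshold of a
// 100k-token window (~40,000 tokens).
console.log(estimateTokens(500) * 10); // 37500
```

The exact constants matter less than the shape of the calculation: file size multiplies directly into context consumption, which is why splitting files is the cheapest lever.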
## Pattern 4 — Naming as Machine-Readable Documentation
Names are the cheapest form of context you can provide.
Compare:
```
useHook3.ts
handleStuff.ts
utils.ts
```
versus:
```
useCartCheckoutFlow.ts
handlePaymentFailedRetry.ts
dateFormatters.ts
```
The second set lets an agent infer intent, locate relevant files, and understand scope — without consuming a single token on additional context files. Self-documenting names are machine-readable documentation, and they scale to every AI tool simultaneously without per-tool configuration.
The naming conventions that deliver the biggest gains:
- Name files after their primary export, not their file type — avoid `helpers.ts`, `constants.ts`, `utils.ts`
- Use domain language in names — `useCartCheckoutFlow` beats `useCheckout` because it scopes the feature and the domain in three words
- Encode behavioral intent in function names — `handlePaymentFailedRetry` tells an agent what triggers it, what domain it belongs to, and what it does
- Mirror test file names — `cart.service.test.ts` next to `cart.service.ts` means an agent can predict test file locations without being told
The naming patterns in your codebase are implicitly setting the agent’s vocabulary. Make them explicit, consistent, and domain-driven.
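The mirrored-test-name convention is powerful precisely because it reduces to a pure string transform. A sketch, assuming the `.test.ts` suffix convention described above:

```typescript
// When test files mirror source names (cart.service.ts -> cart.service.test.ts),
// an agent or a script can predict the test location without any configuration.
function testFileFor(sourcePath: string): string {
  return sourcePath.replace(/\.ts$/, ".test.ts");
}

console.log(testFileFor("src/features/cart/cart.service.ts"));
// src/features/cart/cart.service.test.ts
```

Any convention an agent can compute is a convention you never have to spend context-file words explaining.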
## Pattern 5 — Hierarchical Context Files for Monorepos
A single root-level `AGENTS.md` cannot serve a monorepo well. The instructions relevant to your payments package differ from those governing the analytics dashboard — and giving an agent both when it’s working in one package wastes tokens and risks conflicting guidance.
The solution is hierarchical context files: a root-level file for global conventions, with per-package overrides co-located with the code they govern.
```
/
  AGENTS.md            ← global rules (security policies, forbidden patterns)
  packages/
    payments/
      AGENTS.md        ← PCI compliance rules, retry logic constraints
      src/
    analytics/
      AGENTS.md        ← event naming conventions, schema constraints
      src/
```
Agents that support hierarchical context — Claude Code, Cursor, and GitHub Copilot all do — merge these files with the closest-scope file taking precedence. Your payments team can encode PCI-relevant constraints locally without cluttering the global manifest with information irrelevant to every other package.
This isn’t theoretical. The OpenAI repository uses 88 nested AGENTS.md files across its monorepo, each providing scoped, precise context to agents working within that package. That’s the production benchmark for hierarchical context at scale.
For teams just starting, even two levels — global and per-package — is a meaningful improvement over a single sprawling manifest.
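"Closest scope takes precedence" can be modeled as a walk from the repo root down to the working directory, collecting every context file on the path. Real tools differ in their exact merge semantics; this is only a sketch of the precedence order:

```typescript
// Collect every AGENTS.md between the repo root and the working directory.
// Later entries are closer in scope and take precedence on conflicts.
function contextFilesFor(workingDir: string, repoFiles: Set<string>): string[] {
  const found: string[] = [];
  const parts = workingDir.split("/").filter(Boolean);
  for (let i = 0; i <= parts.length; i++) {
    const candidate = [...parts.slice(0, i), "AGENTS.md"].join("/");
    if (repoFiles.has(candidate)) found.push(candidate);
  }
  return found;
}

const repoFiles = new Set(["AGENTS.md", "packages/payments/AGENTS.md"]);
console.log(contextFilesFor("packages/payments/src", repoFiles));
// [ 'AGENTS.md', 'packages/payments/AGENTS.md' ]
```

An agent editing `packages/payments/src` sees both the global rules and the payments-local rules, while the analytics file never enters its context.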
## Pattern 6 — Safe Zones vs. Ask-First Zones: Directory-Level Guardrails
Not all directories carry equal risk. An AI agent autonomously editing a React component is low-stakes. An agent autonomously editing infrastructure-as-code, secrets management, or database migration files is potentially catastrophic — and agents cannot always distinguish these without explicit guidance.
The safe zone / ask-first zone pattern makes risk explicit in both your directory structure and your `AGENTS.md`.
Safe zones (agent can modify autonomously):
```
src/features/     ← product feature code
src/components/   ← UI components
src/utils/        ← utility functions
tests/            ← test files
```
Ask-first zones (agent must confirm before touching):
```
infra/            ← infrastructure as code
migrations/       ← database migrations
config/           ← environment configuration
secrets/          ← credentials (never)
vendor/           ← third-party code
```
In your `AGENTS.md`, make this explicit:
```markdown
## Safe Zones
You may create, edit, and delete files in /src/features, /src/components,
/src/utils, and /tests without confirmation.

## Ask-First Zones
Before modifying anything in /infra, /migrations, /config, or /vendor,
describe your intended change and wait for explicit approval.
```
This single pattern prevents the class of agent mistake that takes days to undo: a migration that runs in production, a config change that breaks a deployment pipeline, infrastructure drift that only surfaces at 2am. The cost of defining these zones is an hour. The cost of skipping them is open-ended.
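If your team runs agents through a harness or gates their output in CI, the same zone list can be enforced mechanically rather than trusted to the prompt. A minimal sketch, with the zone list copied from the illustrative layout above:

```typescript
// Pre-flight check a harness or CI hook could run before applying an
// agent's edits. Zone prefixes mirror the illustrative layout above.
const ASK_FIRST_ZONES = ["infra/", "migrations/", "config/", "secrets/", "vendor/"];

function requiresApproval(filePath: string): boolean {
  return ASK_FIRST_ZONES.some((zone) => filePath.startsWith(zone));
}

console.log(requiresApproval("src/features/cart/cart.service.ts")); // false
console.log(requiresApproval("migrations/2025_add_orders.sql"));    // true
```

The `AGENTS.md` text tells the agent what to do; a check like this makes sure a mistake can't reach production even when the agent ignores it.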
## Pattern 7 — Spec-First Development with PRPs to Prime Agent Intent
The most common reason an agent writes technically correct but architecturally wrong code is that it doesn’t know why a feature exists before it writes it. It infers intent from what already exists — which biases it toward replicating the past rather than building what you actually need.
Spec-first development solves this with a Product Requirements Package (PRP): a structured document written before the agent touches any code. A PRP contains:
- The user problem being solved and the intended behavior
- Acceptance criteria expressed as testable conditions
- Constraints — what the solution must explicitly not do
- Pointers to relevant existing code the agent should treat as reference
When an agent receives a PRP before being pointed at the codebase, it builds a model of intent first, then maps that intent to your structure — rather than reverse-engineering intent from structure alone.
The workflow in practice:
- Write a short PRP (200–400 words) in a `specs/` directory
- Reference it explicitly in your agent prompt: “Before writing any code, read `specs/cart-checkout-v2.md`”
- Review the agent’s interpretation before it writes a single file
This adds 10–15 minutes at the start of a task and routinely saves hours of correction. It’s the architectural equivalent of a good design review — except you’re reviewing intent before implementation, not after.
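A PRP doesn't need to be elaborate. A minimal skeleton, with entirely hypothetical file names, behaviors, and constraints, might look like:

```markdown
# PRP: Cart checkout v2 (specs/cart-checkout-v2.md)

## Problem
Guests abandon checkout because failed payments fail silently.

## Intended behavior
Failed charges retry up to 3 times with exponential backoff, then
surface a recoverable error to the UI.

## Acceptance criteria
- [ ] A failed charge retries at 1s, 2s, 4s before surfacing an error
- [ ] Retries are idempotent: no duplicate charges
- [ ] cart.service.test.ts covers the retry path

## Constraints
- Must NOT touch /migrations or change the payment provider SDK version

## Reference code
- src/features/cart/cart.service.ts (existing checkout flow)
```

Each section maps directly to one of the four PRP elements above, which keeps the document short enough to actually get written.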
## Applying These Patterns to a Legacy Codebase Without a Full Rewrite
Most codebases aren’t greenfield. You can’t restructure ten years of layered architecture in a sprint. The migration path matters as much as the destination.
Here’s a sequenced approach that works:
**Week 1: Write your AGENTS.md (or CLAUDE.md).**
This is zero-disruption and delivers immediate value. Document what the agent should never touch, your naming conventions, test patterns, and any architectural decisions that aren’t obvious from the code.
Keep it under 300 words. Make it non-inferable. Commit it to the repo and enforce it in PRs.
**Weeks 2–3: Extract one vertical slice as a proof of concept.**
Pick a bounded, actively developed feature. Move its files into a `features/[name]/` directory. Keep everything else in place. Measure agent output quality on that feature versus unchanged areas.
**Immediately: Define safe and ask-first zones.**
Add the two-section guardrail to your AGENTS.md now. This costs nothing and prevents the most costly mistakes — there’s no reason to defer it.
**Quarter by quarter: Migrate features as they’re touched.**
Whenever a feature enters normal development, migrate it to a vertical slice. Within two or three quarters, most actively maintained areas will be restructured without a dedicated restructuring sprint.
The goal isn’t architectural perfection. It’s progressive reduction of the gap between what your codebase communicates to an agent and what you actually intend it to build.
## Conclusion: Codebase Structure for AI Coding Agents
The reason AI coding agents underperform isn’t the model — it’s the codebase structure they’re navigating. With 90% of engineering teams reporting AI usage by late 2025 and 41% of all code estimated to be AI-assisted, the structural debt in your codebase compounds with every agent interaction.
These seven patterns — vertical slices, precise `AGENTS.md` files, file size discipline, self-documenting names, hierarchical context for monorepos, explicit safe zones, and spec-first development — directly address the structural causes of inconsistent AI coding agent output. None of them require a full rewrite. All of them start paying back within the week you implement them.
Pick one pattern. Apply it this week. Measure the difference before adding the next. Architecture changes that compound are more durable than wholesale rewrites — and far less likely to be abandoned halfway through.
If you’re unsure where to start, write your `AGENTS.md` today. It takes an hour, requires no refactoring, and gives every AI tool on your team a shared source of truth about how your codebase works.