AI-Friendly Code Design: 7 Patterns for 3x Agent Speed

Your AI coding assistant is already as good as it needs to be. The problem is your codebase.

That’s a hard truth for teams who’ve spent months evaluating Copilot vs. Cursor vs. Claude Code. But the evidence keeps pointing the same direction: the biggest lever on AI agent performance isn’t the model — it’s the structure of the code you’re feeding it. AI-friendly code design is now a recognized engineering discipline — Thoughtworks placed it at “Assess” in their April 2025 Technology Radar — and teams applying it systematically are reporting 2–3x speedups. Teams that don’t are watching their agents generate bloated diffs, copy deprecated patterns, and burn through tokens on context they should already have.

This post walks through seven concrete refactoring patterns that directly improve what AI agents can do with your code — and a practical workflow for prioritizing which ones to tackle first.

Why Your AI Tool Isn’t the Bottleneck (Your Codebase Is)

Here’s the finding that should change how your team thinks about AI productivity: in a 2025 randomized controlled trial, METR (Model Evaluation and Threat Research) found that AI tools increased task completion time by 19% for experienced open-source developers working on mature, complex codebases. Not decreased — increased. These were developers who predicted a 24% speedup before they started.

This isn’t an indictment of AI tools. It’s a structural diagnosis.

When an agent encounters a codebase with tangled dependencies, inconsistent naming, and scattered documentation, it doesn’t fail gracefully — it guesses. It loads massive amounts of context trying to piece together intent. It finds three versions of the same validation logic and picks one arbitrarily. It learns from old patterns it shouldn’t be replicating at all. The model isn’t the problem. The signal quality is.

The Thoughtworks Technology Radar crystallizes this: AI coding assistants perform better with well-factored codebases, making thoughtful design essential not just for maintainability — but for AI leverage. Switching models won’t fix a structural problem.

“AI agents don’t struggle because they’re underpowered. They struggle because the codebase gives them bad inputs.”

And the stakes are growing fast. AI-authored code now makes up 26.9% of all production code as of Q1 2026, up from 22% the prior quarter (Faros.ai, analyzing ~4.2 million developers). Approximately 85% of developers use AI coding tools, with 62% relying on one daily (JetBrains, 2025). The faster agents write code, the faster a poorly structured repository amplifies every architectural weakness you’ve been living with.

Refactoring your codebase for AI agents isn’t a nice-to-have. It’s the highest-leverage investment your team can make right now.

What AI-Friendly Code Design Means — and the Threshold That Matters

“AI-friendly” doesn’t mean writing code that looks different or adding special LLM comments. It means applying the principles that make code readable for senior engineers — and applying them rigorously enough that an agent can reason about your system without needing implicit context you’ve never written down.

CodeScene’s engineering team has put a number on this. Their research establishes a Code Health score of 9.5 or higher as the threshold at which AI agents perform optimally. Below that threshold, returns are non-linear: agent output quality degrades rapidly and — critically — agents begin reinforcing bad patterns rather than improving them. The same team reported a 2–3x speedup in their own agentic workflows after their codebase crossed that threshold.

Context windows are a design constraint

Here’s the mental model that changes everything: every bloated function, every duplicated class, every piece of implicit knowledge living in a Slack thread costs tokens. An agent working on a feature doesn’t load just the file you pointed it at — it loads imports, dependencies, documentation, related tests, and anything else it needs to reason about what “correct” looks like.

Codebase structure is a form of context engineering. Every architectural decision you make either expands or compresses the useful signal available to the agent within its context window. 65% of developers already report missing context as a key problem during AI-assisted refactoring, with 26% naming improved contextual understanding as their top requested improvement for AI tools (PropelCode.ai, 2025). The fix isn’t a bigger context window — it’s a codebase that communicates intent without requiring the agent to go looking for it.

Patterns 1–3: Make Your Code Self-Describing

The first group of patterns tackles the most fundamental problem: agents that can’t understand what your code is supposed to do without reading half the codebase to reconstruct intent.

Pattern 1: Expressive, intention-revealing naming

A function named `check()` tells an agent nothing. A function named `isUserEligibleForDiscount()` communicates domain context, return type, and scope — before the agent reads a single line of the body.

This isn’t about verbosity. It’s about encoding domain knowledge directly into identifiers so agents can navigate your codebase semantically, not just syntactically. Apply this discipline to functions, variables, classes, and especially boolean flags. Names like `flag2`, `temp`, and `data` are context sinks that force agents into unnecessary traversal of the call graph.

Before: `if (check(u, cart)) applyD(cart);`

After: `if (isUserEligibleForDiscount(user, cart)) applyCartDiscount(cart);`

The refactored version gives an agent — or a new team member — everything they need to understand intent without leaving the line.
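To make the before/after concrete, here is a minimal TypeScript sketch. The eligibility rule itself (subscriber with a cart total of at least 50) is invented purely for illustration:

```typescript
interface CartItem { price: number; quantity: number; }
interface Cart { items: CartItem[]; }
interface User { isSubscriber: boolean; }

// Intention-revealing: the name states the domain question and implies
// a boolean return, before anyone reads the body.
function isUserEligibleForDiscount(user: User, cart: Cart): boolean {
  const cartTotal = cart.items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0,
  );
  return user.isSubscriber && cartTotal >= 50;
}

// The call site now reads as a sentence, so an agent can reason about it
// without traversing the call graph.
function applyCartDiscount(cart: Cart): Cart {
  return {
    items: cart.items.map((item) => ({ ...item, price: item.price * 0.9 })),
  };
}
```

Compare the call site `if (isUserEligibleForDiscount(user, cart)) applyCartDiscount(cart);` with the original `check(u, cart)`: the agent gets the domain question, the return type, and the side effect from the names alone.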

Pattern 2: Intention-revealing function signatures

Naming the function is half the job. The other half is ensuring the signature communicates its contract clearly. Prefer explicit parameter names over positional arguments. Use types that encode meaning — a `UserId` type is more informative than a bare `string`, and prevents agents from passing the wrong identifier into the wrong slot.

Avoid boolean traps. A call like `processOrder(true, false, true)` is meaningless without the definition in front of you. Replace boolean parameters with named options objects or explicit enum values.

When an agent sees a clear, typed function signature, it can reason about how to call that function correctly. When it sees opaque parameters, it guesses — and that guess often produces a subtle bug that sails through code review.
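A sketch of both fixes in TypeScript; the `UserId` brand, the option names, and the order logic are hypothetical, but the shape is general:

```typescript
// Branded type: a UserId is a plain string at runtime, but the compiler
// refuses to accept an arbitrary string (or an OrderId) in its place.
type UserId = string & { readonly __brand: "UserId" };
const asUserId = (raw: string): UserId => raw as UserId;

// Named options object instead of a boolean trap like
// processOrder(true, false, true):
interface ProcessOrderOptions {
  expressShipping: boolean;
  giftWrap: boolean;
  notifyCustomer: boolean;
}

function processOrder(userId: UserId, options: ProcessOrderOptions): string {
  const parts = [
    options.expressShipping ? "express" : "standard",
    options.giftWrap ? "gift-wrapped" : "plain",
    options.notifyCustomer ? "notify" : "silent",
  ];
  return `order for ${userId}: ${parts.join(", ")}`;
}

// The call site is self-describing: every flag is labeled.
const receipt = processOrder(asUserId("u-123"), {
  expressShipping: true,
  giftWrap: false,
  notifyCustomer: true,
});
```

An agent reading that call site never has to open `processOrder` to learn what the second argument's fields mean.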

Pattern 3: Strict DRY enforcement before agent use

Duplicate code turns an agent into an amplifier of inconsistency. When two copies of the same validation logic exist in your codebase, an agent updating one has no reliable way to know it missed the other. This isn’t theoretical — it’s one of the most common ways AI accelerates technical debt into production.

The rule is simple but requires discipline: eliminate duplication before assigning agents to a module. One canonical implementation. One place to update. One source of truth for business rules.

If you can’t deduplicate the entire codebase before your next sprint, at minimum deduplicate the modules you’re about to hand to an agent. That constraint alone will prevent a class of problems that’s otherwise expensive to unwind.
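What one source of truth looks like in practice, as a minimal TypeScript sketch; the email rule and the two call sites are hypothetical:

```typescript
// One canonical implementation of the validation rule. Both the signup
// flow and the profile-update flow call this single function, so an agent
// changing the rule changes it everywhere at once.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(email: string): boolean {
  return EMAIL_PATTERN.test(email.trim().toLowerCase());
}

// Hypothetical call sites. Neither re-implements the rule; both delegate.
function validateSignup(email: string): string[] {
  return isValidEmail(email) ? [] : ["invalid email"];
}

function validateProfileUpdate(email: string): string[] {
  return isValidEmail(email) ? [] : ["invalid email"];
}
```

If those two flows each had their own inline regex, an agent tightening one would silently leave the other behind.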

Patterns 4–5: Right-Size Your Code for the Context Window

Once your code communicates intent clearly, the next constraint is size and boundary definition. Agents working on oversized, poorly bounded code consume disproportionate tokens and produce noisier, harder-to-review output.

Pattern 4: Context-window-aware function sizing

A 200-line function is not just a readability problem — it’s a context window problem. When an agent needs to understand or modify that function, it often can’t hold the entire function plus its dependencies plus the surrounding module in working memory simultaneously. It starts reasoning on partial information. The output looks plausible but introduces edge-case failures you won’t catch until production.

The practical target: 50–80 lines per function, with a strong preference for single-responsibility design. Decompose long functions into smaller, named units. Each new function name is another piece of domain context you’re encoding for the agent — a breadcrumb trail it can follow without having to reconstruct your intent from scratch. This isn’t about hitting an arbitrary line count; it’s about ensuring the agent always operates with complete context for the unit of work in front of it.
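The decomposition might look like this in TypeScript; the checkout math here (subtotal, tax, rounding) is a stand-in for whatever your long function actually does:

```typescript
interface LineItem { sku: string; price: number; quantity: number; }

// Each small, named unit is a piece of domain context an agent can load
// and reason about on its own.
function calculateSubtotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

function applyTax(subtotal: number, taxRate: number): number {
  return subtotal * (1 + taxRate);
}

function roundToCents(amount: number): number {
  return Math.round(amount * 100) / 100;
}

// The orchestrator stays short enough that it, plus its dependencies,
// fits inside a single context window.
function calculateOrderTotal(items: LineItem[], taxRate: number): number {
  return roundToCents(applyTax(calculateSubtotal(items), taxRate));
}
```

The orchestrator reads as a summary of the whole computation, so an agent asked to change the tax rule only needs to load `applyTax`, not a 200-line monolith.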

Pattern 5: Vertical Slice Architecture for feature isolation

Layered architecture — controllers, services, repositories, all separated by technical role — scatters the complete context for any feature across multiple directories. When an agent works on a checkout flow, it needs the controller, the service, the repository layer, the DTOs, and potentially the domain model. Each lives somewhere different.

Token cost climbs. Reasoning quality drops. The diff the agent produces touches six directories and requires a specialist to review.

Vertical Slice Architecture organizes code by feature instead of technical layer. Everything the agent needs to understand and modify the checkout flow lives in `features/checkout/` — the handler, the business logic, the data access, the tests, the types. The agent works within a self-contained slice without traversing the whole codebase to assemble context.

This is one of the highest-impact structural changes a team can make for AI agent performance, and it’s almost entirely absent from mainstream AI coding advice. When the Thoughtworks Radar identifies AI-friendly code design as the discipline that matters, co-locating related context is precisely what “well-factored” means in an agentic context.
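One possible layout for such a slice; the file names here are illustrative, not a prescribed convention:

```text
features/
  checkout/
    checkout.handler.ts      # HTTP entry point
    checkout.service.ts      # business logic
    checkout.repository.ts   # data access
    checkout.types.ts        # DTOs and domain types
    checkout.test.ts         # tests co-located with the feature
  inventory/
    ...
shared/
  db/                        # cross-cutting infrastructure only
```

An agent pointed at `features/checkout/` can assemble the complete picture of the feature from one directory listing.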

Patterns 6–7: Build Living Context Agents Can Trust

The final two patterns shift from code structure to the documented knowledge layer — the information agents need that can’t be inferred from the code itself, no matter how well-structured it is.

Pattern 6: Single-source-of-truth documentation (AGENTS.md / llms.txt)

Every project has conventions, architectural decisions, and edge-case rules that live in engineers’ heads or scattered across wikis, READMEs, old PR descriptions, and Slack threads. For a human engineer, that’s inconvenient. For an AI agent, it’s paralyzing — ambiguous handling of special cases causes agents to enter extended reasoning loops trying to resolve contradictions between what they find in different parts of the codebase.

Consolidating this knowledge into a single canonical file — commonly named `AGENTS.md` or `llms.txt` — gives agents a reliable, structured entry point for project context. Think of it as the document that replaces everything the codebase implicitly assumes the reader already knows.

The quantified payoff is hard to ignore: one documented optimization — covering documentation consolidation and explicit edge-case handling — yielded approximately 40% processing time reduction, ~75% token usage reduction, and over 80% reduction in agent circular reasoning (Aaron Gustafson, dev.to, 2025).

Your `AGENTS.md` should cover: project structure overview, current naming conventions, where business logic lives, how new features should be organized, which patterns are current vs. deprecated, and non-obvious edge-case behaviors. Treat it as a first-class architectural artifact, not an afterthought.
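A starter skeleton for such a file, using hypothetical conventions that match the examples in this post:

```markdown
# AGENTS.md

## Project structure
Features live in `features/<name>/`; shared infrastructure in `shared/`.

## Conventions
- Functions: camelCase, intention-revealing (`isUserEligibleForDiscount`, not `check`).
- Business logic lives in `*.service.ts`; handlers stay thin.
- No boolean positional parameters; use named options objects.

## Current vs. deprecated
- Authentication: use `AuthService.verify()`. Do NOT use `LegacyAuthHandler`.

## Edge cases
- Carts may contain zero items; totals must return 0, not throw.
```

Keep it short and current: a stale entry in this file misleads agents exactly as effectively as an accurate one guides them.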

Pattern 7: Aggressive deprecated-pattern cleanup

This is the anti-pattern almost no one talks about — and it may be the most insidious problem on legacy codebases. Agents learn from what they see.

If your codebase contains both an old authentication pattern and a new one, the agent will replicate whichever one appears more frequently or earlier in its context window. It doesn’t know the old one is deprecated unless you explicitly tell it — and even then, if the old pattern is still present in the code, it provides a compelling template that the agent may follow anyway. The result: agents propagate deprecated patterns at machine speed, and your technical debt compounds faster than any human team could create it.

The rule: refactor aggressively and remove deprecated patterns immediately when introducing new ones. If removal isn’t possible right away, document the deprecated pattern explicitly in your `AGENTS.md` with unambiguous instructions: “Do not use `LegacyAuthHandler` — use `AuthService.verify()` instead. `LegacyAuthHandler` will be removed in Q2.” Visibility without removal is better than nothing, but removal is always the goal.
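When removal has to wait, the deprecation can at least be made machine-visible in the code itself. A TypeScript sketch using the standard `@deprecated` JSDoc tag; both classes and their logic are placeholders:

```typescript
/**
 * @deprecated Do not use. Call AuthService.verify() instead.
 * Editors and many agents surface this tag, which weakens the old
 * pattern as a template to copy.
 */
class LegacyAuthHandler {
  authenticate(token: string): boolean {
    return token.length > 0; // placeholder legacy logic
  }
}

class AuthService {
  // Hypothetical replacement: verify a token against a version prefix.
  static verify(token: string): boolean {
    return token.startsWith("v2:") && token.length > 3;
  }
}
```

The tag is a stopgap, not a substitute: the goal remains deleting `LegacyAuthHandler` entirely so it stops appearing in any agent's context window.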

The Refactor-First Workflow: How to Prioritize What to Fix

You don’t have to refactor everything before agents become useful. You need to refactor strategically — targeting the highest-risk modules first and verifying improvement before assigning agents to them.

Here’s a sequence that works:

  1. Measure code health. Use a tool like CodeScene to score your modules against a health baseline. You’re establishing where you are relative to the 9.5 threshold and identifying your worst offenders.
  2. Identify high-churn, low-health modules. These are modules that change frequently and have poor health scores. They represent both the highest risk (agents will cause damage here) and the highest opportunity (improvements here directly impact the work your team does most often).
  3. Run a targeted refactor sprint. Apply the seven patterns to your priority modules. Don’t attempt a full codebase overhaul — focus on the two or three modules you’re planning to assign agentic tasks to in the next sprint.
  4. Re-measure and verify the threshold. Before assigning an agent to a module, confirm it has crossed the health threshold. Below it, the risk of agent-amplified debt is real and measurable.
  5. Assign agentic tasks. Now your agents are working on clean signal. Feature additions, targeted refactors, test generation — all of these perform measurably better on a healthy, well-documented module.
  6. Update AGENTS.md continuously. Every refactor sprint surfaces new conventions and decisions. Document them in real time. The living context file only works if it stays current.

This sequence is more effective than the alternative — running agents on legacy code and manually triaging their output — because it eliminates the root cause rather than managing the symptoms sprint after sprint.

What to Realistically Expect After Refactoring

The numbers are compelling, but they come with context worth understanding.

CodeScene’s engineering team documented a 2–3x speedup in agentic workflows after reaching the Code Health threshold. The documentation optimization case study showed ~75% token usage reduction. Daily AI tool users already merge approximately 60% more pull requests than non-daily users — and those gains correlate with code structure quality, not just tool usage frequency (DX Research, via Faros.ai, 2025).

These gains are real and reproducible. But they assume the codebase is your primary bottleneck.

If your PR review process, QA pipeline, or deployment workflow isn’t scaled to handle more throughput, faster agents will simply create a new bottleneck downstream. The teams that capture the full gain from AI-friendly code design treat it as a systems problem: code structure, documentation, and process need to move together.

One more honest caveat: AI-assisted code is associated with a 23.7% increase in security vulnerabilities when code review and governance processes aren’t upgraded alongside AI adoption (GetPanto.ai, 2025). Cleaner, more intention-revealing code makes reviews faster and more thorough — but structural refactoring alone doesn’t solve governance. Update your review checklists as your throughput increases.

Build the Codebase Your Agents Deserve

The teams getting 3x leverage from AI coding agents aren’t using better tools. They’re feeding their agents better inputs.

AI-friendly code design isn’t a new set of rules — it’s applying the principles you already know (clear naming, DRY, single responsibility, explicit documentation) with enough rigor that an agent can reason about your system without the implicit context that lives only in engineers’ heads. The Thoughtworks Technology Radar recognizes it. The data from CodeScene, METR, and independent case studies all converge on the same conclusion. The seven patterns here are your starting point.

Pick your two worst modules. Run through the patterns. Measure the difference before you assign a single agentic task — and then watch what your tools can actually do when the signal quality matches their capability.
