You’ve been there — 40 turns into a Gemini CLI session and something starts to slip. The model references a variable you renamed three files ago. It repeats a suggestion it already tried. The code it generates is technically plausible but wrong for your specific codebase.
This is context rot, and it’s not a bug — it’s a physics problem. Every tool call, file read, and error message occupies tokens, and once the window fills up, older instructions get compressed or pushed out entirely.
Gemini CLI subagents, launched April 15, 2026, attack this problem at the root. Instead of one overloaded session handling everything, you get isolated specialists — each with their own fresh 1M-token window — executing in parallel while your main agent stays focused. Here’s how to build them, and when to use each one.
The Context Rot Problem (Why Your AI Coding Sessions Degrade After 20 Turns)
You ask the AI to refactor a module, search six files, fix a test, and update a config. By turn 25, it starts hallucinating. Not randomly — it misremembers things from early in the session.
A function renamed from getUserData to fetchUser becomes getUser in the AI’s next suggestion. A path you explicitly corrected gets reverted.
The cause is mechanical. Each turn consumes tokens: the system prompt, your messages, all model responses, every file read, every grep result. Gemini 2.5 Pro’s 1M-token context is large, but in a heavy coding session with dozens of tool calls, you can fill it faster than you think.
What makes this dangerous is that the degradation is invisible. The model doesn’t announce “I’ve lost track of your variable names.” It just starts making errors that look like normal AI mistakes. By the time you notice something is wrong, you’ve already accepted two or three corrupted suggestions.
This is the exact problem Gemini CLI subagents were built to solve.
How Gemini CLI Subagents Work — The Hub-and-Spoke Architecture Explained
The mental model is simple: your main Gemini CLI session is the Hub. Each subagent is a Spoke.
When you — or the model automatically — delegate a task to a subagent, the framework spins up a completely isolated context. That subagent gets its own fresh 1M-token window, its own tool permissions, its own model configuration, and runs independently of the main session. When it finishes, it returns a single consolidated result back to the Hub. The intermediate work — the 20 file reads, 5 greps, 3 failed attempts — never touches your main context.
This isn’t just memory management. It enforces a clean interface: subagents communicate through their output, not through shared state. A subagent can’t accidentally modify a global variable in your main session because there’s no shared global state to modify.
If you’re thinking about how this maps to multi-model stacks in production, the parallel is direct — isolation is what makes composition tractable at scale.
One constraint the framework enforces unconditionally: subagents cannot delegate to other subagents. Even if you grant a subagent the * wildcard for tools, recursion is blocked at the framework level. This prevents runaway token spend and infinite delegation loops, and it keeps your architecture flat and predictable.
The Four Built-in Subagents and When to Use Each One
Gemini CLI ships with four pre-configured agents. No setup required — they’re available immediately via @agent-name syntax or automatic routing.
codebase_investigator — Your go-to for research tasks inside a repo. Optimized for traversal: reading files, running greps, analyzing structure. Point it at an unfamiliar module and ask it to explain the data flow. Default: 30 turns, 10-minute timeout.
cli_help — Answers questions about Gemini CLI itself. Faster than checking docs for flag syntax or new feature details.
generalist — A broad delegator. Useful when your task doesn’t fit the other categories, or when you want to isolate context from a sprawling multi-step task without a specialist.
browser_agent — Fetches and interprets live web content. Requires Chrome 144+ and is disabled by default. On macOS, the seatbelt sandbox forces isolated + headless mode. In Docker, you must set sessionMode: 'existing' — the default isolated mode won’t work in container environments. One more gotcha: browser_agent requires API key authentication; Google Sign-In is incompatible.
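For example, you can route a request to a specific built-in agent with the @agent-name syntax (the prompt text here is illustrative):

```text
@codebase_investigator map the data flow from the HTTP route handlers to the persistence layer
```

Leave off the prefix and the main agent decides whether to delegate based on each agent's description.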
All four share the same defaults: max_turns=30, timeout_mins=10, temperature=1, model=inherit. You can override any of these in settings.json — for example, running codebase_investigator on gemini-2.5-flash with 50 turns for faster, deeper traversals on large repos.
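As a sketch, an override in settings.json might look like the following; the top-level key layout ("agents") is an assumption here, so check your installed version's schema before copying it:

```json
{
  "agents": {
    "codebase_investigator": {
      "model": "gemini-2.5-flash",
      "max_turns": 50
    }
  }
}
```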
Building Your First Custom Subagent — A Complete Agent Definition File Walkthrough
Custom agents live in Markdown files with YAML frontmatter. Two deployment locations are supported:
– .gemini/agents/ — project-scoped, checked into version control, shared with the team
– ~/.gemini/agents/ — user-scoped, personal agents available across all your projects
A working agent file looks like this:
```markdown
---
name: readme_architect
description: Generates or rewrites README files by analyzing the project structure, key entry points, and existing documentation. Use this when asked to document a project or update its README.
model: gemini-2.5-pro
temperature: 0.4
max_turns: 20
timeout_mins: 8
tools:
  - read_file
  - glob
  - grep_search
---

You are a technical documentation specialist. Your job is to produce clear, accurate README files.

When given a project, you will:

1. Explore the directory structure
2. Identify the main entry points and key modules
3. Check for existing docs or inline comments
4. Produce a complete README in Markdown

Focus on accuracy over style. Never invent features that aren't in the code.
```
Three things are easy to get wrong:
The description field does double duty. It surfaces in /agents output so you know what’s registered — but more importantly, it controls automatic routing: when you make a request to the main agent, Gemini CLI reads each agent’s description to decide whether to delegate. A vague description like “helps with documentation” will never get routed to. Write descriptions that match the natural language a developer would actually use: “Generates or rewrites README files by analyzing the project structure…”
The tools array is a whitelist, not a filter. Omit it entirely and the agent inherits all available tools — including write access. Always declare tools explicitly for any agent deployed to a shared project.
mcpServers defined in frontmatter are exclusive to that agent. They don’t appear in the global MCP registry. If your team expects a shared MCP server to be available to all agents, define it globally — not inside a subagent’s frontmatter. This is the most common footgun for teams setting up a collaborative multi-agent workflow.
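To make the scoping concrete, here is a hedged sketch of an agent that carries its own MCP server. The agent name, the server command, and the exact shape of the mcpServers field are illustrative assumptions, not confirmed syntax:

```yaml
---
name: issue_triager
description: Triages GitHub issues by reading the repo and querying the GitHub MCP server.
mcpServers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
tools:
  - read_file
  - mcp_github_*
---
```

Because the server is declared in frontmatter, only issue_triager can see it; other agents and the Hub will not.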
Tool Scoping and the Principle of Least Privilege for AI Agents
Every tool your agent doesn’t need is a liability. An agent that can write files but only needs to read them can corrupt your codebase on a bad hallucination.
Gemini CLI’s tools array supports wildcard syntax:
```yaml
tools:
  - read_file      # exact tool name
  - grep_search
  - glob
  - mcp_*          # all tools from all MCP servers
  - mcp_github_*   # all tools from the 'github' MCP server only
```
For read-only research agents — the most common type you’ll build — limit to read_file, glob, and grep_search. That’s it. No write_file, no run_terminal_cmd, no create_file.
For agents that need to modify code, scope by directory when your tool set supports it, and set explicit timeout_mins to cap potential runaway execution.
The TOML policy engine adds a second enforcement layer. Using the [[rules]] + subagent pattern, you can deny specific tools per subagent at the security-policy layer — separate from the agent’s own frontmatter. This matters in team environments where agent files come from multiple contributors: you can allow a tool globally while blocking it for a specific agent at a level that can’t be overridden by editing the agent file itself.
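A minimal sketch of such a rule, assuming the [[rules]] + subagent pattern described above; the field names here are illustrative, not confirmed syntax:

```toml
# Deny write_file to the security_auditor agent, even if its
# frontmatter (or a later edit to it) would otherwise allow the tool.
[[rules]]
subagent = "security_auditor"
tool = "write_file"
decision = "deny"
```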
Parallel Delegation Patterns — Fanning Out Safely (and When Not To)
The real power of the hub-and-spoke model is fan-out: assigning 3–5 subagents to work simultaneously on independent tasks. Read-only research tasks are almost always safe to parallelize.
Safe to parallelize:
– Analyzing three separate modules for security vulnerabilities
– Generating tests for non-overlapping files
– Researching external API docs while auditing internal code
– Summarizing multiple log files from different services
Not safe to parallelize:
– Two agents writing to the same file
– Agents that both need to run and commit migrations
– Any scenario where Agent A’s output is Agent B’s input
The race condition risk is concrete. If you spin up two agents that both modify src/config.ts, the second one to write will overwrite the first one’s changes — silently, with no conflict detection. For teams running parallel write agents, git worktrees provide the isolation layer that prevents these conflicts at the branch level.
A reliable rule: if the tasks would need to be sequential in a single session, treat them as sequential with subagents too. The exception is when you’re certain the tasks operate on completely separate file paths with no shared output.
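The worktree isolation mentioned above can be sketched in plain git commands; the repo, branch, and worktree names are illustrative:

```shell
# Sketch: give each parallel write-agent its own git worktree so their
# file writes can never collide.
set -e
git init -q demo-repo
cd demo-repo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
# One worktree (and branch) per write-agent:
git worktree add -b agent-tests ../wt-tests
git worktree add -b agent-docs  ../wt-docs
# Point each agent at its own checkout; merge the branches afterwards.
git worktree list
```

Each agent then sees a private checkout of the same repository, and conflicts surface at merge time instead of as silent overwrites.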
Three Real-World Subagent Recipes
README architect
File: .gemini/agents/readme_architect.md
Tools: read_file, glob, grep_search
Temperature: 0.4
The key insight here: give it explicit instructions to never invent features. Documentation agents hallucinate capabilities more than code agents do because they’re working with less structure. The system prompt should emphasize analyzing what exists, describing it accurately, and flagging gaps rather than filling them with assumptions.
Security auditor
File: .gemini/agents/security_auditor.md
Tools: read_file, glob, grep_search (no write access — intentionally)
This agent scans for hardcoded secrets, unvalidated inputs, and unsafe dependency patterns without touching any files. Keep it read-only by design. An audit agent that can also modify files creates a perverse incentive: it starts “fixing” what it flags rather than just reporting. Separation of concerns applies to AI agents exactly as it applies to services.
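Using the frontmatter format from the walkthrough, a sketch of this agent might look like the following; the temperature, turn limit, and prompt wording are illustrative choices:

```markdown
---
name: security_auditor
description: Scans the codebase for hardcoded secrets, unvalidated inputs, and unsafe dependency patterns, and reports findings without modifying files. Use when asked for a security audit or review.
temperature: 0.3
max_turns: 25
tools:
  - read_file
  - glob
  - grep_search
---

You are a read-only security auditor. For every finding, report the file
path, the line, the risk, and a suggested remediation. Never edit files;
flag issues, do not "fix" them.
```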
Test generator
File: .gemini/agents/test_generator.md
Tools: read_file, glob, grep_search, write_file
Set temperature: 0.2 — lower temperature produces more predictable test structure. This one can be parallelized across modules, but only if each module writes tests to its own dedicated file. The moment two test generator instances target the same __tests__/ directory with shared fixtures, you risk interleaved writes. Structure your test directories by module before running agents in parallel.
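A sketch of the corresponding frontmatter, reusing the fields named above (everything beyond those fields is illustrative):

```yaml
---
name: test_generator
description: Generates unit tests for a single module, writing them to that module's own dedicated test file. Use when asked to add or expand test coverage.
temperature: 0.2
max_turns: 20
tools:
  - read_file
  - glob
  - grep_search
  - write_file
---
```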
Gemini CLI Subagents vs. Claude Code Subagents — What’s Different and What to Pick
Both ecosystems support subagents, and the surface similarities are real. The differences matter for how you architect solutions.
File format: Both use Markdown + YAML frontmatter. The supported fields differ. Gemini’s timeout_mins and per-agent mcpServers in frontmatter have no direct Claude Code equivalent at the agent-definition level.
Context window strategy: Each Gemini subagent gets a full 1M-token window — enough to hold approximately 3–4 million characters of code without chunking. That’s the largest isolated context window in any mainstream AI coding CLI. Understanding how Claude Code subagents handle context inheritance shows a different architectural trade-off: tighter integration with the parent session versus Gemini’s hard isolation.
Recursion rules: Both platforms block agent-to-agent recursion. Gemini CLI enforces it at the framework level regardless of tool grants — you can’t accidentally enable it.
Rate limiting gotcha: During heavy subagent sessions on Gemini CLI’s free tier (60 requests/minute), hitting the per-minute cap will silently switch the model to Flash. The status bar still shows the original model name. If output quality drops suddenly mid-session, this is the likely cause — not a regression in your agent’s system prompt.
Which should you use? If you’re already in the Gemini ecosystem and working with large codebases where 1M-token isolation per agent matters, Gemini CLI subagents are the natural fit. If your team is standardized on Claude Code, the switching cost rarely justifies the move for subagent access alone.
Start Small, Then Fan Out
Context rot is a mechanical problem — not a limitation of AI intelligence but of how token-based memory works under heavy use. Gemini CLI subagents give you a structural fix: isolated context windows, parallel execution, and explicit tool scoping that keeps each agent doing exactly what it should.
Start with the built-in agents to build intuition for the delegation model. Then write your first custom agent for the task you run most often — the YAML frontmatter is shallow enough that you can ship a working definition in 15 minutes. The discipline is in the tool scoping: give each agent only what it needs, and your sessions will stay sharp through work that would have collapsed a single-context session by turn 30.
Try this now: create a read-only codebase_investigator clone scoped to a single module in your current project. Invoke it with @your-agent-name analyze this module for dead code. You’ll understand isolated context the first time it returns a clean report while your main session hasn’t moved an inch.
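A sketch of that clone, reusing the frontmatter fields from the walkthrough; the agent name and prompt wording are placeholders to adapt:

```markdown
---
name: module_investigator
description: Investigates a single module of this project, read-only, and reports its structure, data flow, and dead code. Use for focused analysis of one module.
tools:
  - read_file
  - glob
  - grep_search
---

You analyze only the module the user names. Read its files, trace its
exports and imports, and report unused functions, unreachable branches,
and stale configuration. Never write files; report only.
```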