Cursor 3 Background Agents: The Config Playbook

Most teams discover Cursor 3 background agents the same way: one engineer hits Ctrl+E, watches a PR appear an hour later, and immediately Slacks the whole team. Then two agents start editing the same file. Someone burns $30 in credits on a task that needed a five-minute clarification. The setup lives in that one engineer’s head and nowhere else.

This guide skips that learning curve. You'll get the exact three-file configuration that keeps Cursor 3 background agents in their lanes, the prompt pattern that prevents surprise rewrites, and a PR review checklist for catching what agents consistently miss — whether you're running one agent or five.

What Cursor 3 Actually Changed (And Why the Agent-First Shift Is Real)

Cursor 3 didn’t just ship background agents — it signaled a full product pivot. According to the Cursor blog, agent users now outnumber Tab autocomplete users 2-to-1, a ratio that was the reverse just one year earlier. Their own engineering team runs 35% of merged pull requests entirely through autonomous cloud agents.

The technical backbone is Composer 2, the model powering Cursor 3 agents. It scores 61.3 on CursorBench versus 44.2 for Composer 1.5 — roughly a 39% improvement — and runs at over 200 tokens per second. That speed matters when you’re running multiple agents in parallel and paying by the token.

What changed in the UI is the Agents Window: a dedicated pane showing every running agent, its current status, and a link to the branch it’s working on. You can hand off a long-running task to Cursor’s cloud infrastructure and close your laptop. The agent keeps running in a hosted VM; you come back to a PR.

The catch is that many engineers see the demo, fire off an agent, and immediately run into configuration problems. The config layer is what separates “fun demo” from a repeatable production workflow.

The Three Config Files Every Background Agent Reads — and What Belongs in Each

Three files govern what an agent knows and how it behaves. Getting any one of them wrong is the most common cause of agents going off-script.

.cursor/rules/ — Always-On Context

Most documentation still treats this as a single file. It’s a folder. Each .mdc file inside it loads automatically for every agent in the project, and you can scope different rule files to specific paths.

A practical structure for a Node + Postgres stack:

.cursor/rules/
  project.mdc       # repo-wide conventions: TypeScript strict, test file naming, lint config
  database.mdc      # Postgres patterns: migration format, pool config, query conventions
  api.mdc           # Express route structure, error handling pattern, auth middleware

Keep rules behavioral, not informational. “Always add database indexes to migration files” is a rule. “Our database is Postgres” is background information that belongs in SKILL.md.
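
To make that concrete, here's a sketch of what database.mdc might contain. The frontmatter fields follow Cursor's MDC format; the rules themselves are illustrative, not canon:

---
description: Postgres conventions for migrations and queries
globs: src/db/**,migrations/**
alwaysApply: false
---

- Always add indexes for foreign keys in migration files.
- Use the shared query() wrapper; never call pool.query() directly.
- Name migration files NNNN_short_description.sql.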

.cursor/environment.json — VM Setup

This file tells the cloud VM what to install before the agent touches a single line of code. If this file isn’t committed to your repo, every agent starts from a blank slate.

A real example for Node + Postgres:

{
  "install": "npm ci",
  "start": "docker-compose up -d db && npm run dev",
  "terminal": {
    "env": {
      "NODE_ENV": "test",
      "DATABASE_URL": "postgresql://localhost:5432/myapp_test"
    }
  },
  "ports": [3000, 5432]
}

The fields that matter:
install — runs once at VM startup. Use npm ci over npm install for determinism.
start — brings up any services the agent needs running before it writes code.
terminal.env — environment variables injected into every command. Use test credentials here, never production secrets.
ports — exposes these locally so the agent can make HTTP calls against a running server during testing.

One important hazard: a cloud agent works from a snapshot of your repo at launch time. If you keep editing files locally while the agent runs, you’re creating a divergence you’ll need to resolve. environment.json can’t help you there — that requires worktree isolation, covered next.

SKILL.md — Dynamically Loaded Domain Knowledge

This file is almost entirely absent from existing Cursor guides, which is strange because it’s the mechanism that keeps context windows clean.

SKILL.md is a markdown file your .cursor/rules can reference dynamically — you load it only for agents that need domain-specific knowledge. A payments feature agent loads your Stripe integration patterns. A data pipeline agent loads your ETL conventions. Nothing bleeds into contexts where it’s irrelevant.

Keep SKILL.md files short and opinionated. “Our Stripe webhooks use idempotency keys — always check for duplicate event IDs before processing” is useful. A 400-line architecture overview is noise.
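
A sketch of what a payments SKILL.md might look like (the contents are hypothetical; the point is brevity and opinion, not coverage):

# Stripe Integration Skill

- Verify the webhook signature before parsing the request body.
- Check the event ID against processed_events before handling. Stripe retries deliveries, so handlers must be idempotent.
- Amounts are integer cents. Never represent money as a float.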

Isolating Agents with Git Worktrees So They Never Collide

Two agents editing the same file will produce a merge conflict. The solution isn’t careful task planning — it’s isolating the filesystem.

Running parallel agents with git worktrees gives each agent its own working directory and branch, so there’s no shared state. Agent A modifies src/auth/middleware.ts in its worktree; Agent B modifies the same file in its worktree. They never see each other’s changes until you explicitly merge.

The setup:

git worktree add -b feature/agent-auth ../project-agent-auth
git worktree add -b feature/agent-billing ../project-agent-billing

Point each agent at its worktree directory when you launch it. The agents run on completely separate branch checkouts with no filesystem overlap.
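
You can sanity-check the isolation with git worktree list; each agent should show its own path and branch (paths and SHAs here are illustrative):

git worktree list
/home/you/project                 3f2a1b9 [main]
/home/you/project-agent-auth      3f2a1b9 [feature/agent-auth]
/home/you/project-agent-billing   3f2a1b9 [feature/agent-billing]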

The remaining risk is scope creep: an agent tasked with auth might decide it needs to refactor a shared utility, touching files outside the intended boundary. Worktrees solve the infrastructure problem but don’t replace clear task scoping in your prompts. The two work together.

Writing Prompts That Force a Plan Before Any Code Moves

The most expensive thing a background agent can do is start coding based on a misunderstood requirement. Agents are good at executing plans — they’re not reliable at detecting when they’ve misunderstood one.

The pattern that prevents this is Clarify → Plan → Approve → Execute.

Clarify phase — prompt the agent before it writes anything:

“Before writing any code, list every assumption you’re making about the existing schema, auth pattern, and test coverage. Ask me to confirm them.”

Plan phase — the agent responds with a list of files it plans to touch, the sequence of changes, and any migration or environment changes required. This is your checkpoint.

Approve phase — you reply explicitly:

“Confirmed. Proceed.” — or — “Don’t touch src/lib/db.ts. Use the existing query() wrapper.”

Execute phase — the agent now has an explicit contract. Deviations from it will be visible when you diff the PR.

Skipping the approval step is where most cost overruns happen. A single agent-generated PR can consume $4–5 in credits on Max-mode-compatible models. If the agent spent most of that executing the wrong plan, you've paid to create more work for yourself.

A prompt template worth committing to your repo:

Task: [description]
Constraints:
- Files in scope: [list them explicitly]
- Files out of scope: [explicit exclusions]
- Tests required: yes/no, which framework
- Do not proceed past the plan phase without explicit approval.

Launching, Monitoring, and Handing Off to the Cloud

Launching a background agent is Ctrl+E (or Cmd+E on Mac). The agent appears immediately in the Agents Window pane, showing its current action, elapsed time, and the branch it’s writing to.

The UI looks simple, but a few behaviors aren’t obvious until you’ve been burned by them:

  • Status labels are coarse. “Running” means the agent is doing something — not necessarily the right thing. Check the branch diff early if a task has been running more than 20 minutes without a visible commit.
  • Cloud handoff isn’t magic. It requires the agent to detect a long-running task and migrate itself. Don’t force-quit the app and expect the task to continue — close it gracefully.
  • Parallel agents each need their own launch. There’s no broadcast-to-all mechanism. You start them individually with separate prompts and separate Agents Window entries.

The pane layout that works well for monitoring multiple agents: editor on the left, Agents Window in the center, terminal on the right showing the target branch’s git log. You can watch PRs materialize without leaving the editor.
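
For that right-hand pane, a simple polling loop does the job. This sketch assumes the agent pushes commits to origin as it works, and that watch is installed (on macOS it isn't by default):

watch -n 30 'git fetch --quiet && git log --oneline -5 origin/feature/agent-auth'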

Best-of-N Comparison: Running the Same Task Across Multiple Agents in Parallel

For complex or high-stakes tasks, the right approach isn’t one agent — it’s several. Run the same task prompt against two or three agents in separate worktrees, let them all finish, then compare outputs.

This is the most underused capability in Cursor 3, and it maps cleanly to how multi-model stacks behave in production — where model diversity surfaces better solutions, not just faster ones.

Evaluate the outputs on:
– Which has more complete error handling?
– Which writes tests that exercise edge cases, not just the happy path?
– Which diff is smaller and more focused on the stated task?

Merge the winner. Delete the other branches. Running two agents on the same task roughly doubles the credit cost, but shipping the wrong implementation costs more.
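
The mechanics come down to a few commands. This sketch assumes two candidates on hypothetical branches feature/task-a and feature/task-b, each in its own worktree from the setup above:

# Compare the scope and size of the two candidate diffs
git diff --stat main...feature/task-a
git diff --stat main...feature/task-b

# Merge the winner, then discard the loser's worktree and branch
git checkout main && git merge feature/task-a
git worktree remove ../project-task-b
git branch -D feature/task-b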

What Cursor 3 Background Agents Can’t Do (Yet) — Hard Limits to Set With Your Team

These constraints belong in your team wiki before anyone runs an agent on production work.

GitHub only. Background agents connect exclusively to GitHub repositories. Bridging Cursor with non-GitHub repos is possible for local agent runs, but cloud handoff won’t be available — meaning long-running tasks won’t survive a closed laptop on GitLab or Bitbucket.

No external trigger API. You can’t kick off an agent from a CI webhook, a Slack command, or a script. Every agent run starts from inside the Cursor UI. If your workflow needs “failing test → auto-assign agent,” that’s not available yet.

Credits don’t pool. Each developer draws from their own credit allocation. There’s no team credit pool on the Pro plan. Enterprise customers gained self-hosted cloud agents on March 25, 2026, which is the only option that gives infrastructure-level cost control.

Privacy mode trade-offs. Enabling privacy mode means your code doesn’t leave your machine for training purposes — but it also disables some cloud-based agent capabilities. Pick your trade-off deliberately; don’t assume privacy mode is compatible with the full background agent feature set.

Reviewing Agent PRs Without Getting Burned — A Checklist

Agent-generated code has consistent failure modes. Once you know what to look for, reviews get significantly faster.

Happy-path bias is the most common pattern. Agents write code that works when inputs are valid and services are available. Check specifically for:
– Empty state handling — what happens when the database returns zero rows?
– Network failure cases — is there a retry, or does the function throw and let the caller deal with it?
– Input validation on any data coming from outside the service boundary

Tests that mirror implementation gaps. Agents write tests that reflect their own code. If the implementation skips an edge case, the tests will too. Don’t just run the test suite — read the tests and ask whether they’d catch the failure modes that matter in production.

Security implications of browser-enabled agents. If the agent had web access during the task, scan any external URLs it fetched and any packages added to package.json. Prompt injection via a malicious webpage is a documented risk class for browser-capable agents.

Secrets in environment.json. Run this as a pre-commit check. The committed file should only ever contain test-environment values — never real API keys, database passwords, or tokens.
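
A minimal hook sketch; the patterns (Stripe live keys, AWS access key IDs, private key headers) are illustrative starters, not a complete secret scanner:

#!/usr/bin/env bash
# pre-commit: reject commits where the staged environment.json contains obvious credentials
file=".cursor/environment.json"
if git diff --cached --name-only | grep -qxF "$file"; then
  if git show ":$file" | grep -qE 'sk_live_|AKIA[0-9A-Z]{16}|-----BEGIN .*PRIVATE KEY'; then
    echo "Blocked: $file appears to contain a real credential." >&2
    exit 1
  fi
fi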

Scope creep in the diff. Check files the agent wasn’t supposed to touch. An “auth change” that includes a refactor of the database connection pool is a warning sign that either the task scoping was too loose, or the agent made a judgment call you didn’t approve.
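
One quick way to audit for scope creep, assuming you saved the approved file list from the plan phase (approved-files.txt is hypothetical):

# Any output is a file the agent touched outside the approved scope
git diff --name-only main...feature/agent-auth | grep -vxF -f approved-files.txt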

The Gap Between Demo and Production Is Two Hours of Config Work

Cursor 3 background agents are a force multiplier on whatever infrastructure surrounds them. Commit the config files, isolate agents with worktrees, enforce the plan-then-approve pattern, and review PRs with the checklist above. Once those habits are in place, the compounding value is real — and the 35% of PRs that Cursor’s own team ships through agents gives you a sense of what the ceiling looks like.

Start with one agent on a well-scoped task. Get that workflow clean before running five in parallel. Then document the setup in your repo so the next engineer doesn’t have to figure it out from scratch.

If you're still running Cursor 3 background agents one at a time on an ad hoc basis, the gap between that and a working parallel workflow is smaller than it looks. The config layer is maybe two hours of work. That's where the leverage is.
