Spec-Driven Development with GSD for Claude Code

Claude Code felt like a superpower the first time you used it. By hour three of a session, it felt like a completely different model — forgetting decisions from earlier in the conversation, hallucinating function signatures, generating code that contradicted the architecture you’d already established together.

That’s not a bug. That’s context rot. And spec-driven development with the GSD framework for Claude Code is how you fix it permanently.

This guide skips the philosophy lecture. You’ll get the exact installation commands, the full six-phase workflow walked through a real project, the one step everyone skips (and why it causes the most expensive rewrites), and a clear map of the five failure modes that cause developers to abandon GSD before it pays off.

Why Claude Gets Dumber Over Time (And Why That’s a You Problem, Not an AI Problem)

Your model hasn’t changed. Your context window has.

Claude operates at peak quality when the context window is 0–30% full. At 50%+, the model begins making tradeoffs — dropping earlier context, simplifying outputs, filling gaps with plausible-sounding guesses. At 70%+, you get hallucinated requirements, forgotten constraints, and code that works in isolation but violates the architectural decisions you made two hours ago.
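As a rough illustration, the thresholds above can be turned into a small classifier. This is a sketch only: `context_zone` is a hypothetical helper, not something GSD or Claude Code provides, and the 31-49% band is labeled "intermediate" because the figures above say nothing specific about it.

```bash
#!/usr/bin/env bash
# Classify context-window usage using the thresholds quoted above.
# context_zone is a hypothetical helper, not part of GSD or Claude Code.
context_zone() {
  local used=$1 total=$2
  local pct=$(( used * 100 / total ))   # integer percent, rounded down
  if   [ "$pct" -ge 70 ]; then echo "severe"        # hallucinated requirements, forgotten constraints
  elif [ "$pct" -ge 50 ]; then echo "degrading"     # tradeoffs begin
  elif [ "$pct" -le 30 ]; then echo "peak"          # best output quality
  else                         echo "intermediate"  # 31-49%: not characterized in the text
  fi
}

context_zone 40000 200000    # 20% full, prints: peak
context_zone 150000 200000   # 75% full, prints: severe
```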

This isn’t a model limitation you need to accept. It’s a workflow problem you can solve.

The standard response is to start a new chat. That fixes the context rot but destroys continuity — now you’re pasting requirements back in, re-explaining your architecture, and hoping Claude reconstructs what you built. You spend more time managing the AI than building software.

85% of developers regularly use AI tools for coding, debugging, and code review in 2026, up from roughly half that figure two years prior (AI in Software Development Statistics 2026, Modall). The bottleneck isn’t adoption. It’s workflow. And an unstructured workflow scales context rot directly with ambition — the bigger the feature, the worse the output gets before you finish it.

Vibe Coding vs. Spec-Driven Development — The One-Sentence Difference That Changes Everything

Here’s the distinction that matters:

Vibe coding gives the AI shared design authority. Spec-driven development gives the AI a known target and keeps design authority with you.

In a vibe-coded session, you describe what you want and let Claude interpret, improvise, and occasionally invent. You move fast. You also accumulate hidden debt: security issues, untested edge cases, and code that technically works but nobody fully understands — including you.

Only 29% of developers trust AI tool output as of the Stack Overflow 2025 Developer Survey, down from 70%+ in 2023. That collapse in trust coincides with the rise of unstructured AI coding at scale. AI-generated code contains 2.74× more vulnerabilities than human-written code, with 45% of AI code samples failing security tests (Omniflow Blog, 2026).

Spec-driven development changes the contract. Before any code is written, you and the AI produce artifacts that define the target: what’s being built, how it should work, and what success looks like.

The AI executes against that spec. It doesn’t improvise. It doesn’t reinterpret. And when the context window fills up, a fresh agent picks up the same spec and continues with identical quality.

That’s the core promise of GSD.

Installing GSD in Under 5 Minutes — The Exact Commands and What They Create

GSD reached 23,000 GitHub stars by March 2026, making it one of the fastest-adopted spec-driven development tools in the AI coding ecosystem. Installation is straightforward.

Global install (solo developer)

```bash
npm install -g @gsd/cli
```

For solo use with Claude Code, you’ll run GSD with the `--dangerously-skip-permissions` flag, which allows the framework to create, modify, and commit files without per-action confirmation. This is intentional: GSD’s automation depends on frictionless file operations.

A note on that flag: `--dangerously-skip-permissions` opens a real security surface. If you’re working on a shared machine, against a codebase with sensitive credentials, or in a team environment, use scoped permissions instead. The flag is designed for controlled solo environments, not production codebases with secrets in the repo.

Local install (team project)

```bash
npm install --save-dev @gsd/cli
```

Commit the `.planning/` directory to your repository. The planning artifacts become version-controlled shared state — every developer on the team can see exactly where the project stands and why decisions were made.

What GSD creates

When you run `/gsd-new-project`, GSD initializes a `.planning/` directory with five files:

PROJECT.md — Project name, tech stack, core constraints
REQUIREMENTS.md — Functional and non-functional requirements
ROADMAP.md — Milestone-level breakdown of the work
STATE.md — Current milestone, task status, last checkpoint
PLAN.md — The executable task prompt for the current work unit

That last file is the one developers most consistently misuse. More on that in the failure modes section.
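If you want a quick sanity check that all five files exist, something like the following may help. It is a minimal sketch: `missing_planning_files` is a hypothetical helper written for this guide, not a GSD command, and it only checks that the files are present, not what they contain.

```bash
#!/usr/bin/env bash
# List which of the five planning files are absent from a directory.
# missing_planning_files is a hypothetical helper, not a GSD command.
missing_planning_files() {
  local dir=${1:-.planning}
  local f
  for f in PROJECT.md REQUIREMENTS.md ROADMAP.md STATE.md PLAN.md; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}

# Prints one missing filename per line; prints nothing when all five exist.
missing_planning_files .planning
```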

The Six-Phase GSD Workflow, Explained With a Real Project

Let’s walk through building a REST API with authentication. Here’s the command sequence and what happens at each step.

Phase 1: `/gsd-new-project`

Initializes `.planning/` and prompts you to define the project. You provide the name, tech stack, and a high-level description. GSD generates `PROJECT.md` and a skeleton `REQUIREMENTS.md`.

Don’t skip this even for small projects. The artifacts created here feed every subsequent phase.

Phase 2: `/gsd-discuss-phase`

This is where you and Claude work through the implementation before any planning begins. You’ll surface architecture decisions, edge cases, third-party dependencies, and anything ambiguous in the requirements. This phase gets its own section below — it’s the most consequential step in the entire workflow.

Phase 3: `/gsd-plan-phase`

Claude reads the discuss-phase output and generates `PLAN.md` — a structured, step-by-step task prompt for the current milestone. This is not documentation. It’s the executable input for the next subagent.

Keep plans scoped to a single milestone. Plans that span multiple milestones are one of the five most common GSD failure modes.

Phase 4: `/gsd-execute-phase`

A fresh subagent picks up `PLAN.md` and executes. Here’s the key insight: the subagent starts with an empty 200,000-token context window. Task 50 has identical context headroom to task 1. This is how GSD eliminates context rot at the architectural level — not by managing the existing window, but by never filling it.
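To make the headroom argument concrete, here is a back-of-the-envelope calculation. The 8,000 tokens per task is an assumed figure chosen purely for illustration (real tasks vary widely), and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic only: tokens-per-task is an assumed figure.
WINDOW=200000      # context window size quoted above
PER_TASK=8000      # hypothetical average tokens consumed per task

# First task number at which one shared session reaches a given fill percentage.
first_task_at_pct() {
  local pct=$1
  local threshold=$(( WINDOW * pct / 100 ))
  echo $(( (threshold + PER_TASK - 1) / PER_TASK ))   # ceiling division
}

# Fill percentage a fresh subagent sees for any single task.
fresh_agent_pct() {
  echo $(( PER_TASK * 100 / WINDOW ))
}

echo "shared session reaches 50% at task $(first_task_at_pct 50)"   # task 13
echo "shared session reaches 70% at task $(first_task_at_pct 70)"   # task 18
echo "fresh subagent per task: $(fresh_agent_pct)%"                 # 4%
```

Under these assumptions, a single long session crosses the 50% mark by task 13, while a fresh subagent sits at 4% on every task.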

Phase 5: `/gsd-verify-work`

The agent reviews completed work against `REQUIREMENTS.md` and `PLAN.md`, runs configured tests, and updates `STATE.md` with the checkpoint. This is your gate before advancing to the next milestone.

Phase 6: `/gsd-ship`

Handles final integration, cleanup, and the release artifact. For team setups, the committed `.planning/` history becomes especially valuable — it preserves the complete decision trail for future contributors.
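The phase order above can be sketched as a simple lookup. This is only a memory aid: `next_phase` is a hypothetical helper, the slash commands themselves run inside Claude Code rather than in your shell, and the strictly linear order is a simplification of the walkthrough.

```bash
#!/usr/bin/env bash
# Map each phase command to the one that follows it, in the order described above.
# next_phase is a hypothetical helper for illustration; GSD does not ship it.
next_phase() {
  case "$1" in
    /gsd-new-project)   echo "/gsd-discuss-phase" ;;
    /gsd-discuss-phase) echo "/gsd-plan-phase" ;;
    /gsd-plan-phase)    echo "/gsd-execute-phase" ;;
    /gsd-execute-phase) echo "/gsd-verify-work" ;;
    /gsd-verify-work)   echo "/gsd-ship" ;;
    /gsd-ship)          echo "done" ;;
    *) echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

next_phase /gsd-plan-phase   # prints: /gsd-execute-phase
```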

The Most Important Step Nobody Does: /gsd-discuss-phase

If there’s one thing to take from this guide, it’s this: `/gsd-discuss-phase` is the step that prevents expensive rewrites.

Here’s why it gets skipped. Developers see it as optional pre-work — a nice-to-have before planning begins. It isn’t. It’s the step where Claude surfaces implementation gray areas before they become architectural decisions buried in 3,000 lines of generated code.

A typical discuss-phase session for an auth API surfaces questions like:

Should tokens be stored in Redis or in the database? (Different backup and failure-mode implications.)
What happens to in-flight requests during a rolling deployment?
Are there rate-limiting requirements on the auth endpoints that aren’t in the spec yet?

These aren’t questions you want answered at Phase 4. They’re questions you want answered before `PLAN.md` is written — because the plan encodes whatever assumptions Claude makes, and those assumptions persist across every subsequent subagent that touches this milestone.

One skipped discuss phase can generate a full milestone of work that needs to be partially or fully redone when the gray area surfaces during execution. Budget 10–15 minutes. It saves hours.

Run the discuss phase. Even when you think you’ve already answered all the questions. Especially then.

When to Use /gsd-quick vs. the Full Pipeline (And the Flags That Matter)

Not every task needs a six-phase workflow. `/gsd-quick` handles ad-hoc work that doesn’t warrant a full milestone cycle.

`/gsd-quick` supports four flags:

`--discuss`: Quick explore mode, no artifacts generated
`--research`: Structured research with a findings summary
`--validate`: Checks a specific approach against existing `REQUIREMENTS.md`
`--full`: Runs a complete mini-cycle (discuss → plan → execute) without touching the main planning artifacts

The rule of thumb: if the work fits in a single PR and doesn’t change the architecture, `/gsd-quick --full` is probably the right tool. If it spans multiple PRs, touches core architecture, or requires decisions you’d want documented in the permanent planning artifacts, use the full pipeline.

The trap: using `/gsd-quick` for milestone-sized work. This bypasses the planning artifacts entirely — `STATE.md` doesn’t get updated, the work isn’t checkpointed, and the next session starts with stale assumptions. This is the most common failure mode in the first week of GSD adoption, precisely because `/gsd-quick` feels lower-friction when you’re in the middle of something.
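The rule of thumb can be written down as a three-input check. This is a sketch of the decision only: `choose_workflow` is a hypothetical helper, and the yes/no inputs are answers you supply yourself, not anything GSD detects.

```bash
#!/usr/bin/env bash
# Encode the rule of thumb above. choose_workflow is a hypothetical helper.
# Arguments: number of PRs, touches core architecture (yes/no),
#            needs decisions recorded in the permanent planning artifacts (yes/no)
choose_workflow() {
  local prs=$1 touches_arch=$2 needs_docs=$3
  if [ "$prs" -gt 1 ] || [ "$touches_arch" = yes ] || [ "$needs_docs" = yes ]; then
    echo "full pipeline"
  else
    echo "/gsd-quick --full"
  fi
}

choose_workflow 1 no no    # prints: /gsd-quick --full
choose_workflow 3 no no    # prints: full pipeline
```

If any one of the three answers changes mid-task, that is the signal to stop and move the work into the full pipeline.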

GSD v1 vs. GSD v2 — Which One Should You Actually Use?

Existing guides treat these as interchangeable. They’re not.

GSD v1 runs as slash commands inside Claude Code. It’s simpler to set up, requires no additional infrastructure, and is the right choice for individual developers running greenfield projects on their own machines.

GSD v2 is a standalone CLI built on the Pi SDK with direct context window control. It adds capabilities that matter at team and enterprise scale:

Crash recovery — If a session dies mid-execute, v2 resumes from the last STATE.md checkpoint automatically
Multi-model routing — Route different phases to different models based on capability and cost tradeoffs
CI pipeline integration — v2 runs as a non-interactive process inside GitHub Actions or similar systems
Auto-advance — Phases advance automatically without manual command invocation

For a solo developer building a side project: v1. For a team committing `.planning/` artifacts to git and running GSD-assisted CI pipelines: v2 is worth the setup overhead.

The upgrade path is clean — v1 artifacts are fully compatible with v2. You’re not starting over, you’re adding infrastructure around the same file structure.

Five Ways Developers Break GSD (And Exactly How to Fix Each One)

These are the failure modes that cause developers to abandon the framework in the first two weeks.

1. Skipping `/gsd-discuss-phase`

Already covered. The fix: make it non-negotiable before every milestone, no exceptions.

2. Plans that are too large

`PLAN.md` should represent one coherent milestone — work completable in a single focused session. When plans span multiple milestones, the executing subagent loses coherence, starts making assumptions, and produces work that’s technically complete but architecturally divergent from what comes next.

Fix: If your plan takes more than 20 tasks to describe, split it. Use `ROADMAP.md` to define the split points before you write the plan.
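One way to enforce the 20-task guideline is to count tasks before you execute. The sketch below assumes each task in the plan file is a Markdown checkbox line; that layout is an assumption made for illustration, so adjust the pattern to whatever your generated `PLAN.md` actually contains. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Count tasks in a plan file and flag plans over the 20-task guideline.
# ASSUMPTION: each task is a Markdown checkbox line ("- [ ]" or "- [x]");
# the real PLAN.md layout may differ, so adjust the pattern to match.
plan_task_count() {
  # grep -c exits non-zero when the count is 0, so neutralize that status.
  grep -cE '^[[:space:]]*- \[[ xX]\]' "$1" || true
}

check_plan_size() {
  local count
  [ -f "$1" ] || { echo "no plan file: $1" >&2; return 1; }
  count=$(plan_task_count "$1")
  if [ "$count" -gt 20 ]; then
    echo "split: $count tasks"
  else
    echo "ok: $count tasks"
  fi
}

# Usage:
#   check_plan_size .planning/PLAN.md
```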

3. Not committing STATE.md between sessions

STATE.md is your checkpoint file. If you close Claude Code without committing it, the next session has no idea where you left off. In team setups, this is how two developers end up executing the same milestone in parallel.

Fix: Commit STATE.md as part of every session handoff. Treat it like a database transaction log — it isn’t optional state, it is the state.
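A small pre-handoff check can catch a forgotten commit. This sketch uses only standard git commands and assumes the project is a git working tree with the checkpoint at `.planning/STATE.md`; `state_is_committed` is a hypothetical helper, not part of GSD.

```bash
#!/usr/bin/env bash
# Report whether the checkpoint file has uncommitted changes.
# state_is_committed is a hypothetical helper; it assumes a git working tree.
state_is_committed() {
  local repo=${1:-.}
  local state=".planning/STATE.md"
  if [ ! -f "$repo/$state" ]; then
    echo "missing: $state"
    return 1
  fi
  # --porcelain prints one line per changed or untracked path, nothing when clean.
  if [ -n "$(git -C "$repo" status --porcelain -- "$state")" ]; then
    echo "uncommitted: $state"
    return 1
  fi
  echo "committed: $state"
}

# Usage before closing a session:
#   state_is_committed . || { git add .planning/STATE.md && git commit -m "Checkpoint STATE.md"; }
```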

4. Using `/gsd-quick` for multi-phase work

The symptom is drift: work gets done but the planning artifacts don’t reflect it, and the next milestone starts with assumptions that no longer match reality.

Fix: If a `/gsd-quick` session is expanding beyond its original scope, stop and migrate to the full pipeline. The planning overhead is lower than the rework overhead.

5. Treating PLAN.md as documentation

This one is subtle. PLAN.md is the executable prompt for the subagent. It’s written to be read by Claude, not by you. When developers edit PLAN.md to make it human-readable — adding context, softening imperatives, explaining rationale — they degrade its effectiveness as an agent prompt.

Fix: Leave PLAN.md in its generated form. Human-readable context belongs in ROADMAP.md or a separate notes file. Don’t blur the line between the executable artifact and the documentation.

The CLAUDE.md + GSD Relationship (You Need Both)

One question no existing guide answers: if you already have a CLAUDE.md in your project, do you still need GSD’s `.planning/` artifacts?

They operate at different scopes. CLAUDE.md provides project-level persistent context — your coding standards, architectural principles, and preferences Claude should always apply. It’s read at the start of every session.

Note that Claude follows CLAUDE.md instructions approximately 70% of the time (Claude Code Best Practices 2026), which is fine for style preferences but inadequate for critical rules. Those require hooks.

GSD’s `.planning/` artifacts provide task-level ephemeral context: what’s being built right now, in what order, with what constraints. They’re generated fresh for each milestone and consumed by the executing subagent.

CLAUDE.md tells Claude how you work. The `.planning/` artifacts tell Claude what it’s working on. You need both.
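For the critical rules that need hooks, a guard might look roughly like this. Treat it as a sketch under stated assumptions: it assumes a Claude Code PreToolUse hook receives the tool call as JSON on stdin with a `file_path` field, and that exit status 2 blocks the call. Check both against the current hooks documentation before relying on it. The rule itself (protecting `.env` files) and the name `guard_env_files` are hypothetical examples.

```bash
#!/usr/bin/env bash
# Sketch of a PreToolUse hook body: refuse edits to .env files.
# ASSUMPTIONS: the hook receives the tool call as JSON on stdin with a
# "file_path" field, and exit status 2 blocks the call. Verify both against
# the current Claude Code hooks documentation.
guard_env_files() {
  local payload
  payload=$(cat)   # read the whole JSON payload from stdin
  # Crude pattern match instead of a JSON parser, to keep the sketch dependency-free.
  if printf '%s' "$payload" | grep -Eq '"file_path"[[:space:]]*:[[:space:]]*"[^"]*\.env"'; then
    echo "Blocked: .env files are protected" >&2
    return 2
  fi
  return 0
}

# In a real hook script the last lines would be:
#   guard_env_files
#   exit $?
```

Unlike a line in CLAUDE.md, a check like this runs every time, which is the point of moving critical rules out of instructions and into hooks.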

Start Your First Spec-Driven Development Session Today

Spec-driven development with GSD doesn’t require a greenfield project. You can retrofit it onto an existing codebase by running `/gsd-new-project` to document where things stand, then using `ROADMAP.md` to define what comes next.

Context rot doesn’t go away on its own. Every unstructured session adds to the debt. GSD gives you a system where each task starts clean, each decision is documented, and output quality doesn’t degrade as the work gets bigger.

Install GSD, run the discuss phase before you touch the plan phase, and commit STATE.md before you close the session. That’s the whole system. Everything else is refinement.