Your pull requests are sitting. A developer opened one three hours ago; it’s ready for review, and your senior engineers are either blocked on other work or haven’t seen the notification. This bottleneck is one of the most persistent friction points in software delivery — and it’s exactly the problem GitHub Agentic Workflows were built to solve.
GitHub Agentic Workflows change the code review equation: instead of waiting for a human to take the first pass at a PR diff, an AI agent reads the changes, applies judgment, and posts a structured review within seconds of the PR opening. This guide walks you through everything the official quickstart skips — a full CLI setup, a production-ready Markdown workflow file, the safe-outputs security model you must understand before running any agent in your repo, and custom quality gates that go well beyond generic “review this PR” prompts.
## Why Standard GitHub Actions Can’t Review Code the Way an AI Agent Can
If you’ve used GitHub Actions for any length of time, you already know its superpower: composable, event-driven automation built from YAML. But YAML automation has a ceiling.
Standard Actions workflows are imperative. You script every step. You can run a linter, check test coverage thresholds, or call an AI API endpoint — but the logic for what to do with the response lives entirely in your shell commands and conditional expressions. There’s no concept of contextual judgment. The workflow can’t read a 400-line diff and decide that a new database migration is missing a rollback strategy.
That’s not a tooling gap you can close by adding another step. It’s a fundamental constraint of scripted automation: it does exactly what you told it to do.
GitHub Agentic Workflows take a different approach. Instead of scripting steps, you describe intent in plain language. The AI agent reads your instructions, reads the PR, and uses judgment to produce output. It can reason across multiple files, connect a schema change to a downstream API handler, notice that a security-sensitive path was modified, or flag that a dependency bump skips a major version — none of which requires you to write the detection logic yourself.
According to Qodo’s State of AI Code Quality report, AI-coauthored pull requests show approximately 1.7× more issues on average compared to non-AI-assisted code — making automated review an essential quality gate, not a nice-to-have.
The trade-off is control. With Actions, you know exactly what runs. With Agentic Workflows, the agent is making decisions you haven’t explicitly scripted. That’s why the safe-outputs model exists — and why you should understand it before writing a single line of workflow Markdown.
## GitHub Agentic Workflows in 90 Seconds: Architecture, Safety Model, and Supported AI Engines
GitHub Agentic Workflows entered technical preview on February 13, 2026, developed jointly by GitHub Next, Microsoft Research, and Azure Core Upstream. They run inside the same GitHub Actions infrastructure you already use — same runners, same event triggers, same secrets store — but the execution model is fundamentally different.
Instead of a YAML workflow file, you write a Markdown file with a YAML frontmatter block. The frontmatter declares what the agent is allowed to do. The Markdown body tells the agent how to think about doing it. You then run `gh aw compile` to produce a `.lock.yml` artifact, which is the actual Actions workflow that gets executed.
### The safe-outputs security model
This is the concept almost every tutorial skips — and it’s the most important thing to understand before you commit any agentic workflow to a production repo.
The `safe-outputs` field in the frontmatter defines a compile-time allowlist of the operations the agent can perform. Each entry specifies an action type (like `add-comment` or `add-review`) and a `max` value that caps how many times that operation can run per workflow execution.
```yaml
safe-outputs:
  - type: add-comment
    max: 1
  - type: add-review
    max: 1
```
This is not a runtime check. The `max: 1` constraint is baked into the compiled `.lock.yml` — the agent architecture physically cannot exceed it, regardless of what the AI decides to do. The second guardrail is permission-scoped jobs: the agent runs in a separate job with only the permissions declared in the frontmatter. It cannot inherit broad repository write access from the calling workflow, which means a misbehaving agent can’t quietly push code or modify repository settings.
### Supported AI engines
Three engines are available today:
- GitHub Copilot CLI — the default, no extra subscription required
- Claude Code — available to all Copilot Business and Pro users at no additional cost (added February 26, 2026)
- OpenAI Codex — same availability as Claude Code
You swap engines by changing a single `engine` field in the frontmatter. More on which to pick later.
## Prerequisites and Setup: CLI Extension, Fine-Grained PAT, and gh aw init
Before writing a single line of Markdown, you need three things in place.
### 1. Install the gh-aw CLI extension
```bash
gh extension install github/gh-aw
gh aw --version
```
### 2. Create a fine-grained Personal Access Token
Go to Settings → Developer Settings → Personal access tokens → Fine-grained tokens and create a new token scoped to your target repository. The only required permission is Copilot Requests (read and write). No other permissions are needed for a read-and-comment workflow. Copy the token immediately — you won’t see it again.
### 3. Initialize the workflow
From the root of your repository:
```bash
gh aw init
```
This creates a `.github/agentic-workflows/` directory and scaffolds a starter workflow file. Set your PAT as a repository secret named `GH_AW_TOKEN`, then authenticate your chosen engine:
```bash
# For Copilot (default)
gh aw auth --engine copilot

# For Claude Code
gh aw auth --engine claude
```
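If you prefer to stay in the terminal, the standard `gh secret set` command can store the token without visiting the repository settings page. A minimal sketch — `GH_AW_PAT_VALUE` is a placeholder for wherever you keep the token value locally, not part of gh-aw:

```shell
# Store the fine-grained PAT as the GH_AW_TOKEN repository secret.
# GH_AW_PAT_VALUE is a placeholder environment variable; substitute your own.
gh secret set GH_AW_TOKEN --body "$GH_AW_PAT_VALUE"
```

Run it from the repository root so gh resolves the target repo automatically.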
## Anatomy of an Agentic Workflow Markdown File
Every agentic workflow lives as a Markdown file in the `.github/agentic-workflows/` directory.
Here’s a fully annotated example:
```markdown
---
# When this workflow runs
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

# GitHub token permissions granted to the agent job
permissions:
  pull-requests: write
  contents: read

# Which tools the agent may call
tools:
  - read_file
  - list_files
  - get_diff

# Which AI engine powers this workflow
engine: claude # or: copilot, codex

# The compile-time allowlist of permitted output operations
safe-outputs:
  - type: add-review
    max: 1
  - type: add-comment
    max: 2
---

# PR Code Review Agent

You are a senior software engineer reviewing a pull request…
```
### Frontmatter Field Reference
| Field | Purpose |
|---|---|
| `on` | Standard GitHub Actions event triggers — identical syntax to YAML workflows |
| `permissions` | Minimal permissions for the agent job; anything not listed is denied |
| `tools` | Functions the agent may call to read repository state |
| `engine` | Which AI model powers the agent (`copilot`, `claude`, `codex`) |
| `safe-outputs` | Compile-time allowlist of output operations and per-run maximums |
After editing the Markdown file, compile it:
```bash
gh aw compile .github/agentic-workflows/pr-review.md
```
This produces `.github/agentic-workflows/pr-review.lock.yml`. Commit both files — the `.md` is your source of truth, the `.lock.yml` is what Actions executes. Neither should be gitignored.
## Building Your GitHub Agentic Code Review Workflow Step by Step
Here’s a complete, production-ready workflow you can drop into your repository today.
```markdown
---
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
  pull_request_review:
    types: [dismissed]

permissions:
  pull-requests: write
  contents: read

tools:
  - read_file
  - list_files
  - get_diff
  - get_pr_metadata

engine: claude

safe-outputs:
  - type: add-review
    max: 1
  - type: add-comment
    max: 3
---

# PR Code Review

You are a senior software engineer conducting a first-pass code review.
Review the pull request diff and any files referenced by the changes.

Your review must cover:

- Correctness — Does the logic match what the PR description claims?
  Flag bugs, off-by-one errors, or unhandled edge cases.
- Security — Flag any changes to authentication, authorization,
  input validation, or secret handling — even if they look intentional.
- Test coverage — If new logic was added but no test files were
  modified, call this out explicitly with the affected file names.
- Breaking changes — Identify any public API, database schema,
  or configuration changes that are not backward-compatible.

## Output format

Post a single structured review using `add-review`. Use inline comments
for specific line-level issues. Use the review summary for high-level
observations.

Set the review state to:

- `APPROVE` — no blocking issues found
- `REQUEST_CHANGES` — one or more correctness or security issues found
- `COMMENT` — observations only, no blocking issues

Do not leave a review if the PR is marked as a Draft.
```
Compile it, commit both files, and open a test PR. The agent runs automatically on every `pull_request` event that matches the trigger types.
## Configuring Custom Quality Gates in the Instruction Body
Generic review prompts get you started. Custom quality gates are where the real value lives — and they’re the part no existing guide covers. Add these directly to your instruction body.
### Security-sensitive file detection
```
If any of the following paths appear in the diff, add a dedicated comment
regardless of whether other issues are found:

- Any file matching `/auth/`, `/middleware/auth`, or `/security/*`
- Any file matching `/.env` or `/secrets/*`
- Any file modifying database migration scripts

Label these comments: [SECURITY REVIEW REQUIRED]
```
### Missing test coverage gate
```
Check whether the diff adds or modifies any non-test source files under
`src/` or `lib/`. If yes, verify that corresponding test files under
`tests/` or matching `*.test.*` were also modified.

If source files changed but no test files changed, include a comment:
"⚠️ No test changes detected for modified source files: [list files]."
```
### Dependency bump alerts
```
If `package.json`, `go.mod`, `requirements.txt`, `Gemfile`, or
`pyproject.toml` appear in the diff, extract changed dependency names
and versions. Flag any change that increments a major version number.

Include a comment: "📦 Major version bump detected: [dep] [old] → [new].
Verify compatibility with current usage."
```
These aren’t heuristics you need to implement in bash. They’re natural-language constraints the agent applies using its own judgment. Teams using AI review see double the code quality gains versus teams not using AI review — 36% vs 17% — according to Qodo’s research, even when delivery speed is held constant.
## Choosing Your AI Engine: Copilot CLI vs Claude Code vs Codex for Code Review
Engine selection matters more than most setup guides acknowledge. Here’s an honest breakdown for the PR review use case.
### GitHub Copilot CLI
The default and lowest-friction option. No additional authentication beyond your existing Copilot subscription. Fastest cold-start time. Works well for small-to-medium PRs where the relevant context fits within a tight cluster of files.
Limitation: context fidelity drops significantly on multi-file changes. In benchmarks comparing Copilot and Claude Code on multi-function bug-fixing tasks, Copilot’s context fidelity scored 5.9/10 versus Claude’s 8.5/10 (SitePoint, 2026).
### Claude Code
The stronger choice for code review. Claude Code achieved a 48% code accept rate compared to Copilot’s 31% in algorithm implementation sessions. More importantly for review workflows, its ability to hold context across many files simultaneously means it’s less likely to miss an issue that only becomes visible when reading two or three files together. Swap the engine field to enable it:
```yaml
engine: claude
```
Claude Code is available at no additional subscription cost for all Copilot Business and Pro users.
### OpenAI Codex
A solid alternative if your team is already embedded in the OpenAI ecosystem. Performance for pure code generation is competitive with Claude. For multi-file reasoning tasks like PR review, it falls between Copilot and Claude Code in most published benchmarks.
Practical recommendation: start with Copilot to validate your workflow structure, then switch to Claude Code if you’re reviewing PRs with more than five or six files changed, or if your reviews are missing cross-file issues.
## Technical Preview Gotchas — and How to Work Around Them
GitHub Agentic Workflows are in technical preview. Rough edges exist. Here are the ones you’ll hit first.
### The `.lock.yml` confusion
Every time you edit the `.md` file, you must re-run `gh aw compile`. The `.lock.yml` is not auto-regenerated. Forgetting this is the number one source of “my change isn’t doing anything” confusion. Fix it by adding a pre-commit hook that runs `gh aw compile` on any `.md` file under `.github/agentic-workflows/`.
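A minimal guard is a script that fails when a workflow’s `.lock.yml` is missing or older than its `.md` source — call it from a pre-commit hook or a CI step before the agent ever runs. This is a sketch, not part of the gh-aw tooling; the default path assumes the standard `gh aw init` layout:

```shell
# check_lockfiles: fail if any agentic workflow .md has no compiled .lock.yml,
# or if the .md was modified after the .lock.yml was last compiled.
check_lockfiles() {
  dir="${1:-.github/agentic-workflows}"
  status=0
  for md in "$dir"/*.md; do
    [ -e "$md" ] || continue                # glob matched nothing: no workflows yet
    lock="${md%.md}.lock.yml"
    if [ ! -f "$lock" ]; then
      echo "missing lockfile for $md (run: gh aw compile $md)" >&2
      status=1
    elif [ "$md" -nt "$lock" ]; then
      echo "stale lockfile for $md (re-run: gh aw compile $md)" >&2
      status=1
    fi
  done
  return $status
}
```

Wire it into `.git/hooks/pre-commit` (or your hook manager of choice) so any commit that touches a workflow `.md` without recompiling fails immediately instead of silently shipping stale instructions.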
### PR interaction friction
During technical preview, the agent can post reviews and comments but cannot satisfy branch protection rules that require a minimum number of human approvals. If your rules require two human approvals, the agent’s `APPROVE` review won’t count toward that threshold. Design your workflows with this constraint in mind.
### Cost opacity
The GitHub UI does not currently surface token usage. You’re consuming Copilot API capacity with no per-workflow cost breakdown available. Mitigate this by setting conservative `max` values in `safe-outputs` and scoping your `tools` list to only what the agent needs — this limits runaway token consumption on large diffs.
### Debugging opaque agent decisions
When a developer asks “why did the agent flag that line?” there’s no built-in answer. Two practical workarounds:
- Structured rationale instructions: Tell the agent to preface each review comment with a one-sentence rationale (`"Flagging because: [reason]"`). This surfaces reasoning inside the comment itself.
- Actions run summary inspection: As of March 26, 2026, agentic workflow configs are visible in the Actions run summary. Navigate to the run, expand the agent job, and look for the tool call log — it shows which files were read and in what order, giving you a partial trace of the agent’s reasoning context.
## Start Your First Agentic Review Today
Agentic code review changes what automated PR quality control can mean for your team. You’re not scripting detection logic for every edge case — you’re writing instructions once and letting the agent apply judgment at a granularity that previously required a human.
The fundamentals to carry forward: understand the safe-outputs model before you deploy anything to a production repo, commit both the `.md` and `.lock.yml` files, and start with a narrow tool list before expanding permissions. With 91% of development teams now using AI tools and AI-coauthored code appearing in 22% of merged PRs (DX Q4 2025 Impact Report), a working agentic review workflow is one of the highest-leverage improvements you can ship this month.
Clone the workflow file from this guide, compile it, open a test PR, and see what the agent catches. Then layer in the custom quality gates that match your team’s actual standards — security-sensitive path detection, test coverage gates, and dependency bump alerts are good starting points. The instruction body is yours to extend.