AI-Generated Code Security Review: What SAST Misses

Your static analysis pipeline passes 95% of AI-generated code on syntax — and misses the vulnerabilities that will actually hurt you. AI-generated code security review isn’t a matter of running more scanners. It’s a matter of understanding what scanners are structurally incapable of seeing, then building a human-in-the-loop workflow around those blind spots.

The March 2026 DryRun Security report confirmed what many teams already suspected: 87% of AI agent pull requests introduce at least one security vulnerability. The damage isn’t random noise. It’s a repeatable, predictable set of logic flaws — and this post gives you a five-phase workflow to catch every category of them before they ship.

Why Your SAST Scanner Is Blind to AI’s Most Dangerous Bugs

Static analysis tools are excellent at what they were designed to do: match known patterns. Hardcoded credentials, SQL string concatenation, unescaped output — these show up as recognizable fingerprints in source code, and SAST catches them reliably.

AI agents don’t primarily introduce fingerprint bugs. They introduce logic bugs.

A broken access control flaw doesn’t look broken in isolation. The endpoint exists, the middleware exists, the authentication function exists. What’s missing is the connection between them — and no pattern-matching engine can detect an absent relationship. Detecting it requires understanding the intent of the system, tracing data flow across files, and verifying that protection actually applies to each exposed surface.

Pattern-based SAST tools are architecturally incapable of catching logic flaws — they detect what’s present, not what’s absent.

This distinction has become critical. Veracode’s Spring 2026 data shows AI code syntax pass rates have climbed from ~50% to ~95% since 2023 — but security pass rates have remained flat between 45% and 55% regardless of model generation or parameter count. Newer models write cleaner, more compilable code. They do not write more secure code.

What the March 2026 DryRun Security Report Reveals: The 4 Bugs Every AI Agent Ships

The DryRun Security Agentic Coding Security Report (March 2026) is the most rigorous study of AI agent security behavior available. Across 38 scans covering 30 pull requests from Claude Code, OpenAI Codex, and Google Gemini, researchers found 143 security issues. 87% of AI agent PRs — 26 of 30 — shipped with at least one vulnerability.

What makes this data actionable isn’t the volume. It’s the consistency.

Four vulnerability classes appeared in every final codebase, regardless of which agent generated the code:

  1. Broken access control — authorization checks defined elsewhere but never enforced at the point of access.
  2. Unauthenticated endpoints on sensitive operations — routes that handle privileged data or mutations with no authentication gate.
  3. Weak JWT secret management — hardcoded or default secrets, secrets sourced from environment variables without validation, or fallback logic that degrades to an insecure default when a secret is missing.
  4. Rate-limiting middleware defined but never connected — the middleware exists in the codebase, often in its own file, but is never registered in the request pipeline.

That fourth item is where SAST fails visibly. The scanner sees the rate-limiting code, checks it against known patterns, and passes the file. What it can’t see is the missing `app.use(rateLimiter)` call in the application bootstrap — a logic gap, not a code defect.
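To make the gap concrete, here is a minimal sketch of that failure mode with the request pipeline simulated in plain Node so it runs standalone. The names (`use`, `rateLimiter`, `handle`) are illustrative, not from Express or any specific framework:

```javascript
// Minimal middleware pipeline illustrating the "defined but never
// connected" gap. Nothing here is framework code; it only models
// the registration step that AI agents skip.
const middlewares = [];
const use = (mw) => middlewares.push(mw);

// The rate limiter exists as perfectly valid, scanner-clean code...
const hits = new Map();
const rateLimiter = (req) => {
  const count = (hits.get(req.ip) || 0) + 1;
  hits.set(req.ip, count);
  if (count > 3) throw new Error("429 Too Many Requests");
};

// ...but `use(rateLimiter)` is never called, so the pipeline is empty.
const handle = (req) => {
  middlewares.forEach((mw) => mw(req)); // runs zero middlewares
  return "200 OK";
};

// Six rapid requests from one IP all succeed; nothing limits them.
const results = [];
for (let i = 0; i < 6; i++) results.push(handle({ ip: "10.0.0.1" }));
console.log(results.every((r) => r === "200 OK")); // true: limiter never ran
```

Every file in this sketch passes pattern matching; only the absent `use(rateLimiter)` call makes it vulnerable, and absence is exactly what SAST cannot match.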

Now that you understand what you’re up against, here’s how to close the gap systematically.

Phase 1 — Stop Bugs Before Code Is Written: The Security Prompt Injection Step

Many of the DryRun report’s vulnerabilities originated in decisions made before a single line of code was written. When an agent is asked to “add a WebSocket chat feature,” it doesn’t infer that WebSocket authentication should mirror the REST authentication strategy already in the codebase. That context has to be supplied explicitly.

The fix is simple and it works. Auth0 and Veracode research found that adding a generic security reminder to an AI coding prompt improved the rate of secure and correct code from 56% to 66% — a meaningful lift for a small prompt addition.

A pre-coding security prompt template

Before generating code for any new feature, prepend a security context block to your agent prompt:

```
Before writing any code for this feature, confirm:

- Authentication: Which endpoints require auth? How is it enforced?
- Authorization: Who can access this resource? Is ownership validated per object?
- Non-REST surfaces: If WebSocket, gRPC, or event handlers are involved,
  how is auth applied to those surfaces specifically?
- Rate limiting: Is it needed? Where in the pipeline will it be registered?
- Dependencies: List all third-party packages by exact name and version
  before using them.
```

This isn’t asking the agent to be secure. It’s forcing the agent to surface its assumptions — at a point where you can challenge them before they calcify into code.

Phase 2 — Per-PR Automated Scanning: What to Run, What It Catches, and What It Misses

Automated scanning is non-negotiable — but running it only at final build misses how risk accumulates. The DryRun report shows vulnerabilities compound PR-by-PR. Catching them requires per-PR scanning in CI, not a gate at the release stage.

Run two layers on every AI-generated PR:

Layer 1: Pattern-based SAST (Semgrep / CodeQL)

These tools catch what they’re designed for:

  • Injection vulnerabilities (SQL, command, and XSS)
  • Known insecure function calls and APIs
  • Hardcoded secrets matching regex patterns
  • Dependency versions with published CVEs

What they miss: Authorization logic, middleware connection gaps, WebSocket auth absence, and any flaw requiring cross-file relationship understanding.

Layer 2: Context-aware analysis (DryRun DeepScan / Snyk DeepCode)

These tools perform cross-file data flow analysis — they can trace whether an authentication function defined in one file is actually invoked before a handler in another. They close some of the logic-layer gaps that pattern tools cannot.

What they still miss: Novel design-level flaws baked in before coding started, and intent verification. No automated tool fully closes the logic gap.

Veracode reports that 86% of AI-generated code samples failed to defend against cross-site scripting (CWE-80) and 88% were vulnerable to log injection (CWE-117). These are well-mapped vulnerabilities that Layer 1 tooling handles well. If your scanner isn’t catching them, fix your configuration before worrying about the harder problems.
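As a baseline sanity check for the CWE-117 case, here is a minimal log-sanitization sketch: untrusted input containing newlines can forge log entries unless control characters are escaped before logging. The function name `sanitizeForLog` is illustrative; real codebases would typically reach for a structured logger instead:

```javascript
// Log injection (CWE-117): escape CR/LF and other control characters
// so attacker-supplied input cannot start a fake log line.
const sanitizeForLog = (value) =>
  String(value).replace(/[\r\n\x00-\x1f\x7f]/g, (c) =>
    "\\x" + c.charCodeAt(0).toString(16).padStart(2, "0"));

// Attacker-controlled username attempting to inject a fake audit entry.
const username = "alice\nlevel=INFO user=admin login=success";
const line = `level=WARN login=failed user=${sanitizeForLog(username)}`;
console.log(line.includes("\n")); // false: the forged line cannot start
```

This is precisely the kind of well-mapped, single-file transformation that Layer 1 tooling can verify; if your scanner misses it, the problem is configuration, not capability.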

Phase 3 — The Human Logic Review Checklist (Focused on AI’s Known Blind Spots)

Manual review at scale requires focus. You cannot audit every AI-generated PR like a hand-crafted architecture change — but you can narrow your attention to the specific failure patterns AI agents reliably introduce.

For each AI-generated PR, the reviewer checks:

Access Control

  • [ ] Every write, delete, or privileged read endpoint has an explicit auth check
  • [ ] Object-level authorization is validated (users can access only their resources, not any resource with a valid ID)
  • [ ] Auth middleware is connected to this route — not just defined in a file somewhere
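The object-level check in particular deserves a concrete shape, because it is the one agents most reliably omit. A minimal sketch, with all names (`documents`, `getDocument`) illustrative and storage simulated in memory:

```javascript
// Object-level authorization: a valid session is not enough; ownership
// of the specific resource must be checked on every access (otherwise
// any authenticated user can read any resource by guessing IDs -- IDOR).
const documents = new Map([
  [1, { id: 1, ownerId: "alice", body: "alice's notes" }],
  [2, { id: 2, ownerId: "bob", body: "bob's notes" }],
]);

function getDocument(user, docId) {
  const doc = documents.get(docId);
  if (!doc) return { status: 404 };
  // The line AI agents routinely leave out:
  if (doc.ownerId !== user.id) return { status: 403 };
  return { status: 200, body: doc.body };
}

console.log(getDocument({ id: "alice" }, 2).status); // 403: not her document
console.log(getDocument({ id: "alice" }, 1).status); // 200
```

When reviewing, look for exactly that ownership comparison on every privileged read, write, and delete path, not just for the presence of an authentication gate.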

JWT and Secrets

  • [ ] No hardcoded fallback secrets (`secret = process.env.JWT_SECRET || "dev-secret"`)
  • [ ] Token validation fails closed — absent or malformed tokens return 401, not a degraded-permission response
  • [ ] No new credentials appear inline in code (GitHub reported 39 million leaked secrets in public repositories in 2024)
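The fail-closed pattern for secrets is short enough to show in full. A sketch, assuming the `JWT_SECRET` environment variable from the anti-pattern above and an arbitrary 32-character minimum length:

```javascript
// Fail-closed secret loading: crash at startup when the secret is
// absent or weak, instead of silently degrading to a hardcoded default.
function loadJwtSecret(env) {
  const secret = env.JWT_SECRET;
  // Anti-pattern to reject in review: `env.JWT_SECRET || "dev-secret"`
  if (!secret || secret.length < 32) {
    throw new Error("JWT_SECRET missing or too short; refusing to start");
  }
  return secret;
}

// Missing secret: the process refuses to start rather than run insecurely.
try {
  loadJwtSecret({});
} catch (e) {
  console.log(e.message.startsWith("JWT_SECRET")); // true
}
console.log(loadJwtSecret({ JWT_SECRET: "x".repeat(32) }).length); // 32
```

A startup crash is a deployment inconvenience; a silent fallback secret is a signing-key compromise waiting for an attacker to read the source.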

Middleware Connectivity

  • [ ] Rate limiting middleware is registered in the request pipeline, not just defined
  • [ ] CORS configuration is explicit, not wildcard, on authenticated routes
  • [ ] Error handlers don’t leak stack traces in production mode

Disconnected Logic

  • [ ] Any helper or utility function the agent created is actually called somewhere — unused security helpers are a recurring AI pattern

The remediation cost argument alone makes this phase indispensable. Per OWASP’s 2025 data, broken access control affects 3.73% of applications and carries a median remediation time of 315 days. Catching it during PR review costs minutes. Catching it post-production costs nearly a year.

Phase 4 — The Non-REST Surface Audit: WebSocket, gRPC, and Event Handler Auth Checks

This phase exists because of one specific, reproducible DryRun finding: all three AI agents correctly authenticated REST endpoints but left WebSocket endpoints entirely unauthenticated.

This isn’t coincidence. AI agents are trained on code where REST authentication is extensively documented and WebSocket authentication patterns are not. The agent applies its learned security patterns to the surfaces it has the most training examples for — and skips the rest.

The result is a flaw class invisible to SAST (there’s no “unauthenticated WebSocket” pattern to match) and invisible to most human reviewers who focus on HTTP routes.

The non-REST auth checklist

For any PR that introduces or modifies WebSocket connections, gRPC services, server-sent events, message queue consumers, or event-driven handlers:

  • [ ] Authentication is validated on connection establishment — not assumed from the HTTP upgrade handshake
  • [ ] Authorization is checked per-message or per-operation, not just at connection time
  • [ ] The same auth token / session mechanism used for REST is explicitly applied — agents will not do this automatically
  • [ ] Disconnection on invalid or expired token is implemented, not just rejection of the current message
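The checklist items above can be sketched against a WebSocket-style handler. The transport is simulated here so the example runs standalone (no `ws` dependency), and the token scheme (`verifyToken`, numeric expiry) is illustrative:

```javascript
// Simulated WebSocket handler applying the non-REST auth checklist.
const verifyToken = (token, now) =>
  token && token.user && token.expires > now ? token.user : null;

function openConnection(token, now) {
  // 1. Validate at connection establishment, not via the HTTP upgrade alone.
  const user = verifyToken(token, now);
  if (!user) return { error: "refused: unauthenticated" };
  return {
    send(message, at = now) {
      // 2. Re-check per message; 4. close the connection on expiry,
      //    rather than merely rejecting the current message.
      if (!verifyToken(token, at)) return { closed: true };
      return { echoed: `${user}: ${message}` };
    },
  };
}

console.log(openConnection(null, 100).error);           // refused
const conn = openConnection({ user: "alice", expires: 200 }, 100);
console.log(conn.send("hi").echoed);                    // "alice: hi"
console.log(conn.send("late", 300).closed);             // true: token expired
```

The key design point: authentication state is re-evaluated on every message, because a WebSocket connection can outlive the token that opened it.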

If the PR doesn’t touch these surfaces, skip this phase. If it does, treat it as mandatory regardless of what automated scanning reported.

Phase 5 — Dependency Hallucination Check: Guarding Against Slopsquatting

Slopsquatting is a supply-chain attack vector specific to AI-generated code. Agents occasionally hallucinate package names — generating plausible-sounding library names that don’t exist on npm, PyPI, or other registries. Attackers monitor for these hallucinated names, publish malicious packages under them, and wait.

No existing SAST tool checks for this by default.

The dependency verification step

Before merging any AI-generated PR that introduces new dependencies:

  1. Extract all new package names from `package.json`, `requirements.txt`, `go.mod`, or equivalent.
  2. Verify each package exists on the official registry with a meaningful download history and an identifiable maintainer.
  3. Check the exact name — one-character typos and underscore/hyphen variations are the most common attack surface.
  4. Confirm the import path matches — in Go and Rust especially, the module path in code must exactly match the registry entry.

Tools like `pip-audit`, `npm audit`, and `socket.dev` help automate this for known-bad packages. Novel hallucinations won’t have CVEs yet. Manual spot-checks for unfamiliar package names remain necessary.
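A lightweight heuristic for step 3 can be automated in a pre-merge script: flag any new package name that differs from a known-good allowlist only by hyphen/underscore variation or a single-character edit. A sketch, with an illustrative allowlist; names matching nothing still require the manual registry check described above:

```javascript
// Heuristic typosquat / hallucination check against an allowlist of
// packages the team already trusts. Illustrative, not exhaustive.
const knownGood = ["lodash", "express", "jsonwebtoken", "node-fetch"];

const normalize = (name) => name.toLowerCase().replace(/[-_]/g, "");

// True if a and b are within one insertion, deletion, or substitution.
const withinOneEdit = (a, b) => {
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0, j = 0, edits = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { i++; j++; continue; }
    if (++edits > 1) return false;
    if (a.length > b.length) i++;
    else if (a.length < b.length) j++;
    else { i++; j++; }
  }
  return edits + (a.length - i) + (b.length - j) <= 1;
};

function suspicious(pkg) {
  if (knownGood.includes(pkg)) return false; // exact known name: fine
  return knownGood.some((good) =>
    normalize(good) === normalize(pkg) || withinOneEdit(good, pkg));
  // Note: unknown names far from the allowlist return false here and
  // still need the manual registry verification from steps 2-3.
}

console.log(suspicious("express"));    // false: exact known name
console.log(suspicious("node_fetch")); // true: underscore variant of node-fetch
console.log(suspicious("expresss"));   // true: one-character typo
```

This catches the underscore/hyphen and single-typo cases called out in step 3; it deliberately does not try to judge genuinely novel names, which is what the registry lookup and maintainer check are for.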

Building a Team Trust Log: Turning Per-Agent Failure Patterns Into Calibrated Review Rules

Generic skepticism about all AI output doesn’t scale. What scales is calibration: knowing which agent, on which task type, in which language, produces which kinds of issues.

CodeRabbit’s 2025/2026 report found that pull requests per author increased 20% year-over-year due to AI adoption — while incidents per pull request increased 23.5% over the same period. Your team is shipping faster and shipping more problems per shipment. Treating every agent equally wastes the attention you need to spend where risk is highest.

The trust log is a lightweight practice. Keep a shared document where reviewers record:

| PR | Agent Used | Issue Found | Phase Caught | SAST Missed? |
|---|---|---|---|---|
| #412 | Claude Code | Unauthenticated WebSocket | Phase 4 | Yes |
| #438 | Copilot | Missing rate-limit registration | Phase 3 | Yes |
| #451 | Codex | Hallucinated npm package | Phase 5 | Yes |

After 20–30 entries, patterns emerge. You might find that one agent reliably handles REST auth but consistently misses WebSocket surfaces. Another might introduce more dependency risk in Python than TypeScript. These are team-specific, agent-specific calibration signals no public report can give you — only your own production history can.

The goal isn’t to trust AI agents less. It’s to trust them accurately — knowing exactly where they’re reliable and where they need your eyes.

Over time, your trust log becomes the basis for weighted review rules: spend more human attention on Phase 4 for WebSocket-heavy PRs from agent X, automate more aggressively for Phase 2 items that agent Y consistently handles correctly.

AI-Generated Code Security Review: Build the Workflow Now

AI-generated code security review isn’t about slowing down the development velocity that makes AI agents valuable — it’s about ensuring that velocity doesn’t compound into a 315-day remediation backlog. The DryRun finding that 87% of AI agent PRs introduce at least one vulnerability isn’t a reason to stop using AI agents. It’s a reason to build a review system calibrated to how they actually fail.

At 35 new AI-attributed CVEs disclosed in March 2026 alone — up from 6 in January — the window for “we’ll figure out security later” has closed.

Start with one sprint. Add the pre-coding security prompt this week. Instrument per-PR scanning in CI.

Add the WebSocket audit checklist as a required review step for any PR touching real-time or async surfaces. Open a shared document and log the first issue your team finds.

You don’t need all five phases in place tomorrow. You need them in place before the next unauthenticated endpoint reaches production.
