SERA: Fine-Tune an Open-Source Coding Agent for $1,300

Every AI coding tool you’ve tried has the same blind spot: it has never seen your codebase. It doesn’t know your internal APIs, your team’s conventions, or the domain-specific patterns that took years to accumulate. SERA — the open-source coding agent from the Allen Institute for AI (Ai2) — was built to fix exactly that problem, and it does so at a price that finally makes custom coding agents realistic for teams outside of Big Tech.

What Is SERA?

SERA stands for Soft-Verified Efficient Repository Agents. It’s the first release in Ai2’s Open Coding Agents family — a 32-billion-parameter model that achieves 49.5% on SWE-bench Verified at 32K context and 54.2% at 64K context, matching frontier open models like Devstral-Small-2 (24B) and outpacing much larger models like GLM-4.5-Air (110B).

What makes those numbers remarkable isn’t the score alone — it’s what it cost to get there. The total spend for data generation and training SERA-32B was approximately $2,000 across 40 GPU-days. For comparison, the reinforcement learning approaches used by competing models cost 26× more to reach the same performance level.

Built largely by a single Ai2 researcher, Ethan Shen, SERA ships with everything open: model weights on Hugging Face, code on GitHub, 200,000 synthetic coding trajectories, and a CLI that integrates directly with Claude Code. The license is Apache 2.0.

The Problem With Existing Coding Agents

Most coding agents are trained on public repositories. That works fine when you’re working on open-source projects or standard frameworks. The moment you move to an internal codebase — a proprietary data pipeline, a domain-specific SDK, a years-old monolith — the agent becomes far less useful.

Ai2 identified three fundamental limitations that nearly every existing coding agent shares:

  • They’re closed. Training data, recipes, and fine-tuning access are locked away.
  • They’re expensive to train. Reproducing state-of-the-art performance typically requires reinforcement learning at significant GPU cost.
  • They’re ill-suited for private codebases. Generating synthetic training data from private repos requires test infrastructure that most teams don’t have.

SERA addresses all three. But the third point is where it truly stands apart.

The Core Innovation: Soft Verified Generation

Most synthetic data generation for coding agents relies on unit tests to verify correctness — if the code passes the tests, it goes in the training set. The problem: this requires existing test infrastructure, and it demands fully correct solutions, which dramatically limits the amount of usable data you can generate.

SERA uses a different approach called Soft Verified Generation (SVG). The process unfolds in four steps across two rollouts:

  1. A teacher model makes a change to a codebase, starting from a randomly selected function.
  2. That trajectory is converted into a synthetic pull request description.
  3. The teacher then attempts to reproduce the change using only that PR description.
  4. The two patches are compared using line-level recall — not test execution.

If the second rollout reproduces enough of the first patch (even partially), the trajectory is included in the training set. No tests required. No special infrastructure needed. Any repository qualifies.
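The soft-verification check can be sketched in a few lines. This is a simplified illustration, not Ai2's exact implementation: the diff parsing is naive and the 0.5 threshold is an assumption chosen for the example, so consult the paper for the metric actually used.

```python
def line_recall(reference_patch: str, reproduced_patch: str) -> float:
    """Fraction of changed lines in the reference patch that also
    appear in the reproduced patch (simplified line-level recall)."""
    def changed_lines(patch: str) -> set[str]:
        # Keep only added/removed lines, dropping the +/- diff prefix
        # and the "+++ b/file" / "--- a/file" header lines.
        return {
            line[1:].strip()
            for line in patch.splitlines()
            if line[:1] in ("+", "-") and not line.startswith(("+++", "---"))
        }
    ref = changed_lines(reference_patch)
    if not ref:
        return 0.0
    return len(ref & changed_lines(reproduced_patch)) / len(ref)

def soft_verify(reference_patch: str, reproduced_patch: str,
                threshold: float = 0.5) -> bool:
    # Keep the trajectory when enough of the original change is reproduced,
    # even if the reproduction is only partially correct.
    return line_recall(reference_patch, reproduced_patch) >= threshold
```

Because the check compares two patches rather than running a test suite, it works on any repository, which is the property the rest of the pipeline relies on.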

“High-quality synthetic training data should mirror how a developer works on a problem rather than the precise details of correct code.”

— Ai2 Research Team

The counterintuitive insight: including partially correct solutions still produces models that generate fully correct code. In fact, vague instructions during data generation help diversify the training set, yielding more examples of refactoring rather than only bug fixes.

This single design choice is what makes SERA adaptable to private codebases at low cost.

Benchmark Performance

SERA-32B holds its own against the best open-source coding agents available:

| Model | SWE-bench Verified (32K) | SWE-bench Verified (64K) | Open Source |
|---|---|---|---|
| SERA-32B | 49.5% ± 1.9% | 54.2% ± 1.4% | ✅ Fully open |
| Devstral Small 2 (24B) | 50.0% ± 1.3% | ~59.1% | ⚠️ Open-weight only |
| GLM-4.5-Air (110B) | 50.5% ± 1.3% | — | ⚠️ Open-weight only |

A few important caveats put those numbers in context. SERA-32B was not trained past 32K tokens and used no reinforcement learning — two factors that put it at a structural disadvantage at longer context windows. Devstral Small 2’s edge at 64K comes from both RL training and extended context training, not from a fundamentally better approach.

For teams that care about full reproducibility — being able to inspect the training data, audit the recipe, and fine-tune their own variants — SERA’s fully open stack is a meaningful advantage that benchmark numbers don’t capture.

The family also includes SERA-14B and SERA-8B (29.4% on SWE-bench), so teams with tighter GPU budgets can still participate.

Specializing SERA to Your Private Codebase

This is where the real value proposition comes together. Here’s what Ai2’s research shows when you fine-tune SERA on a specific repository:

  • Generating 8,000 synthetic trajectories for a private repo costs approximately $1,300.
  • Models trained on those trajectories consistently match or exceed 100B+ parameter teacher models on that codebase.
  • On Django, a specialized SERA model achieves 52.23% on SWE-bench vs. GLM-4.5-Air’s 51.20%.
  • On SymPy, the specialized model reaches 51.11% vs. GLM-4.5-Air’s 48.89%.

A 32B model, trained for $1,300, outperforming a 110B frontier model — on your specific codebase. That’s the SERA pitch in concrete numbers.

And because SVG doesn’t need test infrastructure, you can apply this to codebases that have no unit tests at all, or that test in ways that are difficult to automate. The only requirement is that you have a repository.

Getting Started

Ai2 designed SERA to be accessible even without deep ML expertise.

The fastest path: Modal + SERA CLI

The `sera-cli` package integrates SERA directly with Claude Code. The fastest way to run it is via Modal, which handles GPU provisioning and vLLM deployment automatically:

```bash
pip install sera-cli
sera modal run
```

The first run takes roughly 10 minutes to download the ~65GB of model weights. After that, the inference server is live and Claude Code can use SERA as its backend model.

Self-hosting with vLLM

If you prefer to run inference on your own infrastructure, SERA works with vLLM. Running SERA-32B in BF16 on four H100 GPUs achieves roughly 1,950 output tokens per second at a 16K context window — fast enough for interactive development workflows.
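A minimal self-hosting sketch using vLLM's offline Python API is shown below. The Hugging Face model id `allenai/SERA-32B` is an assumption for illustration (check the model card for the exact name), and running it requires GPUs with enough memory for the 32B weights.

```python
# Sketch: load SERA-32B with vLLM across four GPUs in BF16.
# The model id below is assumed -- verify it on Hugging Face first.
from vllm import LLM, SamplingParams

llm = LLM(
    model="allenai/SERA-32B",   # assumed HF id
    tensor_parallel_size=4,      # four H100s, per the throughput figure above
    max_model_len=16384,         # 16K context window
    dtype="bfloat16",
)

params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate(["Write a docstring for a binary search function."], params)
print(outputs[0].outputs[0].text)
```

For interactive use with Claude Code, the equivalent server deployment (`vllm serve`) exposes an OpenAI-compatible endpoint instead of running offline generation.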

Training your own specialized model

The full training recipe is on [GitHub](https://github.com/allenai/SERA), and 200,000 pre-generated trajectories are available as a starting point. If you want to specialize to your own repository, the process is:

  1. Generate synthetic trajectories from your codebase using SVG.
  2. Fine-tune a base model (SERA-32B or smaller) on those trajectories.
  3. Deploy with vLLM or Modal.
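Step 2 amounts to converting each trajectory into a chat-format supervised fine-tuning example. The sketch below illustrates the idea with a hypothetical trajectory schema (the field names `pr_description`, `steps`, `action`, and `observation` are assumptions; inspect the released trajectories for the actual format):

```python
def trajectory_to_sft_example(trajectory: dict) -> dict:
    """Convert one SVG trajectory into a chat-format SFT example.
    The trajectory schema used here is illustrative, not Ai2's actual one."""
    messages = [
        {"role": "system",
         "content": "You are a coding agent working in a repository."},
        # The synthetic PR description plays the role of the user request.
        {"role": "user", "content": trajectory["pr_description"]},
    ]
    for step in trajectory["steps"]:
        # Each agent action becomes an assistant turn; tool/environment
        # feedback, when present, comes back as a user turn.
        messages.append({"role": "assistant", "content": step["action"]})
        if step.get("observation"):
            messages.append({"role": "user", "content": step["observation"]})
    return {"messages": messages}
```

Examples in this shape can be fed to any standard SFT stack; the agentic structure (action, observation, action) is preserved rather than flattened into a single completion.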

Ai2’s cost estimates assume self-hosted inference via vLLM. If you use an API provider like z.ai instead, the cost advantages compound: 53× cheaper than SkyRL, 115× cheaper than SWE-smith at equivalent performance.

Why SERA Changes the Open-Source Coding Agent Landscape

Well-funded labs with the resources to run reinforcement learning at scale have dominated the open-source coding agent field. SERA demonstrates that pure supervised fine-tuning on synthetically generated data can close most of that gap — and that the total compute budget required is within reach of individual researchers and small engineering teams.

Ai2’s bet is that SERA removes the barrier to entry. Instead of choosing between a closed proprietary agent and an open-source model that doesn’t know your code, teams can now fine-tune their own specialized agent on a budget that rivals a few months of a cloud-hosted coding assistant subscription.

The broader implication: coding agent performance on your codebase is no longer a function of how big a lab is. It’s a function of how much synthetic training data you generate from your own repositories.

Start Experimenting With SERA

SERA is an open-source coding agent that’s reproducible and designed to run on hardware accessible to most engineering teams. If you’re evaluating coding agents for your workflow — especially if you have a private codebase with domain-specific patterns — SERA is worth testing before committing to a closed alternative.

Clone the [SERA repository on GitHub](https://github.com/allenai/SERA), install the CLI, and run your first inference session with Modal. Then read the [SERA paper](https://arxiv.org/html/2601.20789v1) to understand how SVG works — the ideas are simple enough that you can adapt them to your own data generation needs.

The cost of a specialized coding agent has dropped to $1,300. The only question is what you’ll build with it.
