Mesa

TECHNICAL WRITING

Stop Using Claude to Review Cold Code

March 11, 2026 · Peter Rallojay

Anthropic just launched Code Review. It is a multi-agent system that dispatches reviewers against your pull request on GitHub and costs $15–25 per review. It finds real bugs. At Anthropic internally, 54% of PRs now receive substantive comments versus 16% before.

Here is the part nobody is talking about: what happens after the review finds something?

The developer has to sort the noise from the findings that actually need fixing. The developer and coding agent fix the issues. The PR gets updated. The reviewer runs again to verify. Another $15–25. Even if the fixes are clean, you are paying for a second review cycle. And the fixes themselves are slower than they need to be: the agent is rebuilding context on code it wrote hours or days ago, in a session it no longer has access to. The cost is not $25. It is $25 per round of find-fix-verify.

We think there is a better model: review every turn, not every PR. Catch problems while the agent is still working, while the context is still warm, while the fix can be a one-line correction instead of a multi-file rewrite. And do it on the subscription you are already paying for.


The Context Gap

Here is the problem with reviewing code after the session is over.

The agent that wrote your code accumulated rich context over the course of a session — the plan, the reasoning, the tradeoffs, the architectural decisions, the dead ends it tried and abandoned. By the time a PR reviewer sees the diff, all of that is gone. The reviewer has to reconstruct intent from the output alone. The gap between what was known when the code was written and what is known when the code is reviewed is at its absolute maximum.

Traditional PR Review
goal → plan → agent (no review) → (no review) → (no review) → PR review → actual result, with drift from the goal

develop end-to-end → review → fix with cold context → re-review

Review in session
goal → plan → agent → review → agent → review → agent → review → actual result

develop → review/fix with context → no compounding drift


Dogfooding the Thing That Reviews You

A few weeks ago I was building a feature for Mesa itself. In our workflow, Claude asks clarifying questions before implementing anything and writes a planning document to a plans/ directory.

I was not thinking about code review. I was thinking about the feature. Saguaro, our local code review tool, was running in the background: the rules engine on the stop hook, the daemon doing asynchronous agent review. It is easy to forget they are there when nothing is wrong. That is by design.

Claude finished writing the plan and summarized what it had done. The stop hook fired. The daemon picked up the diff, which included the full planning document Claude had just written, along with Claude's summary of the work. A background claude -p process spun up, read the diff, read the plan, and then read our ARCHITECTURE.md and the surrounding codebase to understand what the plan was proposing.
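For reference, Claude Code lets you register commands to run on the Stop event via hooks in .claude/settings.json. A minimal sketch of that wiring, where the sag command shown is a placeholder of my own invention, not necessarily what Saguaro actually installs:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "sag review --async"
          }
        ]
      }
    ]
  }
}
```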

A few minutes later, on Claude's next turn, findings appeared in the loop. Not code bugs; the plan was a markdown file of requirements and sample snippets. Saguaro had caught that the proposed approach contradicted architectural boundaries we had already established. It flagged patterns in the plan that would have created layering violations had Claude implemented them as written.

I had not asked for a design review. I had asked for a feature. But Saguaro does not know the difference between a source file and a planning document. It reviews whatever changed. And because it has read access to the full codebase, it checked the plan against the system we had already built.

We built a code reviewer. It became an architectural reviewer. It caught the problem before a single line of functional code existed. Under a PR-based model, that planning mistake would have survived through implementation. I would have found it in the review — or worse, in production — after Claude had spent thirty turns building on a flawed foundation. The fix would have been a rewrite. Instead it was a revision to a markdown file.


What We Are Building Toward

Saguaro runs two independent review systems. The rules engine is deterministic — you write rules as markdown files, they get injected into Claude's context before edits happen, and violations block the session immediately. The daemon is non-deterministic — it spawns a staff-engineer agent that reviews with a broader lens: bugs, regressions, security issues, cross-module contract violations. Rules block. The daemon suggests. The confidence level matches the delivery mechanism.
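To make the split concrete, here is what a deterministic rule could look like as a markdown file. This is an illustrative sketch, not Saguaro's actual rule schema; the rule name and paths are invented:

```markdown
# Rule: core-must-not-import-ui

Core modules must never import from the UI layer. The dependency
arrow points one way: ui -> core.

- Applies to: src/core/**
- On violation: block the edit and show this rule
```

Because rules like this are injected into Claude's context before edits happen, the agent usually avoids the violation in the first place rather than being corrected after the fact.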

If you use Claude, the daemon spawns claude -p. That is the same agent harness you are already running. Our goal was to hook into the tools and subscriptions developers already have and use. If you are running Claude Code, Codex, or Gemini, you already have a reviewer. You just are not using it yet.
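As a rough sketch of what one asynchronous review pass could look like, assuming the claude CLI is on your PATH. This is not Saguaro's actual implementation; the prompt wording and fallback handling are invented for illustration:

```shell
#!/bin/sh
# Hypothetical async review pass: capture the working-tree diff and hand it
# to Claude Code's non-interactive print mode (claude -p).

DIFF=$(git diff HEAD 2>/dev/null || echo "(no git repo)")

PROMPT="You are a staff engineer reviewing an in-progress change.
Flag bugs, regressions, and architectural layering violations.
Review this diff:
$DIFF"

if command -v claude >/dev/null 2>&1; then
  # -p prints the response and exits; no interactive session is opened
  printf '%s' "$PROMPT" | claude -p "Review the diff piped on stdin."
else
  printf 'would run: claude -p <review prompt>\n'
fi
```

In Saguaro's actual flow the spawned agent also has read access to the codebase, so it can pull in ARCHITECTURE.md and planning documents on its own, which is what made the design review in the story above possible.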

The most expensive mistakes in agent-assisted development are not logic errors in a single function. They are coherence failures across files that compound silently over long sessions. A PR reviewer finds them after they have compounded. Saguaro was designed to find them turn by turn. In the case of that planning document, it found one before the code even existed.

The future of development with agents is not "write everything, then review." It is "review as you go." We think the economics demand it.

Saguaro is completely free and open source. If you want to try it:

brew install mesa-dot-dev/homebrew-tap/saguaro
sag init
