How to Review Code Written by AI Agents
Reviewing agent-written code is not just line-by-line inspection. Start with intent, verify context coverage, rank risky files, and require evidence for the behavior that changed.
AI coding agents can produce large, coherent patches quickly. That speed changes the review problem. The question is no longer only "is this line correct?" It is also "did the agent understand the task, inspect the right evidence, and test the behavior it changed?"
1. Start from intent
Read the original prompt or task summary before the diff. A correct-looking patch can still be wrong if it solves a broader or narrower problem than requested. The review should ask whether the implementation matches the stated intent and whether any added behavior is justified.
2. Check context coverage
Before reviewing details, ask what the agent observed. Did it read the interface it changed? Did it inspect the tests? Did it look at the callers? Blind edits, meaning changes to code the agent never read, are one of the highest-signal risks in AI code review, because agents can produce plausible changes without the repository evidence a human would naturally gather.
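The coverage question can be checked mechanically if the agent's trace is available. The sketch below is a minimal illustration: the TraceEvent shape and the "read"/"edit" action names are assumptions made for the example, not the schema of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One step from an agent trace (hypothetical shape, not a real tool's schema)."""
    action: str  # assumed to be "read" or "edit"
    path: str


def blind_edits(events: list[TraceEvent]) -> list[str]:
    """Return paths that were edited before they were ever read, in first-edit order."""
    seen_reads: set[str] = set()
    blind: list[str] = []
    for event in events:
        if event.action == "read":
            seen_reads.add(event.path)
        elif event.action == "edit" and event.path not in seen_reads:
            if event.path not in blind:
                blind.append(event.path)
    return blind


trace = [
    TraceEvent("read", "api/users.py"),
    TraceEvent("edit", "api/users.py"),
    TraceEvent("edit", "api/billing.py"),  # edited without a prior read
    TraceEvent("read", "api/billing.py"),  # reading afterwards does not count
]
print(blind_edits(trace))  # ['api/billing.py']
```

A newly created file would also be flagged by this sketch, so a real check would probably need to treat file creation separately.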
3. Rank the files by risk
Do not review every generated line equally. Prioritize authentication, authorization, billing, data migrations, concurrency, persistence, security boundaries, generated configuration, and public APIs. A small change in those areas can carry more risk than a large generated helper or test fixture.
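One rough way to order a review queue is to score each changed path against a list of risky keywords. The keywords and weights below are illustrative assumptions, not a standard; a team would tune them to its own repository layout.

```python
# Illustrative keyword weights (assumed for this sketch); higher means review first.
RISK_KEYWORDS = {
    "auth": 5,
    "billing": 5,
    "migration": 4,
    "security": 4,
    "config": 3,
    "api": 3,
}


def risk_score(path: str) -> int:
    """Return the highest weight of any risk keyword found in the path, else 0."""
    lowered = path.lower()
    return max((weight for key, weight in RISK_KEYWORDS.items() if key in lowered), default=0)


def rank_by_risk(paths: list[str]) -> list[str]:
    """Sort paths from highest to lowest risk; ties are broken alphabetically."""
    return sorted(paths, key=lambda p: (-risk_score(p), p))


changed = [
    "tests/fixtures/users.json",
    "src/auth/session.py",
    "db/migrations/0042_add_plan.sql",
    "src/api/routes.py",
]
for path in rank_by_risk(changed):
    print(risk_score(path), path)
```

Substring matching is crude (a file named author.py would match "auth"), so this is a way to order attention, not a substitute for judgment about which files actually sit on a security or data boundary.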
4. Review uncertainty, not just confidence
Agents often expose uncertainty in their planning or trace: "assuming", "likely", "not sure", "untested", "may need". That text is review signal. It should guide the human toward areas where the agent knew it was working from incomplete evidence.
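If the trace is available as plain text, the hedging phrases listed above can be surfaced with a simple scan. This is a minimal sketch that assumes a line-oriented trace; the phrase list is the one from this section and would likely need extending in practice.

```python
import re

# Phrases taken from the section above; extend as needed.
HEDGE_PHRASES = ["assuming", "likely", "not sure", "untested", "may need"]
HEDGE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def uncertainty_notes(trace_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for trace lines containing a hedge phrase."""
    notes = []
    for number, line in enumerate(trace_text.splitlines(), start=1):
        if HEDGE_PATTERN.search(line):
            notes.append((number, line.strip()))
    return notes


sample_trace = """Reading the config loader.
Assuming the cache is per-process.
Retry path is untested.
Done."""

for number, line in uncertainty_notes(sample_trace):
    print(f"line {number}: {line}")
```

Each hit is a pointer for the human reviewer: the line number shows where in the trace the agent was working from incomplete evidence.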
5. Verify tests against behavior
A green test run is useful but not enough. Check whether tests exercise the new behavior, the failure mode, and the edge case the prompt actually asked about. If the agent only ran formatting or type checks, say so in the review packet.
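To make the "only ran formatting or type checks" case visible, the commands the agent ran can be sorted into kinds. The sketch below assumes a Python project using pytest, ruff, black, and mypy; the mapping from tool name to kind is an assumption for illustration, and other toolchains would need their own table.

```python
# Assumed mapping from tool name to the kind of evidence it provides.
CHECK_KINDS = {
    "pytest": "test",
    "ruff": "lint",
    "black": "format",
    "mypy": "typecheck",
}


def classify(command: str) -> str:
    """Classify a shell command by the first known tool name among its tokens."""
    for token in command.split():
        if token in CHECK_KINDS:
            return CHECK_KINDS[token]
    return "unknown"


def summarize(commands: list[str]) -> str:
    """Produce a one-line note for the review packet about what was actually run."""
    kinds = sorted({classify(c) for c in commands})
    if "test" in kinds:
        return "Tests ran; still confirm they cover the changed behavior."
    if not kinds:
        return "No checks were run."
    return "No tests ran; only: " + ", ".join(kinds)


print(summarize(["ruff check .", "mypy src"]))
print(summarize(["python -m pytest tests/test_retry.py"]))
```

Even when the summary says tests ran, the reviewer still has to confirm that those tests exercise the new behavior, the failure mode, and the requested edge case; this sketch only separates test runs from other checks.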
6. Look for scope creep
AI agents often clean up nearby code, rename helpers, adjust formatting, or "improve" unrelated branches while solving a task. Some of that is useful. In review, however, unrelated edits increase cognitive load and make rollback harder. Ask whether every touched file supports the original intent.
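One way to make scope creep visible is to write down, before reading the diff, which paths the task should plausibly touch, and then list everything outside that set. The glob patterns and file names below are hypothetical examples.

```python
from fnmatch import fnmatchcase


def out_of_scope(touched: list[str], expected_patterns: list[str]) -> list[str]:
    """Return touched paths that match none of the expected glob patterns."""
    return [
        path
        for path in touched
        if not any(fnmatchcase(path, pattern) for pattern in expected_patterns)
    ]


# Hypothetical task: "add exponential backoff to the retry helper".
expected = ["src/retry/*", "tests/test_retry*"]
touched = [
    "src/retry/backoff.py",
    "tests/test_retry.py",
    "src/utils/strings.py",  # unrelated cleanup
    "README.md",             # unrelated edit
]
print(out_of_scope(touched, expected))  # ['src/utils/strings.py', 'README.md']
```

A file on this list is not automatically wrong, but each one deserves a sentence of justification in the review.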
7. Require provenance for merges
For AI-assisted work, the minimum merge packet should include the prompt, agent identity, changed commits, test evidence, and any relevant decision notes. h5i captures those with h5i capture commit and makes them visible through h5i recall log, h5i recall context, and PR review bodies.
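The minimum packet described above can be treated as a simple completeness check before merge. The sketch below is a generic illustration: the MergePacket class and its field names are assumptions for this example and do not represent h5i's storage format or API.

```python
from dataclasses import dataclass, field


@dataclass
class MergePacket:
    """Hypothetical container for merge provenance (not h5i's actual format)."""
    prompt: str = ""
    agent_identity: str = ""
    commits: list[str] = field(default_factory=list)
    test_evidence: str = ""
    decision_notes: str = ""  # optional: only "any relevant" notes are expected


# Fields that must be non-empty before the change is considered mergeable.
REQUIRED_FIELDS = ("prompt", "agent_identity", "commits", "test_evidence")


def missing_fields(packet: MergePacket) -> list[str]:
    """Return the names of required fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(packet, name)]


packet = MergePacket(
    prompt="Add retry with exponential backoff",
    agent_identity="example-agent",  # placeholder value
    commits=["abc123"],              # placeholder value
)
print(missing_fields(packet))  # ['test_evidence']
```

A check like this could run in CI and block the merge while any required field is empty.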
A practical checklist
- Does the diff match the prompt?
- Were the edited files observed before editing?
- Did the agent touch security, data, or API boundaries?
- Are uncertainty notes linked to review focus?
- Were meaningful tests run?
- Can this change be rolled back as a unit, by intent, if it fails?
- Is the provenance stored somewhere durable?
FAQ
Does AI-generated code need stricter review?
It needs different review. The code may be high quality, but the reviewer needs extra evidence about intent, context, and tests because the agent's working state is otherwise invisible.
What is the biggest mistake?
Reviewing only the final diff and ignoring whether the agent had enough repository context to make the change.
How does h5i help?
h5i turns prompts, traces, risk signals, test evidence, and provenance into Git-native records that reviewers can inspect before merging.
Sources and verification
This article avoids vendor-specific claims that were not checked against primary docs or local h5i CLI behavior.
Bring AI provenance into Git
h5i records prompts, context, test evidence, review signals, and agent messages alongside normal Git history.