
Why Git Diffs Are Not Enough for AI-Generated Code

A diff tells you what changed. It does not tell you what the agent was asked, which files it ignored, why it chose the approach, or whether the risky part was tested.

A Git diff is still the first thing to review. It is precise, compact, and universal. For AI-generated code, it is also incomplete. The diff shows the final text change after an agent has already compressed a long sequence of prompts, observations, tool calls, assumptions, and test attempts into a patch.

That compression hides the parts reviewers most need. A human-written diff carries social context: the author understands the system, attends standup, and can answer why the change exists. An agent-written diff may have been produced from a narrow prompt, stale context, or an incorrect assumption about code it never opened.

What a diff cannot tell you

A diff cannot tell you whether the agent read the policy module before editing authorization code. It cannot tell you whether the original prompt asked for a quick prototype or production hardening. It cannot tell you whether the agent saw a warning from the test suite and worked around it. It cannot tell you whether a second agent reviewed the plan and flagged a risk.
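The first of these gaps becomes checkable once the agent's actions are recorded alongside the patch. The following is a minimal sketch, assuming a hypothetical trace of read and edit events; the `TraceEvent` shape and the `blind_edits` helper are illustrative names, not h5i's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One step of a hypothetical agent trace (illustrative, not h5i's schema)."""
    action: str  # "read" or "edit"
    path: str


def blind_edits(trace: list[TraceEvent]) -> list[str]:
    """Return files the agent edited before it had ever read them."""
    seen: set[str] = set()
    blind: list[str] = []
    for event in trace:
        if event.action == "read":
            seen.add(event.path)
        elif event.action == "edit":
            if event.path not in seen and event.path not in blind:
                blind.append(event.path)
    return blind


trace = [
    TraceEvent("read", "auth/handlers.py"),
    TraceEvent("edit", "auth/handlers.py"),
    TraceEvent("edit", "auth/policy.py"),  # edited without being opened first
]
print(blind_edits(trace))  # ['auth/policy.py']
```

The diff for this change would show both files as modified and give no hint that only one of them was read first.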

The missing data falls into five buckets: intent, evidence, uncertainty, execution, and collaboration.

The failure mode: reviewing every line equally

When reviewers only have a diff, they often review evenly. That is expensive and ineffective. AI-generated code has uneven risk. A one-line config change made after reading the right file may be lower risk than a small auth refactor made without observing the caller. A large generated test file may be lower risk than a three-line change to token expiry.

Good AI review starts by ranking attention. h5i's audit model is built around this idea: blind edits, uncertainty signals, churn, scope, and prompt-injection indicators should push a change higher in the review queue.
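Ranking attention can be sketched as a weighted score over those signals. The example below is an assumption-heavy illustration: the field names, the weights, and the `risk_score` and `review_queue` helpers are hypothetical and do not describe how h5i computes its audit ranking.

```python
from dataclasses import dataclass


@dataclass
class ChangeSignals:
    """Hypothetical per-change signals; names are illustrative only."""
    name: str
    blind_edits: int = 0           # files edited without being read first
    uncertainty_flags: int = 0     # hedged or unsure statements in the trace
    churn: int = 0                 # times the same lines were rewritten
    out_of_scope_files: int = 0    # files touched outside the stated task
    injection_indicators: int = 0  # suspicious instructions seen in inputs


# Assumed weights, chosen only to make the example concrete.
WEIGHTS = {
    "blind_edits": 3.0,
    "uncertainty_flags": 2.0,
    "churn": 1.0,
    "out_of_scope_files": 2.0,
    "injection_indicators": 5.0,
}


def risk_score(change: ChangeSignals) -> float:
    """Weighted sum of the signals; higher means review sooner."""
    return sum(weight * getattr(change, name) for name, weight in WEIGHTS.items())


def review_queue(changes: list[ChangeSignals]) -> list[ChangeSignals]:
    """Order changes so the riskiest ones come first."""
    return sorted(changes, key=risk_score, reverse=True)


changes = [
    ChangeSignals("config tweak"),
    ChangeSignals("generated tests", churn=2),
    ChangeSignals("auth refactor", blind_edits=1, uncertainty_flags=1),
]
print([c.name for c in review_queue(changes)])
# ['auth refactor', 'generated tests', 'config tweak']
```

Note that the score ignores diff size entirely: the small auth refactor outranks the large generated test file because of how it was produced, not how many lines it touches.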

The minimum review packet

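For each AI-assisted change, reviewers should see something from each of the five buckets: the prompt or stated intent, the files and tool output the agent actually observed, the uncertainty it flagged, the tests it ran and their results, and any review or handoff from another agent.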

That packet turns review from archaeology into verification. The reviewer can ask whether the implementation matches the intent, whether the agent had enough context, and whether tests cover the changed behavior.
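One way to make the packet concrete is a small record that travels with the change and can report what is missing. This is a sketch under assumptions: the `ReviewPacket` fields mirror the five buckets named earlier (intent, evidence, uncertainty, execution, collaboration), and which of them count as required is a choice made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewPacket:
    """Hypothetical review packet; field names are illustrative."""
    prompt: str = ""                                         # intent
    files_read: list[str] = field(default_factory=list)      # evidence
    uncertainties: list[str] = field(default_factory=list)   # uncertainty
    test_summary: str = ""                                   # execution
    agent_notes: list[str] = field(default_factory=list)     # collaboration

    def missing(self) -> list[str]:
        """Names of the assumed-required fields that are still empty."""
        required = {
            "prompt": self.prompt,
            "files_read": self.files_read,
            "test_summary": self.test_summary,
        }
        return [name for name, value in required.items() if not value]


packet = ReviewPacket(
    prompt="Tighten token expiry to 15 minutes",
    files_read=["auth/tokens.py"],
)
print(packet.missing())  # ['test_summary']
```

A reviewer or a CI check could refuse to start review until `missing()` returns an empty list, which answers the "did the agent have enough context" question before anyone reads the diff.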

How h5i adds the missing layer

h5i keeps the Git diff intact and adds structured context around it. The h5i capture commit command records the prompt, model, agent, tests, tokens, and decisions with the commit. The h5i recall context command shows the goal, milestones, and OBSERVE/THINK/ACT trace. The h5i recall blame --show-prompt command connects lines back to the AI prompt at commit boundaries, and h5i share pr can turn that context into a reviewer-facing pull-request body.

The point is not to make reviews longer. It is to make them narrower. Review the diff, but use provenance and context to decide where human attention should go first.

FAQ

Should reviewers still read the diff?

Yes. The diff remains the ground truth for code changes. AI-aware review adds the context needed to interpret the diff correctly.

Can commit messages solve this?

Commit messages help, but they are summaries. They rarely preserve prompts, tool evidence, uncertainty, test output, or multi-agent handoffs.

What is the simplest improvement?

Capture the prompt and test result for every AI-assisted commit. That alone makes later review and rollback much more practical.
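Without any extra tooling, a rough version of this can be approximated with plain Git commit trailers. The sketch below only builds the commit message text; the trailer keys `AI-Prompt` and `AI-Test-Result` and the `with_ai_trailers` helper are hypothetical names, and this is not how h5i stores its records.

```python
def with_ai_trailers(message: str, prompt: str, test_result: str) -> str:
    """Append prompt and test result to a commit message as Git trailers.

    Trailer keys are hypothetical. Values are collapsed to a single line
    so each trailer stays a simple "Key: value" pair.
    """
    def one_line(text: str) -> str:
        return " ".join(text.split())

    return (
        f"{message.rstrip()}\n\n"
        f"AI-Prompt: {one_line(prompt)}\n"
        f"AI-Test-Result: {one_line(test_result)}\n"
    )


commit_message = with_ai_trailers(
    "Fix token expiry",
    "Shorten token\n lifetime to 15 minutes",
    "42 passed, 0 failed",
)
print(commit_message)
```

The resulting text can be passed to `git commit -F -` on standard input, and the trailers can later be read back with `git interpret-trailers --parse`. Trailers hold only a short summary, so this covers the simplest improvement and none of the trace, uncertainty, or handoff data.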

Sources and verification

This article avoids vendor-specific claims that were not checked against primary docs or local h5i CLI behavior.

Bring AI provenance into Git

h5i records prompts, context, test evidence, review signals, and agent messages alongside normal Git history.
