Auditing AI-Generated Code: A Practical Framework
Your team merges 50 PRs a week. Thirty are AI-assisted. You don't have the bandwidth to review them all carefully — so which ones do you scrutinize? Here's a four-vector framework and the deterministic signals that produce a ranked review queue.
"Treat AI-generated code like junior-engineer code" is a popular maxim and a useless one. A junior engineer's output is uniform — uniformly cautious, uniformly inexperienced. AI output isn't. The same model produces flawless boilerplate and dangerously confident hallucinations in the same PR. Treating both halves with the same review intensity wastes the half of your reviewer attention that's worth most.
What you actually want is asymmetry. Spend ten seconds on the trivial half. Spend twenty minutes on the dangerous half. The trick is knowing which is which before you start reading.
That requires data the diff doesn't carry: what did the model read before editing? Where did it hedge? Did it stay inside scope? Did anything in its reasoning trace look like an injection? Git records none of that. h5i records all of it, deterministically, with no model in the audit path.
Four risk vectors that matter
Across roughly a year of post-mortems on AI-introduced regressions, four recurring failure modes emerged. Each has a deterministic signal — something you can extract from session logs without invoking another model.
| Vector | What it looks like | Detector |
|---|---|---|
| Blind edits | File modified with no preceding Read | Tool-call sequence analysis |
| Uncertainty | Hedge phrases inside thinking blocks | Calibrated phrase lexicon |
| Scope creep | Edits to files unrelated to the prompt | Diff-vs-prompt overlap |
| Prompt injection | Override / exfiltration patterns in trace | Regex over OBSERVE/THINK/ACT |
Each vector is an independent input to a composite risk score. The score is the only thing your reviewers actually need to look at — it sorts PRs into "skim" and "really read."
Vector 1 — Blind edits
A blind edit is a Write or Edit call to a file with no preceding Read of the same file in the session. It's the single highest-precision indicator of a model writing from training-data memory rather than from the file's current state. In practice, blind edits are the leading cause of "the AI deleted my comment / regressed my fix / used the old API."
h5i extracts the tool-call sequence from the session log and surfaces blind edits as a single number plus a list:
$ h5i notes coverage --max-ratio 0.5 ── Attention Coverage ───────────────────────────────────── files edited: 7 files edited blindly: 2 ⚠ src/billing/token.rs 2 edits · 0 reads · 100% blind ▲ src/api/checkout.rs 3 edits · 1 read · 67% blind ✔ src/auth.rs 4 edits · 4 reads · 0% blind
The interpretation: billing/token.rs was modified twice with the model never having looked
at the file in this session. Whatever it wrote, it wrote from memory. That edit deserves human
eyes regardless of how clean the diff looks.
Vector 2 — Uncertainty
Models hedge in their thinking blocks. The hedges don't appear in the chat output — by the time the model addresses you, it's converged on a confident-sounding answer — but they're recorded in the session log.
h5i scans every thinking block for a calibrated vocabulary of self-doubt phrases, each mapped to a confidence score (e.g. "not sure" → 25%, "might break" → 30%, "assuming" → 45%). Files where uncertainty signals concentrate are exactly the files the model itself flagged as risky.
We covered this detector in detail in
Vibe Coding With Claude Is Fun — Until It Silently Ships a Risk.
The short version is: h5i notes uncertainty turns a session's hidden hedges into a heatmap
that tells you where to start reading.
Vector 3 — Scope creep
The user asked Claude to fix a bug in parser.rs. The PR touches eleven files. Maybe that's
fine — the parser is a hub. Maybe it's not — five of those files are unrelated and the model
"noticed" minor issues while it was there.
h5i records the user prompt that opened each session, then compares it (lexically and via
file co-mention) against the diff. When a commit touches files the prompt doesn't mention and
that aren't in the same module, it lands in h5i notes review as a scope-creep flag.
You can also enforce scope at policy level — h5i policy rules can require that AI commits
restrict edits to files mentioned in the prompt or its session's reading set.
Vector 4 — Prompt injection
An agent that reads a poisoned README, scrapes a malicious doc, or follows a hostile link can have an injected instruction sitting in its reasoning trace right now. The output looks normal. The trace doesn't.
h5i context scan runs eight deterministic regex rules over the OBSERVE/THINK/ACT trace and
reports a 0.0–1.0 risk score with line-level hits. We go deep on the detector design in
Detecting Prompt Injection in Agent Reasoning Traces.
For the framework here, all that matters is: it's another deterministic input to the composite
score, with no model in the path.
The composite score
Take all four vectors per commit, weight them, sum, normalize. h5i notes review spits out a
ranked list:
$ h5i notes review --limit 10 Suggested Review Points — 4 commits flagged (scanned 50, min_score=0.40) ────────────────────────────────────────────────────────────────── #1 a3f8c12 score 0.81 ████████░░ Alice · 2026-05-06 14:02 UTC refactor billing token refresh ⚠ blind edit · high uncertainty · 4 files touched #2 9e21b04 score 0.62 ██████░░░░ Bob · 2026-05-05 11:45 UTC add retry to http client ▲ moderate uncertainty · scope creep (3 unrelated files) #3 c1a2b3d score 0.47 ████░░░░░░ moderate complexity
Reviewer workflow: open the top five, ignore the rest. The bottom-half PRs aren't unreviewed — they pass through your standard reviewer-allocation, but with one fewer pair of eyes than the flagged ones.
min_score=0.40 as the floor. Empirically, anything below
that produces a 0.2% true-positive rate, while the 0.40+ band sits around 35-50% true positives.
The exact number depends on your team's prompt discipline.
Wiring it into your CI
The simplest integration is a GitHub Actions step that runs h5i notes review --base origin/main
on every PR and posts the top-flagged commits as a comment:
# Runs on every PR; comments the ranked review queue. - name: h5i audit run: | h5i pull h5i notes review --base origin/main --limit 5 --format md \ > review.md - uses: actions/github-script@v7 with: script: | const fs = require('fs'); const body = fs.readFileSync('review.md', 'utf8'); github.rest.issues.createComment({ ...context.repo, issue_number: context.issue.number, body, });
A stricter integration uses h5i policy check as a required status check. Define a TOML
policy that rejects commits with required_audit=false on protected branches, blind edits
above a threshold, or any prompt-injection signals at HIGH severity. The check fails the PR; CI
blocks merge.
What this isn't
A few things the framework does not do, in case the marketing is unclear:
- It does not approve PRs. It ranks them. Humans still review.
- It does not catch bugs the model never noticed. Uncertainty signals are a recall floor, not a ceiling.
- It does not score human-written commits. It scores AI-tagged commits with associated session logs.
- It does not call another model. Every signal is a regex, a tool-call sequence comparison, or a diff-vs-prompt analysis. The audit path is fully deterministic — by design, so the audit itself can be audited.
Try it on your last week of merges
The retroactive form takes about three minutes per repo:
$ h5i init $ h5i notes analyze --since 7.days.ago scanned 38 commits · 24 had session logs · linked 24/24 $ h5i notes review --since 7.days.ago # See the top-flagged commits from the last week. # Open them. See if the framework agrees with the bugs you # actually shipped.
The first time you do this, you'll find one or two flagged commits you remember as "weird, something felt off in review but I couldn't pin it." That's the framework working.
Make AI code review asymmetric in the right direction
h5i is open source, Apache 2.0, and runs entirely locally — no model in the audit path.
Star on GitHub Back to docs