Practice · 2026-05-06

Auditing AI-Generated Code: A Practical Framework

Your team merges 50 PRs a week. Thirty are AI-assisted. You don't have the bandwidth to review them all carefully — so which ones do you scrutinize? Here's a four-vector framework and the deterministic signals that produce a ranked review queue.

By Koukyosyumei Reading time 9 min Tags Code Review · Compliance · AI Governance

"Treat AI-generated code like junior-engineer code" is a popular maxim and a useless one. A junior engineer's output is uniform — uniformly cautious, uniformly inexperienced. AI output isn't. The same model produces flawless boilerplate and dangerously confident hallucinations in the same PR. Treating both halves with the same review intensity wastes the half of your reviewer attention that's worth most.

What you actually want is asymmetry. Spend ten seconds on the trivial half. Spend twenty minutes on the dangerous half. The trick is knowing which is which before you start reading.

That requires data the diff doesn't carry: what did the model read before editing? Where did it hedge? Did it stay inside scope? Did anything in its reasoning trace look like an injection? Git records none of that. h5i records all of it, deterministically, with no model in the audit path.

Four risk vectors that matter

Across roughly a year of post-mortems on AI-introduced regressions, four recurring failure modes emerged. Each has a deterministic signal — something you can extract from session logs without invoking another model.

Vector	What it looks like	Detector
Blind edits	File modified with no preceding Read	Tool-call sequence analysis
Uncertainty	Hedge phrases inside thinking blocks	Calibrated phrase lexicon
Scope creep	Edits to files unrelated to the prompt	Diff-vs-prompt overlap
Prompt injection	Override / exfiltration patterns in trace	Regex over OBSERVE/THINK/ACT

Each vector is an independent input to a composite risk score. The score is the only thing your reviewers actually need to look at — it sorts PRs into "skim" and "really read."

Vector 1 — Blind edits

A blind edit is a Write or Edit call to a file with no preceding Read of the same file in the session. It's the single highest-precision indicator of a model writing from training-data memory rather than from the file's current state. In practice, blind edits are the leading cause of "the AI deleted my comment / regressed my fix / used the old API."

h5i extracts the tool-call sequence from the session log and surfaces blind edits as a single number plus a list:

~/my-project

$ h5i notes coverage --max-ratio 0.5

── Attention Coverage ─────────────────────────────────────
  files edited:           7
  files edited blindly:   2

  ⚠ src/billing/token.rs    2 edits  ·  0 reads  ·  100% blind
  ▲ src/api/checkout.rs     3 edits  ·  1 read   ·  67%  blind
  ✔ src/auth.rs             4 edits  ·  4 reads  ·  0%   blind

The interpretation: billing/token.rs was modified twice with the model never having looked at the file in this session. Whatever it wrote, it wrote from memory. That edit deserves human eyes regardless of how clean the diff looks.

Vector 2 — Uncertainty

Models hedge in their thinking blocks. The hedges don't appear in the chat output — by the time the model addresses you, it's converged on a confident-sounding answer — but they're recorded in the session log.

h5i scans every thinking block for a calibrated vocabulary of self-doubt phrases, each mapped to a confidence score (e.g. "not sure" → 25%, "might break" → 30%, "assuming" → 45%). Files where uncertainty signals concentrate are exactly the files the model itself flagged as risky.

We covered this detector in detail in Vibe Coding With Claude Is Fun — Until It Silently Ships a Risk. The short version is: h5i notes uncertainty turns a session's hidden hedges into a heatmap that tells you where to start reading.

Vector 3 — Scope creep

The user asked Claude to fix a bug in parser.rs. The PR touches eleven files. Maybe that's fine — the parser is a hub. Maybe it's not — five of those files are unrelated and the model "noticed" minor issues while it was there.

h5i records the user prompt that opened each session, then compares it (lexically and via file co-mention) against the diff. When a commit touches files the prompt doesn't mention and that aren't in the same module, it lands in h5i notes review as a scope-creep flag.

You can also enforce scope at policy level — h5i policy rules can require that AI commits restrict edits to files mentioned in the prompt or its session's reading set.

Vector 4 — Prompt injection

An agent that reads a poisoned README, scrapes a malicious doc, or follows a hostile link can have an injected instruction sitting in its reasoning trace right now. The output looks normal. The trace doesn't.

h5i context scan runs eight deterministic regex rules over the OBSERVE/THINK/ACT trace and reports a 0.0–1.0 risk score with line-level hits. We go deep on the detector design in Detecting Prompt Injection in Agent Reasoning Traces. For the framework here, all that matters is: it's another deterministic input to the composite score, with no model in the path.

The composite score

Take all four vectors per commit, weight them, sum, normalize. h5i notes review spits out a ranked list:

~/my-project

$ h5i notes review --limit 10

Suggested Review Points — 4 commits flagged (scanned 50, min_score=0.40)
──────────────────────────────────────────────────────────────────
  #1  a3f8c12  score 0.81  ████████░░
       Alice · 2026-05-06 14:02 UTC
       refactor billing token refresh
       ⚠ blind edit · high uncertainty · 4 files touched

  #2  9e21b04  score 0.62  ██████░░░░
       Bob · 2026-05-05 11:45 UTC
       add retry to http client
       ▲ moderate uncertainty · scope creep (3 unrelated files)

  #3  c1a2b3d  score 0.47  ████░░░░░░
       moderate complexity

Reviewer workflow: open the top five, ignore the rest. The bottom-half PRs aren't unreviewed — they pass through your standard reviewer-allocation, but with one fewer pair of eyes than the flagged ones.

Practical tip. Set min_score=0.40 as the floor. Empirically, anything below that produces a 0.2% true-positive rate, while the 0.40+ band sits around 35-50% true positives. The exact number depends on your team's prompt discipline.

Wiring it into your CI

The simplest integration is a GitHub Actions step that runs h5i notes review --base origin/main on every PR and posts the top-flagged commits as a comment:

.github/workflows/ai-audit.yml

# Runs on every PR; comments the ranked review queue.
- name: h5i audit
  run: |
    h5i pull
    h5i notes review --base origin/main --limit 5 --format md \
      > review.md
- uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const body = fs.readFileSync('review.md', 'utf8');
      github.rest.issues.createComment({
        ...context.repo,
        issue_number: context.issue.number,
        body,
      });

A stricter integration uses h5i policy check as a required status check. Define a TOML policy that rejects commits with required_audit=false on protected branches, blind edits above a threshold, or any prompt-injection signals at HIGH severity. The check fails the PR; CI blocks merge.

What this isn't

A few things the framework does not do, in case the marketing is unclear:

It does not approve PRs. It ranks them. Humans still review.
It does not catch bugs the model never noticed. Uncertainty signals are a recall floor, not a ceiling.
It does not score human-written commits. It scores AI-tagged commits with associated session logs.
It does not call another model. Every signal is a regex, a tool-call sequence comparison, or a diff-vs-prompt analysis. The audit path is fully deterministic — by design, so the audit itself can be audited.

Try it on your last week of merges

The retroactive form takes about three minutes per repo:

~/your-project

$ h5i init
$ h5i notes analyze --since 7.days.ago
  scanned 38 commits · 24 had session logs · linked 24/24

$ h5i notes review --since 7.days.ago

# See the top-flagged commits from the last week.
# Open them. See if the framework agrees with the bugs you
# actually shipped.

The first time you do this, you'll find one or two flagged commits you remember as "weird, something felt off in review but I couldn't pin it." That's the framework working.

Detecting Prompt Injection in Agent Reasoning Traces

Vector 4, in depth: the eight regex rules and why scanning the trace beats scanning the output.

Make AI code review asymmetric in the right direction

h5i is open source, Apache 2.0, and runs entirely locally — no model in the audit path.

Star on GitHub Back to docs