Why AI agents need auditable workspaces
Git tracks the diff. h5i tracks the workspace behind it. An AI coding agent does far more than write the lines you finally merge: it reads files, runs commands, follows a prompt, weighs alternatives, hands work to other agents, and reaches out to the network. Git records none of that. An auditable workspace does: it is the place the agent works, and everything the agent does there is recorded in your repo and provable after the fact.
You hand a task to Claude Code or Codex. Twenty minutes later there is a branch with a tidy diff. The diff looks fine. But the diff is the last thing that happened, the visible residue of a long operational session you never saw. What was the agent actually asked? Which files did it read and which did it ignore? What commands did it run, and what did they print? Could it have reached your secrets or the public internet? Which model produced this, steered by whose prompt? When a second agent picked up the work, what did the handoff say?
Git answers none of those questions, because Git was designed to version code, not to record the work that produced the code. For human authors that gap never mattered: the work lived in a person's head and terminal, and we trusted the author. For autonomous agents the gap is the whole problem. The fix is not a better diff viewer. It is to make the workspace itself the unit of record.
The definition
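An auditable workspace is the place an AI agent works, where everything it does is recorded in your repo and provable after the fact. Concretely, it is the sum of everything that went into a change, not just the change itself: the prompt the agent was given, the files it read, the commands it ran and what they printed, its handoffs to other agents, and what it could reach on the network.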
Two cautions are built into that wording. "Workspace" is a crowded word; the defense is the adjective "auditable" and the location: your repo. And no single feature is the identity: token reduction, a prompt score, and a messaging channel are all properties of the workspace, never the point. The point is that the place where an agent worked becomes a record you can replay, review, and trust.
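To make the definition concrete, here is a minimal Python sketch of what one workspace record might hold. The class and field names (WorkspaceRecord, CommandRun, and so on) are hypothetical and do not describe h5i's actual storage format; the sketch only shows that each item in the definition can become structured, serializable data instead of evaporating with the session.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical schema: field names are illustrative, not h5i's storage format.
@dataclass
class CommandRun:
    argv: list[str]
    exit_code: int
    output: str

@dataclass
class WorkspaceRecord:
    prompt: str  # what the agent was asked
    model: str   # which model did the work
    files_read: list[str] = field(default_factory=list)
    commands: list[CommandRun] = field(default_factory=list)
    handoffs: list[str] = field(default_factory=list)          # messages to other agents
    egress_allowlist: list[str] = field(default_factory=list)  # what it could reach

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; sort_keys makes the
        # serialization stable, so identical records produce identical text.
        return json.dumps(asdict(self), sort_keys=True)

record = WorkspaceRecord(
    prompt="Refactor billing to use the new tax module",
    model="example-model",
    files_read=["billing/tax.py", "billing/invoice.py"],
    commands=[CommandRun(["pytest", "billing"], 0, "12 passed")],
    egress_allowlist=["pypi.org"],
)
print(record.to_json())
```

A stable serialization matters here: if the same record always produces the same bytes, it can be hashed and stored alongside the commit it explains.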
The problem: agents do operational work Git never records
A commit is a snapshot of files plus an author, a timestamp, and a message. That model is a perfect fit for "a person deliberately saved this state." It is a poor fit for "an autonomous process spent an hour reading, deciding, executing, and negotiating, and this diff fell out the end." All of the high-signal, high-risk material (the intent, the tool output, the reach, the reasoning) happens off-ledger.
That off-ledger work is exactly what a reviewer needs and exactly what an incident responder wishes they had. "The agent refactored billing" is reassuring until you learn it was prompted with "make it work", ran a destructive migration you can't see, and could reach the production database the whole time. None of that is in the diff. It was in the workspace, and the workspace evaporated.
Three proof pillars
"Auditable" is not a vibe; it decomposes into three concrete properties of the workspace. Each one answers a question the diff can't, and each is backed by data that lives in your Git refs.
| Pillar | Answers | Backed by |
|---|---|---|
| Provenance | Who asked, why, and what the agent knew. | refs/h5i/notes, refs/h5i/context |
| Confinement | What it couldn't reach, provable. | refs/h5i/env |
| Governance | Deterministic audit & compliance, no model in the loop. | h5i audit |
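The Governance row's "no model in the loop" means the same evidence always yields the same verdict. A minimal sketch of that idea, assuming a hypothetical rule set of regular expressions over recorded command lines; this is not the rule engine behind h5i audit, only an illustration of a deterministic check.

```python
import re

# Hypothetical rule set: (rule name, pattern over a recorded command line).
RISK_RULES = [
    ("recursive-delete", re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r")),
    ("destructive-sql", re.compile(r"\b(DROP|TRUNCATE)\s+TABLE\b", re.IGNORECASE)),
    ("force-push", re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)")),
    ("pipe-to-shell", re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b")),
]

def audit(commands: list[str]) -> list[tuple[int, str]]:
    """Return (command index, rule name) for every rule a command matches.

    No sampling and no model: the same commands and rules always give the
    same findings, so an auditor can re-run the check and get the same answer.
    """
    findings = []
    for i, cmd in enumerate(commands):
        for name, pattern in RISK_RULES:
            if pattern.search(cmd):
                findings.append((i, name))
    return findings

session = [
    "pytest -q",
    "psql -c 'DROP TABLE invoices'",
    "git push --force origin main",
]
print(audit(session))  # prints [(1, 'destructive-sql'), (2, 'force-push')]
```

Regex rules are deliberately crude; the design point is reproducibility, not coverage.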
The middle pillar is the one teams underrate. "The agent physically could not exfiltrate" beats "we logged it" every time: a log tells you what happened, while a boundary tells you what could never happen. h5i's sandboxed worktree (h5i env) enforces that boundary with tiered isolation and a network egress allowlist, then records the policy alongside the evidence so the confinement itself is auditable.
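An egress allowlist is a deny-by-default policy: a destination is reachable only if it matches an entry. The sketch below shows just that policy decision, assuming a hypothetical entry format (exact hosts, plus "*.suffix" for subdomains); it is not h5i's configuration syntax. Real enforcement has to happen outside the agent's process, at the sandbox's network layer, because a check the agent's own code runs could simply be skipped.

```python
from urllib.parse import urlsplit

# Hypothetical policy format: exact hosts, or "*.suffix" to allow subdomains.
ALLOWLIST = ["pypi.org", "*.pythonhosted.org", "github.com"]

def host_allowed(host: str, allowlist: list[str]) -> bool:
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.example.com" matches subdomains but not the bare domain,
            # and not look-alikes such as "evilexample.com".
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False

def url_allowed(url: str, allowlist: list[str]) -> bool:
    host = urlsplit(url).hostname
    # Deny by default: anything without a parseable host is refused.
    return host is not None and host_allowed(host, allowlist)

for url in ["https://pypi.org/simple/", "https://github.com.evil.io/x"]:
    print(url, url_allowed(url, ALLOWLIST))
```

Recording this policy next to the session evidence is what lets a reviewer check later what the agent could not have reached.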
The workspace stack
Put the pillars together and a workspace has a stack, each layer a command you actually run, each layer recording a different slice of the work:
Agent Workspace

- Sandboxed worktree: h5i env
- Prompt-aware commits: h5i capture commit
- Compressed tool logs: h5i capture run
- Agent handoffs: h5i msg
- Risk/audit signals: h5i audit
- PR evidence brief: h5i share pr
The sandboxed worktree is the hero, not feature number seven: it is the canonical auditable workspace and the command humans actually type. Every other layer accrues onto that worktree as the agent works.
The golden path
The pieces compose into one user journey, the same ordering everywhere:
env create → agent works (env shell / capture run) → capture commit (provenance) → msg review → share pr → apply
In narrative form: open a workspace and hand it to an agent; let evidence accrue (commands, logs, prompts, handoffs); a reviewer reads the provable record on the PR; and a human applies or merges the work. The agent never touches your branch directly. It works inside a confined worktree, and a person applies its work after reading the evidence.
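The golden path implies ordering constraints, and the most important one is that nothing is applied before its evidence has been shared for review. A minimal sketch that checks a session log against those constraints, assuming a hypothetical log of stage names taken from the path; it is not an h5i command or file format. Loops are allowed (an agent may work, commit, and work again), but no stage may run before its prerequisite.

```python
from typing import Optional

# Hypothetical prerequisites derived from the golden path ordering.
REQUIRES = {
    "env create": None,
    "agent works": "env create",
    "capture commit": "agent works",
    "msg review": "capture commit",
    "share pr": "msg review",
    "apply": "share pr",
}

def first_violation(events: list[str]) -> Optional[str]:
    """Return the first event that is unknown or runs before its prerequisite.

    Returns None when the whole log respects the golden path.
    """
    seen = set()
    for event in events:
        if event not in REQUIRES:
            return event
        prereq = REQUIRES[event]
        if prereq is not None and prereq not in seen:
            return event
        seen.add(event)
    return None

ok = ["env create", "agent works", "capture commit", "agent works",
      "capture commit", "msg review", "share pr", "apply"]
rushed = ["env create", "agent works", "apply"]
print(first_violation(ok), first_violation(rushed))  # prints None apply
```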
The moat: it lives in your Git
Everything above lives under refs/h5i/*, so it travels with the repository like any other Git data: no SaaS, no lock-in, and it works offline. Clone the repo and fetch its refs/h5i/* refs (Git's default fetch refspec covers only branches and tags) and you have the workspaces. No separate service holds your audit trail hostage, and there is no account to log into to find out what an agent did. The evidence is as durable and portable as the commits it explains.
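Part of what makes Git a trustworthy home for evidence is that Git names objects by their content. The sketch below computes a blob ID the way Git does in SHA-1 repositories (a "blob <size>" header and a NUL byte, followed by the bytes, all hashed with SHA-1), which is the value git hash-object reports; changing one word of a stored record yields a different ID. It does not write to a repository, and how h5i arranges objects under refs/h5i/* is not modeled here. Repositories using Git's SHA-256 object format compute IDs with SHA-256 instead.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

evidence = b'{"prompt": "make it work"}\n'
original_id = git_blob_id(evidence)
tampered_id = git_blob_id(evidence.replace(b"make it work", b"fix the tests"))
print(original_id != tampered_id)  # prints True
```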
That is also why the token-reduction claim ladders up to "auditable" rather than standing on its own: the workspace keeps raw logs out of the agent's context window but stores them instead of discarding them, which is how it gets to ~95% lower token waste while still being able to show you the full output later.
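The mechanism described here, a short view for the agent and the full log kept and recoverable, can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm behind h5i capture run: head/tail truncation, the 12-character SHA-256 log ID, and the in-memory RAW_LOGS store are all hypothetical stand-ins.

```python
import hashlib

# Hypothetical in-memory store standing in for wherever raw logs are persisted.
RAW_LOGS: dict[str, str] = {}

def compress_log(output: str, head: int = 5, tail: int = 5) -> str:
    """Return a short view for the agent's context; keep the full log by ID."""
    log_id = hashlib.sha256(output.encode()).hexdigest()[:12]
    RAW_LOGS[log_id] = output  # stored, never discarded
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    marker = f"... {omitted} lines omitted (full log: {log_id}) ..."
    # Slice with an explicit start so tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])

def recover_log(log_id: str) -> str:
    """Fetch the full output later, e.g. when a reviewer asks for it."""
    return RAW_LOGS[log_id]

build_output = "\n".join(f"compiling module {i}" for i in range(200))
print(compress_log(build_output))
```

The agent pays tokens only for the excerpt, while the audit trail keeps every line.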
What it is not
It does not replace Git; it rides alongside it. It is not a cloud sandbox like a hosted dev environment: the confinement runs on your machine, and the record lives in your repo. And it is not merely a sandbox: a sandbox confines, but an auditable workspace confines, records, and makes the whole thing reviewable on the PR. Confinement is one of the three pillars, not the product.
Why this is the right unit
Code review evolved for a world where a trusted human authored every line and could answer for it in a hallway conversation. Agents broke that assumption: the author is a process, it can't be cross-examined later, and it operated with real capability over your machine and network. The honest response is not to trust harder or log more; it is to make the agent's entire workspace a first-class, Git-backed, provable artifact. Grade the work by the record it leaves, confine what it can reach, and let a human apply it after reading the evidence.
Try h5i on your next AI-assisted branch
Create a sandboxed workspace, capture the run, and post a review-ready PR brief.