Cornerstone · 2026-06-19

Why AI agents need auditable workspaces

Git tracks the diff. h5i tracks the workspace behind it. An AI coding agent does far more than write the lines you finally merge: it reads files, runs commands, follows a prompt, weighs alternatives, hands work to other agents, and reaches out to the network. Git records none of that. An auditable workspace does: it is the place the agent works, and everything it does there is recorded in your repo and provable after the fact.

By Koukyosyumei | Reading time: 12 min | Tags: Auditable Workspaces · Provenance · Confinement · Governance

You hand a task to Claude Code or Codex. Twenty minutes later there is a branch with a tidy diff. The diff looks fine. But the diff is the last thing that happened, the visible residue of a long operational session you never saw. What was the agent actually asked? Which files did it read and which did it ignore? What commands did it run, and what did they print? Could it have reached your secrets or the public internet? Which model produced this, steered by whose prompt? When a second agent picked up the work, what did the handoff say?

Git answers none of those questions, because Git was designed to version code, not to record the work that produced the code. For human authors that gap never mattered: the work lived in a person's head and their terminal, and we trusted the author. For autonomous agents the gap is the whole problem. The fix is not a better diff viewer. It is to make the workspace itself the unit of record.

The definition

An auditable workspace is the place an AI agent does its work, a Git-backed worktree where every prompt, decision, command, log, policy, and handoff is recorded in your repo and provable after the fact.

Concretely, the workspace is the sum of everything that went into a change, not just the change:

Workspace = worktree + prompt + model + commands + logs + policy + messages + PR evidence.

Two cautions are built into that wording. "Workspace" is a crowded word; what distinguishes this one is the adjective "auditable" and the location, "in your repo". And no single feature is the identity: token reduction, a prompt score, and a messaging channel are all properties of the workspace, never the point. The point is that the place an agent worked becomes a record you can replay, review, and trust.
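The workspace equation above reads naturally as a record type. The following is a minimal sketch in Python, assuming hypothetical class and field names chosen only to mirror the terms of the equation; it is not h5i's actual storage schema, which this article does not spell out.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CommandRecord:
    """One command the agent ran, plus what it printed (hypothetical shape)."""
    argv: list[str]
    exit_code: int
    log: str


@dataclass
class WorkspaceRecord:
    """Illustrative container mirroring the equation; not h5i's schema."""
    worktree: str                                         # path of the Git worktree
    prompt: str                                           # what the agent was asked
    model: str                                            # which model did the work
    commands: list[CommandRecord] = field(default_factory=list)
    policy: dict[str, object] = field(default_factory=dict)
    messages: list[str] = field(default_factory=list)     # agent handoffs
    pr_evidence: str = ""                                 # review brief for the PR

    def to_json(self) -> str:
        # Sorted keys keep the serialized record stable across runs.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Example values are invented for illustration.
record = WorkspaceRecord(
    worktree="worktrees/billing-refactor",
    prompt="Refactor billing to use the new invoice API",
    model="example-model",
    policy={"network": "allowlist", "hosts": ["pypi.org"]},
)
record.commands.append(CommandRecord(["pytest", "-q"], 0, "12 passed"))
print(record.to_json())
```

Treating the change as one field among several is the design point: a reviewer can ask for the prompt or the policy the same way they ask for the diff.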

The problem: agents do operational work Git never records

A commit is a snapshot of files plus an author, a timestamp, and a message. That model is a perfect fit for "a person deliberately saved this state." It is a poor fit for "an autonomous process spent an hour reading, deciding, executing, and negotiating, and this diff fell out the end." All of the high-signal, high-risk material (the intent, the tool output, the reach, the reasoning) happens off-ledger.

That off-ledger work is exactly what a reviewer needs and exactly what an incident responder wishes they had. "The agent refactored billing" is reassuring until you learn it was prompted with "make it work", ran a destructive migration you can't see, and could reach the production database the whole time. None of that is in the diff. It was in the workspace, and the workspace evaporated.

Three proof pillars

"Auditable" is not a vibe; it decomposes into three concrete properties of the workspace. Each one answers a question the diff can't, and each is backed by data that lives in your Git refs.

Pillar	Answers	Backed by
Provenance	Who asked, why, and what the agent knew.	`refs/h5i/notes`, `refs/h5i/context`
Confinement	What it couldn't reach, provable.	`refs/h5i/env`
Governance	Deterministic audit & compliance, no model in the loop.	`h5i audit`

The middle pillar is the one teams underrate. "The agent physically could not exfiltrate" beats "we logged it" every time: a log tells you what happened, a boundary tells you what could never happen. h5i's sandboxed worktree (h5i env) enforces that boundary with tiered isolation and a network egress allowlist, then records the policy alongside the evidence so the confinement itself is auditable.
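To make the allowlist idea concrete, here is a small sketch of a default-deny egress decision. The `EgressPolicy` class and the hosts are hypothetical. Real confinement has to be enforced below the agent (by the sandbox, not by Python code the agent could skip), so this only illustrates the decision rule and the kind of policy data that could be recorded next to the evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical policy: only these (host, port) pairs may be reached."""
    allowed: frozenset[tuple[str, int]]

    def permits(self, host: str, port: int) -> bool:
        # Default-deny: anything not explicitly listed is refused.
        # Hostnames are normalized (case, trailing dot) before lookup.
        return (host.lower().rstrip("."), port) in self.allowed


policy = EgressPolicy(frozenset({("pypi.org", 443), ("github.com", 443)}))

# Invented connection attempts, including one to an internal database.
attempts = [("pypi.org", 443), ("PyPI.org.", 443), ("db.internal", 5432)]
decisions = [(host, port, policy.permits(host, port)) for host, port in attempts]
for host, port, ok in decisions:
    print(f"{host}:{port} -> {'allow' if ok else 'deny'}")
```

The point of the default-deny shape is that the policy itself states what was unreachable, so a reviewer does not have to infer it from the absence of log lines.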

The workspace stack

Put the pillars together and a workspace has a stack. Each layer is a command you actually run, and each records a different slice of the work:

Agent Workspace
├─ Sandboxed worktree        h5i env
├─ Prompt-aware commits      h5i capture commit
├─ Compressed tool logs      h5i capture run
├─ Agent handoffs            h5i msg
├─ Risk/audit signals        h5i audit
└─ PR evidence brief         h5i share pr

The sandboxed worktree is the hero, not feature number seven. It is the canonical auditable workspace and the command humans actually type. Every other layer accrues onto that worktree as the agent works.

The golden path

The pieces compose into one user journey, with the same ordering everywhere:

env create  →  agent works (env shell / capture run)  →  capture commit (provenance)
            →  msg review  →  share pr  →  apply

In narrative form: open a workspace, hand it to an agent, let evidence accrue (commands, logs, prompts, handoffs), have a reviewer read the provable record on the PR, and then have a human apply or merge the result. The agent never touches your branch directly: it works inside a confined worktree, and a person applies its work after reading the evidence.
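The last step of that journey, a person applying the work only after reading the evidence, can be pictured as a simple gate. This is a sketch under stated assumptions: the `Evidence` fields and the `blockers` helper are hypothetical names for illustration, not part of the h5i CLI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of what a workspace recorded for one change."""
    prompt_recorded: bool
    commands_logged: bool
    policy_recorded: bool
    reviewed_by: Optional[str] = None   # human reviewer; None until sign-off


def blockers(evidence: Evidence) -> list[str]:
    """List the reasons a change should not be applied yet."""
    reasons = []
    if not evidence.prompt_recorded:
        reasons.append("missing prompt")
    if not evidence.commands_logged:
        reasons.append("missing command logs")
    if not evidence.policy_recorded:
        reasons.append("missing confinement policy")
    if evidence.reviewed_by is None:
        reasons.append("no human review")
    return reasons


pending = Evidence(prompt_recorded=True, commands_logged=True, policy_recorded=True)
print(blockers(pending))        # still waiting on a reviewer
pending.reviewed_by = "alice"   # invented reviewer name
print(blockers(pending))        # nothing left to block the apply
```

The check is deliberately rule-based: given the same evidence it always returns the same answer.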

The moat: it lives in your Git

Everything above lives under refs/h5i/*, so it travels with the repository like any other Git data. No SaaS, no lock-in, works offline. Clone the repo and you have the workspaces; there is no separate service holding your audit trail hostage and no account to log into to find out what an agent did. The evidence is as durable and portable as the commits it explains.

That is also why the token-reduction claim ladders up to "auditable" rather than standing on its own: the workspace keeps raw logs out of the agent's context window but recoverable rather than discarded, which is how it gets to ~95% lower token waste while still being able to show you the full output later.
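One way to picture "out of the context window but recoverable" is a content-addressed store: the full output is kept under its hash, and the agent sees only a short digest that points back to it. The sketch below assumes a hypothetical in-memory `LogStore` and `summarize` helper; it does not describe how h5i compresses logs, and it makes no attempt to reproduce the ~95% figure.

```python
import hashlib


class LogStore:
    """Hypothetical content-addressed store for raw tool output."""

    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def put(self, raw: str) -> str:
        # The SHA-256 of the content is the key, so the stored log can be verified later.
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        self._blobs[key] = raw
        return key

    def get(self, key: str) -> str:
        return self._blobs[key]


def summarize(raw: str, store: LogStore, tail: int = 3) -> str:
    """Store the full log and return only a short digest for the agent."""
    key = store.put(raw)
    lines = raw.splitlines()
    header = f"[{len(lines)} lines, full log: sha256:{key[:12]}]"
    return "\n".join([header, *lines[-tail:]])


store = LogStore()
# Invented test-runner output: 200 passing lines and a final summary line.
raw_log = "\n".join(f"test_{i} ... ok" for i in range(200)) + "\n200 passed"
digest = summarize(raw_log, store)
print(digest)
print(f"context size: {len(digest)} chars instead of {len(raw_log)}")
```

Because the digest carries the hash, a reviewer can fetch the exact bytes the agent's summary was derived from.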

What it is not

h5i is not a Git replacement, not a hosted SaaS, and not just a sandbox. It is a Git sidecar for auditable agent workspaces.

It does not replace Git; it rides alongside it. It is not a cloud sandbox like a hosted dev environment: the confinement runs on your machine and the record lives in your repo. And it is not merely a sandbox: a sandbox confines, but an auditable workspace confines and records and makes the whole thing reviewable on the PR. Confinement is one of the three pillars, not the product.

Why this is the right unit

Code review evolved for a world where a trusted human authored every line and could answer for it in a hallway conversation. Agents broke that assumption: the author is a process, it can't be cross-examined later, and it operated with real capability over your machine and network. The honest response is not to trust harder or log more; it is to make the agent's entire workspace a first-class, Git-backed, provable artifact. Grade the work by the record it leaves, confine what it can reach, and let a human apply it after reading the evidence.

