Cutting Agent Token Usage 95% by Keeping Tool Output Out of Context
A 4 MB pytest log is mostly noise: the agent needs the two failures and the
summary line, not the 3,000 PASSED rows. h5i keeps the full raw output
out-of-band, hands the agent a small structured summary, and keeps the original one command
away. Here's the design — and the reproducible experiment that measures it.
Coding agents drown in their own tool output. A test run, a build, a linter, a
kubectl get -o json: each can dump thousands of lines into the context window,
and the agent pays for every one on every subsequent turn (the prompt cache makes
re-sending cheaper, but it does not discount the original ingestion). The signal an
agent actually needs (which test failed, on what line, with what assertion) is
usually a few lines buried in a flood of progress bars and passing rows.
The fix is structural, and it borrows from a solved problem: large files in Git. Tools like git-annex and git-lfs split a big file into a small tracked pointer and an out-of-band blob. h5i does the same for tool output.
The design: a pointer and a blob
h5i capture run -- <command> runs the command and splits its output in two:
| Artifact | Where it lives | Size | Travels with h5i push? |
|---|---|---|---|
| Raw blob (full bytes) | .git/.h5i/objects/ab/cd/<sha256> (local) | huge | No by default — only via h5i objects push (Git LFS / git ref) |
| Manifest (pointer + structured summary) | refs/h5i/objects (git ref, JSONL) | tiny | Yes |
The agent reads only the manifest's summary. The full bytes are content-addressed by SHA-256
and one command away (h5i recall object <id>) — so the output is
reduced and losslessly recoverable at the same time. Output below a size
threshold passes straight through unstored, so wrapping any command is safe.
Why not just truncate? Truncation throws information away. The object store keeps every byte; it just moves the bulk out of the context window and leaves a durable address behind. An evicted blob degrades to a clear "absent" message, never a silent loss.
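The split can be sketched in a few lines of Python. This is a toy illustration of the idea, not h5i's implementation: the class name ObjectStore, the 4096-byte threshold, and the manifest field names are assumptions made for the example. Only the SHA-256 addressing, the ab/cd/<sha256> layout, the pass-through for small output, and the "absent" behavior come from the description above.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

# Assumed cutoff: the article only says output "below a size threshold" passes through.
PASS_THROUGH_BYTES = 4096


class ObjectStore:
    """Toy content-addressed blob store (illustrative, not h5i's code)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def blob_path(self, digest: str) -> Path:
        # Two-level sharding by digest prefix: ab/cd/<sha256>.
        return self.root / digest[:2] / digest[2:4] / digest

    def capture(self, raw: bytes, summary: str) -> Optional[dict]:
        """Store raw bytes out-of-band and return a small manifest.

        Returns None when the output is small enough to pass through unstored.
        """
        if len(raw) < PASS_THROUGH_BYTES:
            return None
        digest = hashlib.sha256(raw).hexdigest()
        path = self.blob_path(digest)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(raw)
        # Hypothetical manifest shape: a pointer plus the summary the agent reads.
        return {"id": digest, "bytes": len(raw), "summary": summary}

    def recall(self, digest: str) -> Optional[bytes]:
        """Return the exact raw bytes, or None if the blob is absent (evicted)."""
        path = self.blob_path(digest)
        return path.read_bytes() if path.is_file() else None


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        store = ObjectStore(Path(tmp))
        raw = b"tests/test_ok.py::test_n PASSED\n" * 3000 + b"1 failed, 3000 passed\n"
        manifest = store.capture(raw, "1 failed, 3000 passed")
        lossless = store.recall(manifest["id"]) == raw
        small = store.capture(b"ok\n", "ok")
        store.blob_path(manifest["id"]).unlink()  # simulate eviction
        absent = store.recall(manifest["id"]) is None
        return {
            "manifest": manifest,
            "lossless": lossless,
            "passed_through": small is None,
            "absent_after_eviction": absent,
        }
```

The manifest is the only part that would reach the agent; the digest in its id field is enough to find the blob again, or to report clearly that it is gone.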
Structured output, not a log excerpt
The summary isn't a trimmed log — it's a normalized, machine-actionable result. One schema across test runners, compilers, linters, and type checkers, so an agent learns one shape instead of N. The default render is compact (one line per finding):
    $ h5i capture run -- pytest -q
    pytest test failed · 1 failed, 120 passed (exit 1)
    F tests/test_auth.py::test_refresh assert 0 == 100
The agent can branch on status, jump to the location, and dedupe by
a stable fingerprint. A failing run is never reported as passed: the
status is derived from the exit code, never guessed from text, and a parser that can't find
its anchors falls back to a generic result rather than inventing structure. The full
typed result is available with --format structured (YAML) or --format
json, and is stored in the manifest so captures are queryable:
    $ h5i recall objects --status failed --tool pytest
    $ h5i recall object 0bb827e4 --summary    # the reduced summary
    $ h5i recall object 0bb827e4              # the full raw bytes, exactly
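The three guarantees above (status from the exit code, a stable fingerprint per finding, and a generic fallback when the parser finds no anchors) can be sketched as follows. This is a hedged illustration, not h5i's schema or parser: the Finding and Result types, the fingerprint recipe, and the choice of pytest's "FAILED <nodeid> - <message>" short-summary line as the anchor are all assumptions made for the example.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    location: str
    message: str
    fingerprint: str


@dataclass
class Result:
    tool: str
    status: str
    exit_code: int
    findings: List[Finding] = field(default_factory=list)


# Assumed anchor: pytest's short test summary lines, "FAILED <nodeid> - <message>".
FAILED_LINE = re.compile(r"^FAILED (\S+) - (.*)$", re.MULTILINE)


def fingerprint(tool: str, location: str, message: str) -> str:
    # Hypothetical recipe: hash the fields that identify a finding, so the
    # same failure gets the same id across runs and can be deduplicated.
    key = "\x00".join((tool, location, message)).encode()
    return hashlib.sha256(key).hexdigest()[:12]


def summarize(tool: str, output: str, exit_code: int) -> Result:
    # Status is derived from the exit code alone, never guessed from the text.
    status = "passed" if exit_code == 0 else "failed"
    findings = [
        Finding(location, message, fingerprint(tool, location, message))
        for location, message in FAILED_LINE.findall(output)
    ]
    # No anchors found: findings stays empty, which gives a generic result
    # (tool, status, exit code) instead of invented structure.
    return Result(tool, status, exit_code, findings)
```

Called with an exit code of 1 and output the parser does not recognize, summarize still reports "failed", with an empty findings list.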
The experiment
scripts/experiment_token_reduction.sh runs realistic tool output through
h5i capture run and measures four properties against the default agent-facing
output, tokenized with h5i's own tokenizer: token cut, signal retained, correct status, and
lossless recovery. A representative run:
| Fixture | Raw (tok) | Summary (tok) | Cut |
|---|---|---|---|
| pytest (1 fail / 124) | 1424 | 30 | 98% |
| cargo test (1 fail) | 1397 | 39 | 97% |
| tsc (2 errors) | 544 | 63 | 88% |
| go build failure | 624 | 21 | 97% |
| noisy service log (1 buried error) | 17223 | 79 | 99.5% |
| big JSON (402-item array) | 5625 | 61 | 99% |
| Total (full matrix) | 27942 | 1176 | 95.8% |
Across the matrix, ~95% of tokens never enter the context window — and in
every case the failing test, the error, and its location survive into the summary, the status
is honest, and recall object returns the raw bytes exactly.
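The Cut column is plain arithmetic on the two token counts: the share of raw tokens that the summary does not carry. A short sketch using the figures reported in the table (the function name cut_percent is ours, not part of the experiment script):

```python
# (raw tokens, summary tokens) as reported in the table above.
rows = {
    "pytest (1 fail / 124)": (1424, 30),
    "cargo test (1 fail)": (1397, 39),
    "tsc (2 errors)": (544, 63),
    "go build failure": (624, 21),
    "noisy service log (1 buried error)": (17223, 79),
    "big JSON (402-item array)": (5625, 61),
    "Total": (27942, 1176),
}


def cut_percent(raw: int, summary: int) -> float:
    """Percentage of raw tokens that never reach the context window."""
    return 100.0 * (1 - summary / raw)


for name, (raw, summary) in rows.items():
    print(f"{name}: {cut_percent(raw, summary):.1f}%")
```

To one decimal place the noisy service log comes out at 99.5% and the Total row at 95.8%.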
An honest limit: signal-dense output
Token reduction works by dropping noise — so its ceiling is set by how much noise there is.
Tests, builds, and logs are mostly noise around a little signal, which is why they reduce
90–100%. A linter or type-checker with many issues is the opposite: every line is a
real diagnostic, with nothing to drop. There the compact render still beats raw (by capping
the list and dropping summary chatter), but only modestly — and the full structured YAML is
actually larger than raw, because it adds a keyed field per finding. That's why the
compact format is the default, and why --format summary exists for the
absolute-smallest text when you only want the diagnostics. Structure has a cost; h5i spends it
where it buys machine-actionability, not on already-compact output.
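A small sketch makes the cost of keyed fields concrete. It is only an illustration under loud assumptions: the diagnostics are made up, the field names are not h5i's schema, JSON stands in for the structured YAML, and character count is used as a crude proxy for tokens.

```python
import json

# Made-up diagnostics; the field names are assumptions, not h5i's schema.
findings = [
    {"file": "src/a.ts", "line": 3, "code": "TS2322",
     "message": "Type 'string' is not assignable to type 'number'."},
    {"file": "src/b.ts", "line": 18, "code": "TS2304",
     "message": "Cannot find name 'foo'."},
]


def compact(finding: dict) -> str:
    # One line per finding: values only, no keys.
    return f"{finding['file']}:{finding['line']} {finding['code']} {finding['message']}"


def keyed(finding: dict) -> str:
    # Keyed rendering: every value is preceded by its field name.
    return json.dumps(finding)


compact_size = sum(len(compact(f)) for f in findings)
keyed_size = sum(len(keyed(f)) for f in findings)
print(f"compact: {compact_size} chars, keyed: {keyed_size} chars")
```

When every line is already a diagnostic, the keyed form repeats the same field names for each finding, so it can only grow relative to the one-line form.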
Sharing the raw bytes — Git LFS by default
The split has a nice consequence for teams: h5i push carries only the tiny
manifests (pointers + summaries) — the huge raw bytes never travel by
default. When you do want to share them, h5i objects push uploads them,
and the backend is chosen automatically:
| --backend | Stores raw blobs in | When |
|---|---|---|
| lfs | the remote's Git LFS server (by sha256) | default for HTTP(S) remotes |
| git-ref | refs/h5i/objects-data (a content-addressed git ref) | fallback for SSH / file:// remotes |
| auto | LFS when the remote is HTTP(S), else git-ref | the default |
The LFS backend is native — h5i speaks the LFS Batch API directly (no
git lfs CLI, no pointer files), reusing your git host's LFS storage and auth, and
keyed by the same sha256 as everything else. So huge tool output lives on the LFS server and
never bloats the git object database. With LFS, h5i recall object <id>
lazily fetches a blob from the server only when it's actually needed — and
every transfer is content-address-verified, so a tampered or mismatched blob is rejected, never
cached. Credentials are scoped to the LFS host, never leaked to presigned transfer URLs.
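The auto rule and the verify-before-cache step can be sketched as below. This is a minimal illustration of the behavior described above, not h5i's code: the function names, the BlobMismatch exception, and the use of the URL scheme to detect HTTP(S) remotes are assumptions for the example.

```python
import hashlib
from urllib.parse import urlparse


class BlobMismatch(Exception):
    """Raised when fetched bytes do not hash to the requested id."""


def choose_backend(remote_url: str) -> str:
    # auto: LFS when the remote is HTTP(S), else the git-ref fallback.
    # scp-style remotes such as git@host:org/repo.git parse with no scheme.
    scheme = urlparse(remote_url).scheme.lower()
    return "lfs" if scheme in ("http", "https") else "git-ref"


def verify_blob(expected_sha256: str, data: bytes) -> bytes:
    # Content-address check: the id is the SHA-256 of the bytes, so a
    # tampered or mismatched transfer is rejected before it is cached.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise BlobMismatch(f"expected {expected_sha256}, got {actual}")
    return data
```

Because the blob's id is its own hash, the check needs no extra metadata from the server: the address requested is also the checksum.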
It shows up in the PR, too
When a branch has captures, h5i share pr post folds in a one-line scorecard —
how many tokens of raw tool output the agent kept out of context on this branch, with a
per-tool breakdown — so reviewers can see the saving without running anything.
Try it
    # wrap the noisy commands an agent runs
    h5i capture run -- pytest -q
    h5i capture run -- cargo build
    h5i capture run --file src/auth.rs -- pytest tests/test_auth.py

    # come back to anything later, by status, tool, branch, or file
    h5i recall objects --status failed
    h5i recall object <id>    # full raw output, byte-for-byte

    # share the raw blobs with the team (Git LFS by default on HTTP(S) remotes)
    h5i objects push    # h5i push already carried the summaries
    h5i objects pull    # fetch shared blobs you don't have locally

    # wire it into a project so agents do it automatically
    h5i objects setup
In Claude Code the same behavior is a native MCP tool (h5i_capture_run) that
returns the structured result directly — no shell, no quoting. The declarative filter engine
(ported from rtk, Apache-2.0) and the
log line-folding (from headroom,
Apache-2.0) cover the long tail of commands; h5i adds what they don't: the raw stays
content-addressed, recoverable, and queryable across captures and machines.
Stop paying for tool output your agent reads once and forgets
h5i is open source, the store is a content-addressed directory + a git ref, and there's no service to subscribe to.