Benchmark · 2026-06-05

Cutting Agent Token Usage 95% by Keeping Tool Output Out of Context

A 4 MB pytest log is mostly noise: the agent needs the two failures and the summary line, not the 3,000 PASSED rows. h5i keeps the full raw output out-of-band, hands the agent a small structured summary, and keeps the original one command away. Here's the design — and the reproducible experiment that measures it.

Coding agents drown in their own tool output. A test run, a build, a linter, a kubectl get -o json: each can dump thousands of lines into the context window, and the agent pays for every one of them on every subsequent turn. Prompt caching makes those re-sends cheaper, but it does nothing for the first ingestion, and cached tokens still take up room in the window. The signal an agent actually needs (which test failed, on what line, with what assertion) is usually a few lines buried in a flood of progress bars and passing rows.

The fix is structural, and it borrows from a solved problem: large files in Git. Tools like git-annex and git-lfs split a big file into a small tracked pointer and an out-of-band blob. h5i does the same for tool output.
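To make the analogy concrete, this is the kind of pointer Git LFS commits in place of a large file, built here following the Git LFS v1 pointer format. The helper name lfs_pointer is our own. h5i itself does not write pointer files (its pointer is a manifest record on a git ref), so treat this only as an illustration of the pointer/blob split.

```python
import hashlib

def lfs_pointer(blob: bytes) -> str:
    """Build a Git LFS v1 pointer for `blob` (illustrative helper).

    Git commits this few-line text file instead of the blob; the blob
    itself goes to the LFS server, keyed by its SHA-256.
    """
    oid = hashlib.sha256(blob).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(blob)}\n"
    )

big_log = b"PASSED tests/test_x.py::test_ok\n" * 3000
pointer = lfs_pointer(big_log)
print(len(big_log), "bytes of blob ->", len(pointer), "bytes of pointer")
```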

The design: a pointer and a blob

h5i capture run -- <command> runs the command and splits its output in two:

Artifact | Where it lives | Size | Travels with h5i push?
Raw blob (full bytes) | .git/.h5i/objects/ab/cd/<sha256> (local) | huge | No by default; only via h5i objects push (Git LFS / git ref)
Manifest (pointer + structured summary) | refs/h5i/objects (git ref, JSONL) | tiny | Yes

The agent reads only the manifest's summary. The full bytes are content-addressed by SHA-256 and one command away (h5i recall object <id>) — so the output is reduced and losslessly recoverable at the same time. Output below a size threshold passes straight through unstored, so wrapping any command is safe.
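A minimal sketch of that split: small output passes through untouched, and anything larger is written once under its SHA-256 and replaced by a tiny record. The 4096-byte threshold, the function names, and the reading of ab/cd as the first two byte-pairs of the hash are all assumptions for illustration, not h5i's code.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cutoff: the post says a threshold exists, not its value.
PASSTHROUGH_BYTES = 4096

def blob_path(root: Path, digest: str) -> Path:
    # Fan out by hash prefix, assuming .git/.h5i/objects/ab/cd/<sha256>
    # means the first two byte-pairs of the digest.
    return root / digest[:2] / digest[2:4] / digest

def capture(raw: bytes, summary: str, root: Path) -> dict:
    """Store `raw` out-of-band and return a small manifest-style record."""
    if len(raw) < PASSTHROUGH_BYTES:
        # Small output passes straight through; nothing is stored.
        return {"stored": False, "text": raw.decode(errors="replace")}
    digest = hashlib.sha256(raw).hexdigest()
    path = blob_path(root, digest)
    if not path.exists():  # content-addressed: identical output is stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(raw)
    return {"stored": True, "id": digest, "size": len(raw), "summary": summary}

root = Path(tempfile.mkdtemp())
log = b"PASSED\n" * 3000 + b"FAILED tests/test_auth.py::test_refresh\n"
rec = capture(log, "1 failed, 3000 passed", root)
print(rec["id"][:8], rec["size"], rec["summary"])
print(capture(b"ok\n", "", root))
```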

Why not just truncate? Truncation throws information away. The object store keeps every byte; it just moves the bulk out of the context window and leaves a durable address behind. An evicted blob degrades to a clear "absent" message, never a silent loss.
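The difference between truncating and moving bytes out-of-band fits in a few lines. This sketch uses hypothetical put/recall/truncate helpers, a flat directory instead of h5i's fan-out layout, and our own wording for the notices. It shows a head-truncated log losing its one failure, the store returning every byte, and a missing blob producing an explicit notice instead of empty output.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Union

def put(root: Path, raw: bytes) -> str:
    """Store raw bytes under their SHA-256 (flat layout, for brevity)."""
    digest = hashlib.sha256(raw).hexdigest()
    (root / digest).write_bytes(raw)
    return digest

def recall(root: Path, digest: str) -> Union[bytes, str]:
    """Return the exact raw bytes, or an explicit notice saying why not."""
    path = root / digest
    if not path.exists():
        # Evicted or never fetched: say so, rather than returning nothing.
        return f"object {digest[:8]} is absent from the local store"
    raw = path.read_bytes()
    if hashlib.sha256(raw).hexdigest() != digest:
        return f"object {digest[:8]} failed its hash check"
    return raw

def truncate(raw: bytes, max_lines: int) -> bytes:
    # The alternative the post rejects: keep the head, drop the rest.
    return b"".join(raw.splitlines(keepends=True)[:max_lines])

root = Path(tempfile.mkdtemp())
log = b"PASSED\n" * 3000 + b"FAILED test_refresh: assert 0 == 100\n"
oid = put(root, log)
head = truncate(log, 200)     # the failure on the last line is gone
full = recall(root, oid)      # every byte comes back
(root / oid).unlink()         # simulate eviction
notice = recall(root, oid)
print(notice)
```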

Structured output, not a log excerpt

The summary isn't a trimmed log — it's a normalized, machine-actionable result. One schema across test runners, compilers, linters, and type checkers, so an agent learns one shape instead of N. The default render is compact (one line per finding):

$ h5i capture run -- pytest -q
pytest test failed · 1 failed, 120 passed (exit 1)
  F tests/test_auth.py::test_refresh  assert 0 == 100
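One way to model a single cross-tool schema is a small typed record that renders to the compact lines above. The field names (tool, kind, status, counts, findings), the one-letter severity, and the fingerprint recipe below are assumptions for illustration; the post does not publish h5i's schema.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Finding:
    severity: str   # one-letter code, e.g. "F" for a failed test (assumed)
    location: str   # test id or file:line
    message: str

    def fingerprint(self) -> str:
        # Stable across runs: hashes only where and what, never timings or order.
        key = f"{self.location}\0{self.message}".encode()
        return hashlib.sha256(key).hexdigest()[:12]

@dataclass
class Result:
    tool: str               # "pytest", "cargo", "tsc", ...
    kind: str               # "test", "build", "lint", ...
    status: str             # "passed" or "failed"
    counts: Dict[str, int]  # e.g. {"failed": 1, "passed": 120}
    exit_code: int
    findings: List[Finding] = field(default_factory=list)

def dedupe(findings: List[Finding]) -> List[Finding]:
    # Keep the first finding for each fingerprint.
    seen, out = set(), []
    for f in findings:
        if f.fingerprint() not in seen:
            seen.add(f.fingerprint())
            out.append(f)
    return out

def render_compact(r: Result) -> str:
    counts = ", ".join(f"{n} {label}" for label, n in r.counts.items())
    head = f"{r.tool} {r.kind} {r.status} · {counts} (exit {r.exit_code})"
    rows = [f"  {f.severity} {f.location}  {f.message}" for f in dedupe(r.findings)]
    return "\n".join([head] + rows)

failure = Finding("F", "tests/test_auth.py::test_refresh", "assert 0 == 100")
result = Result("pytest", "test", "failed", {"failed": 1, "passed": 120}, 1,
                [failure, failure])  # a duplicate, folded by fingerprint
print(render_compact(result))
```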

The agent can branch on status, jump to the location, and dedupe findings by a stable fingerprint. A failing run is never reported as passed: the status is derived from the exit code, never guessed from the text. And a parser that can't find the anchors it expects falls back to a generic result rather than inventing structure. The full typed result is available with --format structured (YAML) or --format json, and it is stored in the manifest, so captures are queryable:

$ h5i recall objects --status failed --tool pytest
$ h5i recall object 0bb827e4 --summary     # the reduced summary
$ h5i recall object 0bb827e4               # the full raw bytes, exactly
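The two guarantees above, status from the exit code and queryable manifests, can be sketched together. This assumes a deliberately simplified rule (any non-zero exit counts as failed) and hypothetical JSONL fields; only the ID 0bb827e4 comes from the example above, and the other IDs are made up.

```python
import json
from typing import List, Optional

def status_from_exit(exit_code: int) -> str:
    # The exit code decides; text like "all tests passed" in the output never does.
    return "passed" if exit_code == 0 else "failed"

def query(jsonl: str, status: Optional[str] = None,
          tool: Optional[str] = None) -> List[dict]:
    """Filter manifest records stored one JSON object per line."""
    hits = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if status is not None and rec.get("status") != status:
            continue
        if tool is not None and rec.get("tool") != tool:
            continue
        hits.append(rec)
    return hits

manifest = "\n".join(json.dumps(r) for r in [
    {"id": "0bb827e4", "tool": "pytest", "status": status_from_exit(1)},
    {"id": "5f01c2aa", "tool": "cargo", "status": status_from_exit(0)},
    {"id": "9d3e7b10", "tool": "pytest", "status": status_from_exit(0)},
])
print([r["id"] for r in query(manifest, status="failed", tool="pytest")])
```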

The experiment

scripts/experiment_token_reduction.sh runs realistic tool output through h5i capture run and compares the raw output with the default agent-facing summary, counting tokens with h5i's own tokenizer. It checks four properties: token cut, signal retained, correct status, and lossless recovery. A representative run:

Fixture | Raw (tok) | Summary (tok) | Cut
pytest (1 fail / 124) | 1424 | 30 | 98%
cargo test (1 fail) | 1397 | 39 | 97%
tsc (2 errors) | 544 | 63 | 88%
go build failure | 624 | 21 | 97%
noisy service log (1 buried error) | 17223 | 79 | 100%
big JSON (402-item array) | 5625 | 61 | 99%
Total | 27942 | 1176 | 95.8%

Across the matrix, ~95% of tokens never enter the context window — and in every case the failing test, the error, and its location survive into the summary, the status is honest, and recall object returns the raw bytes exactly.
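The Cut column is 1 − summary/raw, expressed as a percentage. This quick check recomputes it from the table's own numbers. It also shows that the service log's 100% is about 99.5% after rounding, and that the Total row's 95.8% comes from applying the same formula to its own raw and summary totals.

```python
# Rows copied from the table: (fixture, raw tokens, summary tokens, reported cut %).
ROWS = [
    ("pytest (1 fail / 124)", 1424, 30, 98),
    ("cargo test (1 fail)", 1397, 39, 97),
    ("tsc (2 errors)", 544, 63, 88),
    ("go build failure", 624, 21, 97),
    ("noisy service log (1 buried error)", 17223, 79, 100),
    ("big JSON (402-item array)", 5625, 61, 99),
]

def cut(raw: int, summary: int) -> float:
    """Percentage of raw tokens that never reach the context window."""
    return 100.0 * (1 - summary / raw)

for name, raw, summ, reported in ROWS:
    print(f"{name:38} {cut(raw, summ):6.2f}%  (table: {reported}%)")

# The Total row applies the same formula to its own totals.
print(f"{'Total':38} {cut(27942, 1176):6.2f}%  (table: 95.8%)")
```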

An honest limit: signal-dense output

Token reduction works by dropping noise, so its ceiling is set by how much noise there is. Tests, builds, and logs are mostly noise around a little signal, which is why they reduce 90–100%. A linter or type checker with many issues is the opposite: every line is a real diagnostic, and there is nothing to drop. There the compact render still beats raw (it caps the list and drops the summary chatter), but only modestly, and the full structured YAML is actually larger than raw because it spells out every finding as a set of keyed fields. That's why the compact format is the default, and why --format summary exists for the smallest possible text when you only want the diagnostics. Structure has a cost; h5i spends it where it buys machine-actionability, not on output that is already compact.
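A rough illustration of why signal-dense output resists reduction: forty lint diagnostics rendered three ways. Character counts stand in for tokens here (this is not h5i's tokenizer), and the raw, compact, and YAML-like formats, along with the cap of 30, are hypothetical stand-ins for the real renderers.

```python
CAP = 30  # hypothetical cap on findings shown by the compact render

def sample_findings(n: int):
    # (file, line, code, message) tuples; every one is real signal.
    return [(f"src/mod{i}.py", i, "E501", "line too long") for i in range(1, n + 1)]

def raw_output(fs) -> str:
    lines = [f"{f}:{ln}:1: {code} {msg}" for f, ln, code, msg in fs]
    return "\n".join(lines + [f"Found {len(fs)} errors."])

def compact(fs) -> str:
    shown = [f"  E {f}:{ln} {code} {msg}" for f, ln, code, msg in fs[:CAP]]
    if len(fs) > CAP:
        shown.append(f"  ... and {len(fs) - CAP} more")
    return "\n".join([f"lint failed · {len(fs)} errors"] + shown)

def structured_yaml(fs) -> str:
    out = ["tool: lint", "status: failed", "findings:"]
    for f, ln, code, msg in fs:
        # Every finding repeats all of its keys.
        out += [f"  - file: {f}", f"    line: {ln}", "    column: 1",
                f"    code: {code}", f"    message: {msg}"]
    return "\n".join(out)

fs = sample_findings(40)
sizes = {name: len(render(fs)) for name, render in
         [("raw", raw_output), ("compact", compact), ("structured", structured_yaml)]}
print(sizes)
```

With these inputs the compact render comes out roughly a fifth smaller than raw, and the keyed YAML more than twice as large.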

Sharing the raw bytes — Git LFS by default

The split has a nice consequence for teams: h5i push carries only the tiny manifests (pointers + summaries) — the huge raw bytes never travel by default. When you do want to share them, h5i objects push uploads them, and the backend is chosen automatically:

--backend | Stores raw blobs in | When
lfs | the remote's Git LFS server (by sha256) | default for HTTP(S) remotes
git-ref | refs/h5i/objects-data (a content-addressed git ref) | fallback for SSH / file:// remotes
auto | LFS when the remote is HTTP(S), else git-ref | the default
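The auto rule in the table comes down to a decision on the remote URL's scheme. A sketch under that reading (the function name is hypothetical): an explicit choice wins, HTTP(S) remotes get lfs, and SSH, scp-style, file:// and local-path remotes fall back to git-ref.

```python
from urllib.parse import urlparse

def choose_backend(remote_url: str, requested: str = "auto") -> str:
    """Resolve the raw-blob backend as the table describes (illustrative)."""
    if requested in ("lfs", "git-ref"):
        return requested                      # an explicit --backend wins
    if requested != "auto":
        raise ValueError(f"unknown backend: {requested}")
    # urlparse finds no scheme in scp-style "git@host:path" or plain paths,
    # so only real http:// and https:// URLs select LFS.
    scheme = urlparse(remote_url).scheme.lower()
    return "lfs" if scheme in ("http", "https") else "git-ref"

for remote in ["https://github.com/org/repo.git", "git@github.com:org/repo.git",
               "ssh://git@host/repo.git", "file:///srv/repo.git", "/srv/repo.git"]:
    print(f"{remote:34} -> {choose_backend(remote)}")
```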

The LFS backend is native — h5i speaks the LFS Batch API directly (no git lfs CLI, no pointer files), reusing your git host's LFS storage and auth, and keyed by the same sha256 as everything else. So huge tool output lives on the LFS server and never bloats the git object database. With LFS, h5i recall object <id> lazily fetches a blob from the server only when it's actually needed — and every transfer is content-address-verified, so a tampered or mismatched blob is rejected, never cached. Credentials are scoped to the LFS host, never leaked to presigned transfer URLs.
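The pieces this paragraph names can be sketched against the public Git LFS Batch API spec: the default LFS server URL for an HTTP(S) remote, the batch request body, a content-address check on received bytes, and one possible credential-scoping rule. The sketch does no networking, and the function names and the scoping rule are our assumptions, not h5i's code.

```python
import hashlib
import json
from urllib.parse import urlparse

def lfs_server_url(remote_url: str) -> str:
    # Git LFS default: append ".git/info/lfs" to an HTTP(S) remote URL,
    # or just "/info/lfs" when the URL already ends in ".git".
    base = remote_url.rstrip("/")
    return base + ("/info/lfs" if base.endswith(".git") else ".git/info/lfs")

def batch_body(operation: str, objects: list) -> dict:
    """JSON body for POST <server>/objects/batch; objects are (oid, size) pairs."""
    return {
        "operation": operation,   # "upload" or "download"
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
        "hash_algo": "sha256",
    }

def verify(oid: str, size: int, data: bytes) -> bytes:
    # Content-address check: anything whose size or SHA-256 is off is rejected,
    # so the caller can discard it instead of caching it.
    if len(data) != size or hashlib.sha256(data).hexdigest() != oid:
        raise ValueError(f"blob {oid[:8]} failed verification")
    return data

def transfer_headers(href: str, action_headers: dict, server: str, auth: str) -> dict:
    # Assumed policy: send the headers the batch response gave for this action,
    # and add our own credentials only if the transfer targets the LFS host.
    headers = dict(action_headers)
    if urlparse(href).hostname == urlparse(server).hostname:
        headers.setdefault("Authorization", auth)
    return headers

blob = b"FAILED tests/test_auth.py::test_refresh\n"
oid, size = hashlib.sha256(blob).hexdigest(), len(blob)
server = lfs_server_url("https://git.example.com/org/repo.git")
# Sent with Accept and Content-Type: application/vnd.git-lfs+json
print("POST", server + "/objects/batch")
print(json.dumps(batch_body("download", [(oid, size)]), indent=2))
verify(oid, size, blob)
print(transfer_headers("https://bucket.s3.example.net/obj?sig=abc", {}, server, "Basic xyz"))
```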

It shows up in the PR, too

When a branch has captures, h5i share pr post folds in a one-line scorecard — how many tokens of raw tool output the agent kept out of context on this branch, with a per-tool breakdown — so reviewers can see the saving without running anything.
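Underneath, such a scorecard is a sum of raw minus summary tokens per tool across a branch's captures. A hypothetical sketch: the one-line wording and the capture tuples are invented for illustration and are not h5i's actual PR text.

```python
from collections import defaultdict

def scorecard(captures) -> str:
    """Tokens kept out of context on a branch, total and per tool (hypothetical wording)."""
    saved = defaultdict(int)
    for tool, raw_tokens, summary_tokens in captures:
        saved[tool] += raw_tokens - summary_tokens
    total = sum(saved.values())
    # Biggest savers first; ties broken by tool name so the line is deterministic.
    ranked = sorted(saved.items(), key=lambda kv: (-kv[1], kv[0]))
    per_tool = ", ".join(f"{tool} {n:,}" for tool, n in ranked)
    return f"{total:,} tokens of raw tool output kept out of context ({per_tool})"

# (tool, raw tokens, summary tokens) for three captures on a branch,
# reusing the pytest and cargo rows from the benchmark table.
captures = [("pytest", 1424, 30), ("cargo", 1397, 39), ("pytest", 1424, 30)]
line = scorecard(captures)
print(line)
```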

Try it

# wrap the noisy commands an agent runs
h5i capture run -- pytest -q
h5i capture run -- cargo build
h5i capture run --file src/auth.rs -- pytest tests/test_auth.py

# come back to anything later — by status, tool, branch, or file
h5i recall objects --status failed
h5i recall object <id>        # full raw output, byte-for-byte

# share the raw blobs with the team (Git LFS by default on HTTP(S) remotes)
h5i objects push               # h5i push already carried the summaries
h5i objects pull               # fetch shared blobs you don't have locally

# wire it into a project so agents do it automatically
h5i objects setup

In Claude Code the same behavior is available as a native MCP tool (h5i_capture_run) that returns the structured result directly: no shell, no quoting. The declarative filter engine (ported from rtk, Apache-2.0) and the log line-folding (from headroom, Apache-2.0) cover the long tail of commands; h5i adds what they don't, keeping the raw output content-addressed, recoverable, and queryable across captures and machines.
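Line folding of this general kind can be illustrated with a short sketch: consecutive lines that differ only in their numbers collapse into the first line plus a count, so a buried error stands out. This is our own simplified version, not headroom's or h5i's implementation; the number-masking rule and the fold marker text are assumptions.

```python
import re
from typing import List

def _shape(line: str) -> str:
    # Mask every run of digits so lines differing only in timestamps or
    # counters share a shape.
    return re.sub(r"\d+", "#", line)

def fold(lines: List[str]) -> List[str]:
    """Collapse runs of same-shape consecutive lines into first line + a count."""
    out: List[str] = []
    i = 0
    while i < len(lines):
        j = i + 1
        while j < len(lines) and _shape(lines[j]) == _shape(lines[i]):
            j += 1
        out.append(lines[i])
        if j - i > 1:
            out.append(f"  [... {j - i - 1} similar lines folded]")
        i = j
    return out

log = [f"12:00:{s:02d} INFO heartbeat ok seq={s}" for s in range(50)]
log.insert(30, "12:00:30 ERROR db: connection refused")
folded = fold(log)
for line in folded:
    print(line)
```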

Stop paying for tool output your agent reads once and forgets

h5i is open source, the store is a content-addressed directory + a git ref, and there's no service to subscribe to.
