How much does keeping tool output out of context cut tokens?

About 95% across a mixed matrix of test runs, builds, JSON payloads, and noisy logs. The cut is set by how much noise surrounds the signal: tests and logs are mostly noise and reduce 90–100%, while a linter with many real diagnostics has little to drop and reduces far less.

Is the raw output lost when h5i reduces it?

No. The full bytes are content-addressed by SHA-256 and written to a local store under .git/.h5i/objects/ab/cd/. The agent reads only a small structured summary; the raw output is one command away with h5i recall object and is returned byte-for-byte.

How is this different from truncating or tailing a log?

Truncation discards information. The object store keeps every byte and just moves the bulk out of the context window, leaving a durable, queryable address behind. An evicted blob degrades to a clear absent message, never a silent loss, and captures stay searchable across runs.

Do the raw blobs get pushed to my Git remote?

Not by default. h5i share push carries only the tiny manifests (pointers plus summaries). When you want to share raw bytes, h5i objects push uploads them via the native Git LFS Batch API on HTTP(S) remotes, falling back to a content-addressed git ref for SSH or file:// remotes.

Does token-reducing capture work with both Claude Code and Codex?

Yes. The optional wrap-bash hook (h5i hook setup --write --wrap-bash) rewrites the agent's Bash commands into h5i capture run for both Claude Code and Codex (--target codex), so the agent gets the reduced summary automatically while the raw bytes are stored for recall.

Benchmark · 2026-06-05

Cutting Agent Token Usage 95% by Keeping Tool Output Out of Context

A 4 MB pytest log is mostly noise: the agent needs the two failures and the summary line, not the 3,000 PASSED rows. h5i keeps the full raw output out-of-band, hands the agent a small structured summary, and keeps the original one command away. Here's the design, and the reproducible experiment that measures it.

By Koukyosyumei · Reading time: 8 min · Tags: Token reduction · Context window · Tooling

Key takeaways

The win is the split: a tiny typed summary in the agent's context plus the full raw bytes as a durable, content-addressed record.
The ~95% reduction is real but bounded by noise ratio — signal-dense diagnostic output reduces far less.
Content-addressing gives dedupe across runs, conflict-free set-union merge, and fetch-time verification for free.

The auditable workspace keeps raw logs out of context, recoverable on demand, and this post explains exactly how. Coding agents drown in their own tool output. A test run, a build, a linter, or a kubectl get -o json can each dump thousands of lines into the context window, and the agent pays for every one on every subsequent turn (the prompt cache makes re-sending cheaper, but not the original ingestion). The signal an agent actually needs (which test failed, on what line, with what assertion) is usually a few lines buried in a flood of progress bars and passing rows.

The fix is structural, and it borrows from a solved problem: large files in Git. Tools like git-annex and git-lfs split a big file into a small tracked pointer and an out-of-band blob. h5i does the same for tool output.

The design: a pointer and a blob

h5i capture run -- <command> runs the command and splits its output in two:

Artifact	Where it lives	Size	Travels with `h5i share push`?
Raw blob (full bytes)	`.git/.h5i/objects/ab/cd/<sha256>` (local)	huge	No by default, only via `h5i objects push` (Git LFS / git ref)
Manifest (pointer + structured summary)	`refs/h5i/objects` (git ref, JSONL)	tiny	Yes

Who runs this? In practice the agent does, not a human. With the optional wrap-bash hook installed (h5i hook setup --write --wrap-bash), a PreToolUse handler transparently rewrites every Bash command the agent runs into h5i capture run … — so the agent gets the token-reduced summary automatically while the full raw bytes are stored for h5i recall. This wiring works for both Claude Code and Codex (--target codex writes the same PreToolUse / Bash hook into .codex/config.toml). You can of course also run h5i capture run by hand.

The agent reads only the manifest's summary. The full bytes are content-addressed by SHA-256 and one command away (h5i recall object <id>), so the output is reduced and losslessly recoverable at the same time. Output below 2 KB (the default DEFAULT_CAPTURE_MIN_BYTES = 2048) on a successful command passes straight through unstored — a failing command is always captured regardless of size — so wrapping anything is safe.
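The pass-through rule described above can be sketched as a small decision function. This is an illustrative sketch, not h5i's code: the function name `should_capture` is hypothetical, and treating exactly 2048 bytes as "captured" is an assumption, since the text only says output below 2 KB passes through.

```python
DEFAULT_CAPTURE_MIN_BYTES = 2048  # default named in the text above


def should_capture(output: bytes, exit_code: int,
                   min_bytes: int = DEFAULT_CAPTURE_MIN_BYTES) -> bool:
    """Decide whether output is stored and summarized or passed straight through."""
    if exit_code != 0:
        # A failing command is always captured, regardless of size.
        return True
    # Assumption: "below 2 KB" means strictly fewer than min_bytes.
    return len(output) >= min_bytes


if __name__ == "__main__":
    print(should_capture(b"ok\n", 0))        # small success: passes through
    print(should_capture(b"boom\n", 1))      # small failure: still captured
    print(should_capture(b"x" * 4096, 0))    # large success: captured
```

Keying the exception on the exit code rather than on size is what makes wrapping every command safe: short successful output costs nothing extra, and a short failure is still recorded.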

What content-addressing buys you

The blob's name is its SHA-256. The local store is sharded by the first four hex digits — .git/.h5i/objects/<ab>/<cd>/<full-hex> — so a directory never fills with millions of siblings. Three properties fall out of this for free:

Dedupe across runs. Writing a blob whose hash already exists is a no-op (if path.is_file() { return Ok(()) }). Re-run the same failing test ten times and the identical output is stored once; the manifest records ten captures pointing at one blob.
Merge is set union. Two clones that capture different output can be merged by taking the union of blobs — there are no conflicts when names are derived from content.
Verification is built in. A fetched blob is checked against the hash you asked for; a tampered or truncated transfer is rejected, never cached. The summary you read and the bytes you rehydrate provably came from the same run.
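The three properties above can be seen in a minimal content-addressed store. This is a sketch under stated assumptions rather than h5i's implementation (which is written in Rust): the function names `blob_path`, `put`, `verify`, and `merge` are hypothetical, and only the sharded layout and the no-op-on-existing-hash behaviour are taken from the text.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_path(root: Path, hex_digest: str) -> Path:
    # Shard by the first four hex digits: <ab>/<cd>/<full-hex>.
    return root / hex_digest[:2] / hex_digest[2:4] / hex_digest


def put(root: Path, data: bytes) -> str:
    """Store data under its SHA-256; rewriting identical bytes is a no-op."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_path(root, digest)
    if path.is_file():
        return digest  # dedupe: identical content is already stored
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return digest


def verify(expected_hex: str, data: bytes) -> bytes:
    """Reject fetched bytes whose hash differs from the one requested."""
    if hashlib.sha256(data).hexdigest() != expected_hex:
        raise ValueError("hash mismatch: blob rejected, not cached")
    return data


def _names(root: Path) -> set[str]:
    return {p.name for p in root.glob("*/*/*") if p.is_file()}


def merge(root_a: Path, root_b: Path) -> set[str]:
    """Set-union merge: names derive from content, so they cannot conflict."""
    return _names(root_a) | _names(root_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        store_a, store_b = Path(a), Path(b)
        first = put(store_a, b"FAILED tests/test_auth.py::test_refresh\n")
        again = put(store_a, b"FAILED tests/test_auth.py::test_refresh\n")
        put(store_b, b"error[E0308]: mismatched types\n")
        print(first == again)                  # True: same content, same name
        print(len(merge(store_a, store_b)))    # 2: union of both stores
```

Because the name is the hash, none of the three operations needs coordination: two writers of the same bytes agree on the path, and a reader can check what it received without trusting the sender.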

Why not just truncate? Truncation throws information away. The object store keeps every byte; it just moves the bulk out of the context window and leaves a durable address behind. An evicted blob degrades to a clear "absent" message, never a silent loss.

Structured output, not a log excerpt

The summary isn't a trimmed log; it's a normalized, machine-actionable result: one schema across test runners, compilers, linters, and type checkers, so an agent learns one shape instead of N. The default render is compact (one line per finding):

$ h5i capture run -- pytest -q
pytest test failed · 1 failed, 120 passed (exit 1)
  F tests/test_auth.py::test_refresh  assert 0 == 100

The agent can branch on status, jump to the location, and dedupe by a stable fingerprint. A failing run is never reported as passed: the status is derived from the exit code, never guessed from text, and a parser that can't find its anchors falls back to a generic result rather than inventing structure. The full typed result is available with --format structured (YAML) or --format json, and is stored in the manifest so captures are queryable:

$ h5i recall objects --status failed --tool pytest
$ h5i recall object 0bb827e4 --summary     # the reduced summary
$ h5i recall object 0bb827e4               # the full raw bytes, exactly
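The rules stated earlier in this section (status from the exit code, a stable fingerprint per finding, and a generic fallback when parsing fails) can be sketched as a small typed result. The class and field names here are hypothetical and do not reflect h5i's actual schema; the fingerprint recipe in particular is an assumption chosen only to show what "stable" could mean.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    location: str   # e.g. "tests/test_auth.py::test_refresh"
    message: str    # e.g. "assert 0 == 100"

    @property
    def fingerprint(self) -> str:
        # Hypothetical stable key: hash of location and message only,
        # so the same failure maps to the same key across runs.
        raw = f"{self.location}\x00{self.message}".encode()
        return hashlib.sha256(raw).hexdigest()[:12]


@dataclass
class Result:
    tool: str
    exit_code: int
    kind: str = "typed"            # "typed" or "generic" (illustrative)
    findings: list[Finding] = field(default_factory=list)

    @property
    def status(self) -> str:
        # Derived from the exit code alone, never guessed from output text.
        return "passed" if self.exit_code == 0 else "failed"


def summarize(tool: str, exit_code: int,
              findings: Optional[list[Finding]]) -> Result:
    # findings=None models a parser that could not find its anchors:
    # fall back to a generic result instead of inventing structure.
    if findings is None:
        return Result(tool=tool, exit_code=exit_code, kind="generic")
    return Result(tool=tool, exit_code=exit_code, findings=findings)


if __name__ == "__main__":
    f = Finding("tests/test_auth.py::test_refresh", "assert 0 == 100")
    result = summarize("pytest", 1, [f])
    print(result.status, result.findings[0].fingerprint)
```

Note that the generic fallback still reports the correct status, because status never depended on the parser in the first place.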

What's measured, and how to reproduce it

The reduction isn't a marketing number — it's a property the test suite enforces. tests/filter_quality.rs (run it yourself with cargo test --test filter_quality) pushes realistic fixtures — a pytest failure, an all-pass run, a cargo test panic, a noisy service log with one buried error, a large JSON payload, a gcc error wall — through the same filter the CLI uses, then asserts three things on each: the summary stays under a fixed fraction of the raw tokens (the pytest-failure fixture must land under 35% of raw, an all-pass run under 5%), the signal survives (the failing test, the panic, the buried error all appear in the summary), and the summary never inflates the token count. Token counts use the same tiktoken-based counter h5i records in every manifest, so the numbers match what an agent actually pays.
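The three assertions described above translate roughly as follows. This is a Python paraphrase for illustration, not the contents of tests/filter_quality.rs: the helper name `check_filter_quality` is hypothetical, and `count_tokens` is a whitespace-splitting stand-in for the tiktoken-based counter the post mentions.

```python
def count_tokens(text: str) -> int:
    # Stand-in counter (whitespace split). The post says h5i uses a
    # tiktoken-based counter; that is not reproduced here.
    return len(text.split())


def check_filter_quality(raw: str, summary: str, max_ratio: float,
                         must_contain: list[str]) -> None:
    raw_tokens, summary_tokens = count_tokens(raw), count_tokens(summary)
    # 1. The summary stays under a fixed fraction of the raw tokens.
    assert summary_tokens < max_ratio * raw_tokens, "summary too large"
    # 2. The signal survives.
    for needle in must_contain:
        assert needle in summary, f"lost signal: {needle}"
    # 3. The summary never inflates the token count.
    assert summary_tokens <= raw_tokens, "summary inflated the output"


if __name__ == "__main__":
    raw = "\n".join(f"tests/test_ok.py::test_{i} PASSED" for i in range(120))
    raw += "\ntests/test_auth.py::test_refresh FAILED assert 0 == 100\n"
    summary = ("1 failed, 120 passed\n"
               "F tests/test_auth.py::test_refresh assert 0 == 100")
    check_filter_quality(raw, summary, max_ratio=0.35,
                         must_contain=["tests/test_auth.py::test_refresh"])
    print("ok")
```

The second check is the one that separates a filter from a truncation: a summary that is small but has lost the failing test name fails the suite.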

Tokenizing a representative capture run across common tools the same way gives the agent-facing vs. raw comparison below:

Fixture	Raw (tok)	Summary (tok)	Cut
pytest (1 fail / 124)	1424	30	98%
cargo test (1 fail)	1397	39	97%
tsc (2 errors)	544	63	88%
go build failure	624	21	97%
noisy service log (1 buried error)	17223	79	100%
big JSON (402-item array)	5625	61	99%
Total	27942	1176	95.8%

Across the matrix, ~95% of tokens never enter the context window, and in every case the failing test, the error, and its location survive into the summary, the status is honest, and recall object returns the raw bytes exactly.
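The Cut column can be recomputed directly from the table as 1 minus summary/raw. The short script below does that with the published numbers only. One observation it surfaces: the six itemized rows sum to 26,837 raw and 293 summary tokens, less than the Total row, so the total presumably covers fixtures that are not itemized here. That reading is an inference from the arithmetic, not something the source states.

```python
rows = {
    "pytest (1 fail / 124)": (1424, 30),
    "cargo test (1 fail)": (1397, 39),
    "tsc (2 errors)": (544, 63),
    "go build failure": (624, 21),
    "noisy service log (1 buried error)": (17223, 79),
    "big JSON (402-item array)": (5625, 61),
}
total_raw, total_summary = 27942, 1176  # the table's Total row


def cut(raw: int, summary: int) -> float:
    """Percentage of raw tokens that never enter the context window."""
    return 100.0 * (1 - summary / raw)


for name, (raw, summary) in rows.items():
    print(f"{name}: {cut(raw, summary):.1f}%")
print(f"Total row: {cut(total_raw, total_summary):.1f}%")

# Sums of the itemized rows, for comparison with the Total row.
listed_raw = sum(raw for raw, _ in rows.values())
listed_summary = sum(summary for _, summary in rows.values())
print(listed_raw, listed_summary)
```

The difference between the Total row and the itemized rows is 1,105 raw and 883 summary tokens, which would be a cut of about 20%. That is in line with the signal-dense case described in the next section.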

An honest limit: signal-dense output

Token reduction works by dropping noise, so its ceiling is set by how much noise there is. Tests, builds, and logs are mostly noise around a little signal, which is why they reduce 90–100%. A linter or type-checker with many issues is the opposite: every line is a real diagnostic, with nothing to drop. There the compact render still beats raw (by capping the list and dropping summary chatter), but only modestly, and the full structured YAML is actually larger than raw, because it adds a keyed field per finding. That's why the compact format is the default, and why --format summary exists for the absolute-smallest text when you only want the diagnostics. Structure has a cost; h5i spends it where it buys machine-actionability, not on already-compact output.
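A toy comparison makes the cost of structure concrete. Everything in this sketch is made up for illustration: the diagnostic line, the field names, and the compact layout are hypothetical, JSON stands in for the structured YAML, and character counts stand in for tokens.

```python
import json

# A made-up linter diagnostic; in signal-dense output every line looks like this.
raw_line = "src/app.ts:12:5 error no-unused-vars 'tmp' is defined but never used"

# Hypothetical keyed record for the same finding (field names are illustrative).
structured = json.dumps({
    "file": "src/app.ts",
    "line": 12,
    "column": 5,
    "severity": "error",
    "rule": "no-unused-vars",
    "message": "'tmp' is defined but never used",
})

# Hypothetical compact render: one line per finding, no keys.
compact = "E src/app.ts:12 no-unused-vars 'tmp' is defined but never used"

print(len(raw_line), len(compact), len(structured))
```

The keyed record repeats its field names for every finding, so on output that is already one diagnostic per line it grows rather than shrinks, while the compact line saves only a little.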

How this compares to the alternatives

Keeping output out of context is not a new instinct — most teams already do something. The difference is what survives. Lay the common approaches side by side and the axis that matters isn't "how few tokens" but "what can you get back later":

Approach	Tokens in context	Signal kept	Raw recoverable	Queryable later
Raw logs piped to the agent	full (every turn)	yes, but buried	—	no
`head`/`tail`/truncate	bounded	only if it sits at the ends	no (discarded)	no
Standalone filter (rtk, headroom)	reduced	yes (rule-matched)	no (filtered text is the output)	no
`h5i capture run`	reduced ~95%	yes (typed result)	yes (content-addressed)	yes (`recall objects`/`search`)

Good: the text-filter tools h5i builds on — rtk's declarative TOML filter engine and headroom's log line-folding (both Apache-2.0) — are well-tuned at exactly the job of squeezing a noisy stream down to its diagnostics, and h5i reuses them rather than reinventing that work. Gap: a standalone filter's reduced text is also its only output — once the bulk is gone it's gone, and nothing indexes yesterday's run. Naive truncation is worse: it bounds tokens but bets the signal lives at the head or tail, which a buried error violates. h5i's contribution is the second half — the same reduction, but the raw stays addressable and the captures stay searchable across runs and machines.

Sharing the raw bytes: Git LFS by default

The split has a nice consequence for teams: h5i share push carries only the tiny manifests (pointers + summaries); the huge raw bytes never travel by default. When you do want to share them, h5i objects push uploads them, and the backend is chosen automatically:

`--backend`	Stores raw blobs in	When
`lfs`	the remote's Git LFS server (by sha256)	default for HTTP(S) remotes
`git-ref`	`refs/h5i/objects-data` (a content-addressed git ref)	fallback for SSH / `file://` remotes
`auto`	LFS when the remote is HTTP(S), else git-ref	the default

The LFS backend is native: h5i speaks the LFS Batch API directly (no git lfs CLI, no pointer files), reuses your git host's LFS storage and auth, and keys blobs by the same sha256 as everything else. So huge tool output lives on the LFS server and never bloats the git object database. With LFS, h5i recall object <id> lazily fetches a blob from the server only when it's actually needed, and every transfer is content-address-verified, so a tampered or mismatched blob is rejected, never cached. Credentials are scoped to the LFS host and are never leaked to presigned transfer URLs.
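The backend choice in the table, and the shape of a Batch API request, can be sketched as below. The function names are hypothetical and this is not h5i's code. The request body follows the public Git LFS Batch API, which is POSTed to the remote's info/lfs/objects/batch endpoint with the application/vnd.git-lfs+json media type; how h5i actually assembles its requests is not shown in the source.

```python
import json


def choose_backend(remote_url: str, requested: str = "auto") -> str:
    """Resolve --backend: 'auto' picks LFS for HTTP(S) remotes, else git-ref."""
    if requested != "auto":
        return requested  # an explicit choice is honoured as-is
    is_http = remote_url.lower().startswith(("http://", "https://"))
    return "lfs" if is_http else "git-ref"


def lfs_batch_request(operation: str, objects: list[tuple[str, int]]) -> str:
    """Build a Git LFS Batch API body for (sha256 hex, size in bytes) pairs."""
    if operation not in ("upload", "download"):
        raise ValueError("operation must be 'upload' or 'download'")
    return json.dumps({
        "operation": operation,
        "transfers": ["basic"],
        "hash_algo": "sha256",
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    })


if __name__ == "__main__":
    print(choose_backend("https://example.com/org/repo.git"))   # lfs
    print(choose_backend("git@example.com:org/repo.git"))       # git-ref
    print(lfs_batch_request("download", [("ab" * 32, 4096)]))
```

Because the LFS object id is itself a SHA-256, the store's blob name can serve as the `oid` without any translation layer.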

When recovery can fail (and how it fails)

A recoverable record is only as good as its failure behavior. The honest cases:

The blob is gone locally and was never shared. If you garbage-collect the local store, or move to a machine that only pulled the manifests, recall object reports a clear "absent" result — it never fabricates output or silently returns a stale summary as if it were the raw.
The remote doesn't speak LFS. The auto backend only chooses LFS for HTTP(S) remotes and falls back to a content-addressed git ref for SSH or file://. If you force --backend lfs at a host without it, the Batch endpoint returns 404/501 and h5i raises an explicit "remote does not speak LFS" error rather than appearing to succeed.
You're offline. The lazy LFS fetch depends on the network, so rehydrating a not-yet-local blob fails loudly when the server is unreachable. The manifest summary, status, and findings still read fine offline, because they live in the git ref you already pulled.

The common thread: the working context (summary) and the durable record (raw) fail independently. Losing access to the bytes never corrupts the summary, and a missing blob is always an explicit absence, never a quiet substitution.
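One way to picture "explicit absence, never a quiet substitution" is a recall function whose return type cannot be confused with real output. The sketch below is hypothetical throughout: the `Recall` type, its three states, and the in-memory `local` dict are illustrative stand-ins, and reporting network failure as a returned "error" state (rather than raising) is a simplification of the "fails loudly" behaviour described above.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recall:
    state: str                     # "found", "absent", or "error"
    data: Optional[bytes] = None   # set only when state == "found"
    reason: str = ""


def recall(digest: str,
           local: dict[str, bytes],
           fetch: Optional[Callable[[str], bytes]] = None) -> Recall:
    """Return the raw bytes for digest, or an explicit non-result."""
    if digest in local:
        return Recall("found", local[digest])
    if fetch is None:
        return Recall("absent", reason="not in local store and never shared")
    try:
        data = fetch(digest)
    except OSError as exc:         # e.g. the server is unreachable
        return Recall("error", reason=f"fetch failed: {exc}")
    if hashlib.sha256(data).hexdigest() != digest:
        return Recall("error", reason="hash mismatch: blob rejected, not cached")
    local[digest] = data           # cache only verified bytes
    return Recall("found", data)


if __name__ == "__main__":
    blob = b"raw pytest output"
    digest = hashlib.sha256(blob).hexdigest()
    print(recall(digest, {}).state)                      # absent
    print(recall(digest, {}, lambda _: blob).state)      # found
```

In no branch does the function return summary text in place of raw bytes, which is the property the failure cases above depend on.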

It shows up in the PR, too

When a branch has captures, h5i share pr post folds in a one-line scorecard: how many tokens of raw tool output the agent kept out of context on this branch, with a per-tool breakdown, so reviewers can see the saving without running anything.

Try it

These are the commands the agent runs; with the wrap-bash hook installed (h5i hook setup --write --wrap-bash) they're invoked automatically on every Bash call, so the agent never has to remember to wrap anything — but each works typed by hand too:

# wrap the noisy commands an agent runs
h5i capture run -- pytest -q
h5i capture run -- cargo build
h5i capture run --file src/auth.rs -- pytest tests/test_auth.py

# come back to anything later — by status, tool, branch, or file
h5i recall objects --status failed
h5i recall object <id>        # full raw output, byte-for-byte

# share the raw blobs with the team (Git LFS by default on HTTP(S) remotes)
h5i objects push               # h5i share push already carried the summaries
h5i objects pull               # fetch shared blobs you don't have locally

# wire it into a project so agents do it automatically
h5i objects setup

In Claude Code the same behavior is a native MCP tool (h5i_capture_run) that returns the structured result directly, no shell, no quoting. The declarative filter engine (ported from rtk, Apache-2.0) and the log line-folding (from headroom, Apache-2.0) cover the long tail of commands; h5i adds what they don't: the raw stays content-addressed, recoverable, and queryable across captures and machines.

Frequently asked questions

How much does this actually cut? About 95% across a mixed matrix of test runs, builds, JSON payloads, and noisy logs. The ceiling is set by how much noise surrounds the signal: tests and logs reduce 90–100%; a linter where every line is a real diagnostic reduces far less, which the honest-limit section above spells out.

Is the raw output lost? No. The full bytes are content-addressed by SHA-256 under .git/.h5i/objects/ab/cd/<sha256> and returned byte-for-byte by h5i recall object <id>. The agent just doesn't carry them in context.

How is it different from truncating a log? Truncation discards; the object store moves the bulk out of context but keeps a durable, queryable address. An evicted blob degrades to a clear "absent" message, never a silent loss.

Do raw blobs get pushed to my Git remote? Not by default — h5i share push carries only manifests. h5i objects push uploads the bytes via the native Git LFS Batch API on HTTP(S) remotes, falling back to a content-addressed git ref for SSH or file://.

Does it work with Codex as well as Claude Code? Yes. The wrap-bash hook (h5i hook setup --write --wrap-bash; add --target codex for Codex) rewrites the agent's Bash commands into h5i capture run for both.

The point isn't smaller logs

It's worth being precise about what changed. The reduction number is real and reproducible, but it's the consequence, not the design. The design is a split: the agent's working context becomes a tiny typed summary it can branch on, while the durable record stays the full raw bytes — content-addressed, deduplicated, verified on fetch, and recoverable on demand. Those two halves fail independently, travel separately (manifests by default, bytes only when you ask), and answer different questions: the summary tells the agent what happened now; the store lets a reviewer, a later session, or another machine ask what happened then. Filtering logs saves tokens. Separating the context from the record is what makes a noisy tool run both cheap to read and impossible to lose.

Guide: keep tool output out of your agent's context

A task-oriented walkthrough of h5i capture run, recall, GC, trust, and project setup.

Try h5i on your next AI-assisted branch

Create a sandboxed workspace, capture the run, and post a review-ready PR brief. h5i is open source, the store is a content-addressed directory + a git ref, and there's no service to subscribe to.
