Cutting Claude Coding-Agent Costs 51% with Content-Addressed Claims
Prompt caching solves the cost of re-sending. It does not solve the cost of re-deriving. Here's the missing layer — and the A/B benchmark that shows what happens when you stop paying tokens for facts the agent already figured out.
Every Claude Code session that touches a non-trivial codebase begins with the same expensive ceremony: the agent re-discovers the codebase. It greps for the entry point. It reads the module that probably owns the logic. It opens three more files to verify. By the time it starts answering your actual question, it may have spent a substantial part of the session touring files whose structure did not change since yesterday.
Anthropic's prompt caching is the right tool for half this problem. It lets the agent re-send the same system prompt and tool definitions at a fraction of the cost. But it does nothing about the second half: the agent still has to look at the same files in the same order to arrive at the same conclusion. The cache makes that traversal cheaper per token. It does not make the traversal unnecessary.
h5i adds the missing layer. Claims are short, content-addressed facts the agent
records once — "the retry logic lives in HttpClient::send, not the middleware" — pinned
to a Merkle hash of the files they depend on. As long as those files don't change, the claim
stays live. The next session reads it as a pre-verified fact and skips the entire
re-derivation.
The A/B benchmark
The checked-in experiment
harness seeds the same synthetic 28-file Python service for both arms and gives
claude-opus-4-7 the same HTTP-logging task. The control arm starts cold.
The treatment arm starts with up to five claims curated by one claude-haiku-4-5
setup call. The published run contains N = 5 trials per arm.
| Metric | Control (no claims) | Treatment (with claims) | Δ |
|---|---|---|---|
| Read tool calls | 6.2 ± 0.4 | 4.0 ± 0 | −35% |
| Cache-read tokens | 793,425 ± 177,592 | 511,824 ± 106,863 | −35% |
| Assistant turns | 25.6 ± 3.8 | 16.6 ± 3.1 | −35% |
| Wall time | 68s ± 16 | 57s ± 5 | −16% |
| Estimated session cost | ~$4.35 | ~$2.13 | −51% |
| Task fidelity | 5/5 | 5/5 | ✓ |
The headline is the estimated 51% reduction in per-session cost, including the Haiku setup call. The read-tool-calls result is also useful: every claims-seeded trial read exactly four files, the four HTTP modules relevant to the task. The cold arm read six or seven files while confirming the boundary. See the full results, per-trial data, and caveats.
What a claim is, mechanically
A claim has three parts: a text, a list of evidence paths, and a Merkle hash:
$ h5i claims add "retry logic lives in HttpClient::send, not middleware" \ --path src/http.rs --path src/middleware.rs ✔ Recorded claim 478be84c61e7 evidence: sha256(("src/http.rs", <blob_oid>), ("src/middleware.rs", <blob_oid>))
The evidence hash is the SHA-256 of the (path, git-blob-OID) pairs at HEAD. It's stable under whitespace-only re-saves but changes the moment the file's content changes. There is no TTL, no heuristic, no "we think this is probably still right" — git tells you whether the evidence is intact, byte for byte.
Live, stale, and the auto-invalidation flow
A claim's status is computed from the current working tree. It's live if the evidence files hash the same as when the claim was recorded; stale otherwise.
$ h5i claims list STATUS ID TEXT ● live 478be84c61e7 retry logic lives in HttpClient::send, not middleware ○ stale 9f02ab1e733c FooError::Parse only constructed in parser.rs ↳ src/parser.rs changed — evidence no longer matches ● live b40132d8e0c4 cache layer is single-tenant; no per-user keys
Stale claims are not deleted. They're held — the next agent session that reads the affected file gets a chance to re-confirm or replace the claim, often via a one-line check rather than a full re-derivation. Pruning is explicit:
$ h5i claims prune --stale-older-than 30d 3 stale claims removed
Preamble injection
The other half of the mechanism is what happens at session start. Live claims render as a
## Known facts block in the system preamble:
## Known facts (verified evidence at HEAD) - retry logic lives in HttpClient::send, not middleware evidence: src/http.rs · src/middleware.rs - cache layer is single-tenant; no per-user keys evidence: src/cache/mod.rs # Treat these as ground truth. Re-verify only if a downstream # change requires it.
The agent treats this block the way it treats CLAUDE.md instructions — high-trust context
to use, not to re-derive. In practice, Claude reads it, accepts it, and skips the exploration
phase selectively. The 4.0 ± 0 read-tool-call result above is the direct consequence:
the model reads the relevant modules without exploring decoy files.
How claims compose with prompt caching
Anthropic prompt caching and h5i claims solve different parts of the cost equation, and they compose:
- Caching reduces the per-token cost of system-prompt re-sends.
- Claims reduce the number of tokens the agent needs to read in order to answer.
In the checked-in run, claims reduced cache-read tokens by 35.5% and estimated total per-session cost by 51%, including the Haiku setup call. The exact magnitude depends on the task and codebase; the mechanism complements prompt caching by reducing re-derivation work.
Single-path claims as per-file orientations
A claim with exactly one path doubles as a per-file briefing:
$ h5i claims add "all DB writes go through Repository, never raw sqlx" \ --path src/db/mod.rs $ h5i claims list --group-by-path src/db/mod.rs ● live all DB writes go through Repository, never raw sqlx ● live connection pool capped at 32; do not raise without ops sign-off src/api/handlers/checkout.rs ● live idempotency keyed on (user_id, request_id) — never timestamps
Useful when an agent is about to edit a file: h5i context relevant src/db/mod.rs surfaces
the per-file claims plus any reasoning trace mentioning the file, so the agent loads exactly
the constraints it needs and nothing else.
When to record claims
A heuristic that produces high-value claims with low overhead:
- After exploration that paid off. If you spent five reads to find a fact, record it.
- After a non-obvious negative result. "X is not in module Y" saves another agent the same wild-goose chase.
- After a constraint discovery. Limits, contracts, invariants — the things that don't show up in the type signature.
Don't record claims for things obvious from the file (function signatures, exported names). The agent can grep for those in O(1) anyway. The valuable claims are the ones that take five tool calls to arrive at.
Run the benchmark on your own repo
The reproducible harness is
scripts/experiment_claims.sh.
It seeds its test project, runs both arms, checks fidelity, and writes raw result records:
$ cargo build $ H5I_BIN=$PWD/target/debug/h5i N_TRIALS=5 ./scripts/experiment_claims.sh CONTROL vs AUTO_HAIKU_CLAIMS raw records: /tmp/h5i-claims-exp-*-results.jsonl.filtered
Treat this as a reproducible directional result, not a universal constant. The magnitude depends on task complexity, codebase size, model behavior, and claim quality.
Stop paying tokens to re-derive what the agent already knows
h5i claims are open source, evidence is a git hash, and there's no service to subscribe to.
Star on GitHub Back to docs