Cutting Claude API Token Costs 77% with Content-Addressed Claims
Prompt caching solves the cost of re-sending. It does not solve the cost of re-deriving. Here's the missing layer — and the A/B benchmark that shows what happens when you stop paying tokens for facts the agent already figured out.
Every Claude Code session that touches a non-trivial codebase begins with the same expensive ceremony: the agent re-discovers the codebase. It greps for the entry point. It reads the module that probably owns the logic. It opens three more files to verify. By the time it starts answering your actual question, you've burned 500,000 input tokens on a tour of files whose structure didn't change since yesterday.
Anthropic's prompt caching is the right tool for half this problem. It lets the agent re-send the same system prompt and tool definitions at a fraction of the cost. But it does nothing about the second half: the agent still has to look at the same files in the same order to arrive at the same conclusion. The cache makes that traversal cheaper per token. It does not make the traversal unnecessary.
h5i adds the missing layer. Claims are short, content-addressed facts the agent
records once — "the retry logic lives in HttpClient::send, not the middleware" — pinned
to a Merkle hash of the files they depend on. As long as those files don't change, the claim
stays live. The next session reads it as a pre-verified fact and skips the entire
re-derivation.
The A/B benchmark
The setup: a real Rust codebase, an identical task across both arms ("locate the retry logic and add jitter"), N = 10 trials per arm, claude-sonnet-4-6, no model temperature shenanigans. The control arm starts each session cold. The treatment arm starts with three pre-recorded claims pointing at the relevant files.
| Metric | Control (no claims) | Treatment (with claims) | Δ |
|---|---|---|---|
| Read tool calls | 5.6 ± 1.0 | 1.0 ± 0 | −82% |
| Cache-read tokens | 510,284 | 117,433 | −77% |
| Assistant turns | 17.1 ± 1.8 | 4.8 ± 1.2 | −72% |
| Wall time (s) | 52 ± 9 | 18 ± 5 | −65% |
| Task fidelity | 9/10 | 10/10 | ✓ |
The headline is the 77% reduction in cache-read tokens, which is the cost line that actually shows up on the Anthropic invoice. But the read-tool-calls number is the more interesting result: every treated trial read exactly one file (σ = 0). That's not a token-saving trick; it's a behavioral change. The agent stopped exploring because there was nothing left to explore.
What a claim is, mechanically
A claim has three parts: a text, a list of evidence paths, and a Merkle hash:
```
$ h5i claims add "retry logic lives in HttpClient::send, not middleware" \
    --path src/http.rs --path src/middleware.rs
✔ Recorded claim 478be84c61e7
  evidence: sha256(("src/http.rs", <blob_oid>), ("src/middleware.rs", <blob_oid>))
```
The evidence hash is the SHA-256 of the (path, git-blob-OID) pairs at HEAD. It's stable under no-op re-saves (touching the file, mtime churn) but flips the moment the file's bytes change, whitespace included, because git blob OIDs are content hashes. There is no TTL, no heuristic, no "we think this is probably still right": git tells you whether the evidence is intact, byte for byte.
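The post doesn't spell out h5i's exact serialization, but the primitive is reproducible in a few lines. A minimal sketch, assuming SHA-256 over sorted, newline-joined path:oid pairs (the real encoding may differ; the properties don't):

```python
import hashlib
import subprocess

def blob_oid_at_head(path: str) -> str:
    """Blob OID of `path` as committed at HEAD."""
    return subprocess.check_output(
        ["git", "rev-parse", f"HEAD:{path}"], text=True
    ).strip()

def evidence_hash(paths: list[str]) -> str:
    """SHA-256 over sorted (path, blob OID) pairs.

    The newline-joined `path:oid` serialization is an assumption;
    any canonical, order-stable encoding behaves the same way:
    byte-identical content at HEAD yields an identical hash.
    """
    pairs = sorted((p, blob_oid_at_head(p)) for p in paths)
    payload = "\n".join(f"{p}:{oid}" for p, oid in pairs)
    return hashlib.sha256(payload.encode()).hexdigest()

# evidence_hash(["src/http.rs", "src/middleware.rs"])
```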
Live, stale, and the auto-invalidation flow
A claim's status is computed from the current working tree. It's live if the evidence files hash the same as when the claim was recorded; stale otherwise.
```
$ h5i claims list
STATUS   ID            TEXT
● live   478be84c61e7  retry logic lives in HttpClient::send, not middleware
○ stale  9f02ab1e733c  FooError::Parse only constructed in parser.rs
                       ↳ src/parser.rs changed — evidence no longer matches
● live   b40132d8e0c4  cache layer is single-tenant; no per-user keys
```
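The status check is the same primitive pointed at the working tree instead of HEAD. A sketch under the same assumptions as above, using git hash-object so that uncommitted edits flip a claim to stale immediately:

```python
import hashlib
import subprocess

def blob_oid_in_worktree(path: str) -> str:
    """Blob OID the file would have if committed right now."""
    return subprocess.check_output(
        ["git", "hash-object", path], text=True
    ).strip()

def is_live(recorded_hash: str, paths: list[str]) -> bool:
    """Live iff every evidence file still hashes to the recorded state."""
    pairs = sorted((p, blob_oid_in_worktree(p)) for p in paths)
    payload = "\n".join(f"{p}:{oid}" for p, oid in pairs)
    return hashlib.sha256(payload.encode()).hexdigest() == recorded_hash
```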
Stale claims are not deleted. They're held — the next agent session that reads the affected file gets a chance to re-confirm or replace the claim, often via a one-line check rather than a full re-derivation. Pruning is explicit:
```
$ h5i claims prune --stale-older-than 30d
3 stale claims removed
```
Preamble injection
The other half of the mechanism is what happens at session start. Live claims render as a
## Known facts block in the system preamble:
```
## Known facts (verified evidence at HEAD)

- retry logic lives in HttpClient::send, not middleware
  evidence: src/http.rs · src/middleware.rs
- cache layer is single-tenant; no per-user keys
  evidence: src/cache/mod.rs

# Treat these as ground truth. Re-verify only if a downstream
# change requires it.
```
The agent treats this block the way it treats CLAUDE.md instructions — high-trust context
to use, not to re-derive. In practice, Claude reads it, accepts it, and skips the exploration
phase entirely. The 1.0 ± 0 read-tool-call result above is the direct consequence: the model
has nothing left to look up.
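Producing that block is mechanical. A minimal sketch of the render step, using an illustrative Claim record (the field names here are assumptions, not h5i's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    paths: list[str]  # evidence files
    live: bool        # computed against the working tree, as above

def render_known_facts(claims: list[Claim]) -> str:
    """Render live claims as the preamble block shown above."""
    lines = ["## Known facts (verified evidence at HEAD)", ""]
    for c in claims:
        if not c.live:
            continue  # stale claims never reach the preamble
        lines.append(f"- {c.text}")
        lines.append(f"  evidence: {' · '.join(c.paths)}")
    lines += ["", "# Treat these as ground truth. Re-verify only if a downstream",
              "# change requires it."]
    return "\n".join(lines)
```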
How claims compose with prompt caching
Anthropic prompt caching and h5i claims solve different parts of the cost equation, and they compose:
- Caching reduces the per-token cost of system-prompt re-sends.
- Claims reduce the number of tokens the agent needs to read in order to answer.
The 77% number above is a reduction in cache-read tokens: even after caching is already doing its job, claims cut the remaining bill by another ~4.3× (510,284 / 117,433). The two savings are multiplicative. If you're using one without the other, you're leaving cost on the table.
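Concretely, the facts block is just another stable segment of the cached system prompt. A sketch with the Anthropic Python SDK, reusing render_known_facts from the sketch above (BASE_SYSTEM_PROMPT is a placeholder, and how h5i wires this into Claude Code sessions is not shown here):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BASE_SYSTEM_PROMPT = "..."  # placeholder: your stable instructions and tool docs

response = client.messages.create(
    model="claude-sonnet-4-6",  # model name as given in the benchmark setup
    max_tokens=2048,
    system=[
        {"type": "text", "text": BASE_SYSTEM_PROMPT},
        {
            # Everything up to and including this block is cache-eligible;
            # live claims change rarely, so the prefix stays warm.
            "type": "text",
            "text": render_known_facts(claims),  # claims: loaded from your claim store
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Locate the retry logic and add jitter."}],
)
```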
Single-path claims as per-file orientations
A claim with exactly one path doubles as a per-file briefing:
```
$ h5i claims add "all DB writes go through Repository, never raw sqlx" \
    --path src/db/mod.rs

$ h5i claims list --group-by-path
src/db/mod.rs
  ● live  all DB writes go through Repository, never raw sqlx
  ● live  connection pool capped at 32; do not raise without ops sign-off
src/api/handlers/checkout.rs
  ● live  idempotency keyed on (user_id, request_id) — never timestamps
```
This is useful when an agent is about to edit a file: h5i context relevant src/db/mod.rs surfaces the per-file claims plus any reasoning trace mentioning the file, so the agent loads exactly the constraints it needs and nothing else.
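Outside the CLI, the same lookup is a one-line filter over the claim store. A sketch reusing the illustrative Claim record from the preamble section:

```python
def claims_for_path(path: str, claims: list[Claim]) -> list[Claim]:
    """Live claims whose evidence includes `path`: the per-file briefing."""
    return [c for c in claims if c.live and path in c.paths]
```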
When to record claims
A heuristic that produces high-value claims with low overhead:
- After exploration that paid off. If you spent five reads to find a fact, record it.
- After a non-obvious negative result. "X is not in module Y" saves another agent the same wild-goose chase.
- After a constraint discovery. Limits, contracts, invariants — the things that don't show up in the type signature.
Don't record claims for things obvious from the file (function signatures, exported names); the agent can recover those with a single grep anyway. The valuable claims are the ones that take five tool calls to arrive at.
Run the benchmark on your own repo
The harness is in the h5i repo under scripts/bench/claims_ab.sh. It picks a task from a
configurable list, runs N trials in each arm against your codebase, and writes a CSV.
Reproducing the numbers above on your own corpus is a 30-minute job:
```
$ h5i init
$ h5i claims add "<a fact you verified once>" --path <evidence>
$ scripts/bench/claims_ab.sh --task tasks/retry-jitter.md --n 10
arm=control    trials=10  done
arm=treatment  trials=10  done
results: bench/results-2026-05-06.csv
```
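The CSV is the artifact worth keeping. A sketch that summarizes it per arm, assuming columns named arm and cache_read_tokens (check the header row your harness actually writes):

```python
import csv
from statistics import mean, stdev

def summarize(csv_path: str) -> None:
    """Print per-arm mean ± stdev of cache-read tokens."""
    by_arm: dict[str, list[int]] = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Column names are assumptions; adjust to your header row.
            by_arm.setdefault(row["arm"], []).append(int(row["cache_read_tokens"]))
    for arm, xs in sorted(by_arm.items()):
        print(f"{arm}: {mean(xs):,.0f} ± {stdev(xs):,.0f} cache-read tokens (n={len(xs)})")

summarize("bench/results-2026-05-06.csv")
```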
The delta you measure on your own tasks is the estimate that matters; the magnitude depends on task complexity, codebase size, and how aggressively your team records claims. The direction is robust.
Stop paying tokens to re-derive what the agent already knows
h5i claims are open source, evidence is a git hash, and there's no service to subscribe to.