Benchmark · 2026-05-06

Cutting Claude API Token Costs 77% with Content-Addressed Claims

Prompt caching solves the cost of re-sending. It does not solve the cost of re-deriving. Here's the missing layer — and the A/B benchmark that shows what happens when you stop paying tokens for facts the agent already figured out.

Every Claude Code session that touches a non-trivial codebase begins with the same expensive ceremony: the agent re-discovers the codebase. It greps for the entry point. It reads the module that probably owns the logic. It opens three more files to verify. By the time it starts answering your actual question, you've burned 500,000 input tokens on a tour of files whose structure didn't change since yesterday.

Anthropic's prompt caching is the right tool for half this problem. It lets the agent re-send the same system prompt and tool definitions at a fraction of the cost. But it does nothing about the second half: the agent still has to look at the same files in the same order to arrive at the same conclusion. The cache makes that traversal cheaper per token. It does not make the traversal unnecessary.

h5i adds the missing layer. Claims are short, content-addressed facts the agent records once — "the retry logic lives in HttpClient::send, not the middleware" — pinned to a Merkle hash of the files they depend on. As long as those files don't change, the claim stays live. The next session reads it as a pre-verified fact and skips the entire re-derivation.

The A/B benchmark

The setup: a real Rust codebase, an identical task across both arms ("locate the retry logic and add jitter"), N = 10 trials per arm, claude-sonnet-4-6, no model temperature shenanigans. The control arm starts each session cold. The treatment arm starts with three pre-recorded claims pointing at the relevant files.

| Metric | Control (no claims) | Treatment (with claims) | Δ |
|---|---|---|---|
| Read tool calls | 5.6 ± 1.0 | 1.0 ± 0 | −82% |
| Cache-read tokens | 510,284 | 117,433 | −77% |
| Assistant turns | 17.1 ± 1.8 | 4.8 ± 1.2 | −72% |
| Wall time | 52s ± 9 | 18s ± 5 | −65% |
| Task fidelity | 9/10 | 10/10 | — |

The headline is the 77% reduction in cache-read tokens, which is the cost line that actually shows up on the Anthropic invoice. But the read-tool-calls number is the more interesting result: every treated trial read exactly one file (σ = 0). That's not a token-saving trick, that's a behavioral change. The agent stopped exploring because there was nothing left to explore.

Caveat. Claims help most for the second-and-later sessions on a stable codebase. A first session on a new repo still needs exploration — the claims aren't there yet. The effect compounds: the more sessions a project accumulates, the more claims pile up, the more sessions skip ahead. Greenfield projects see less benefit; long-running ones see more.

What a claim is, mechanically

A claim has three parts: a text, a list of evidence paths, and a Merkle hash:

~/my-project
$ h5i claims add "retry logic lives in HttpClient::send, not middleware" \
    --path src/http.rs --path src/middleware.rs
  Recorded claim 478be84c61e7
   evidence: sha256(("src/http.rs", <blob_oid>), ("src/middleware.rs", <blob_oid>))

The evidence hash is the SHA-256 of the (path, git-blob-OID) pairs at HEAD. Because git hashes content rather than timestamps, a re-save that leaves the bytes identical changes nothing, but the hash flips the moment the file's content changes. There is no TTL, no heuristic, no "we think this is probably still right" — git tells you whether the evidence is intact, byte for byte.
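The hashing can be sketched in a few lines. The exact serialization h5i uses for the pair list isn't shown above — concatenating `path:oid` lines is an assumption — but the blob-OID formula is git's real one (SHA-1 over a `blob <size>\0` header plus the content):

```python
import hashlib

def git_blob_oid(content: bytes) -> str:
    # Git's blob OID: SHA-1 over "blob <size>\0" + content.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def evidence_hash(files: dict[str, bytes]) -> str:
    # SHA-256 over (path, blob-OID) pairs in path order.
    # "path:oid" lines are one plausible encoding, not h5i's documented one.
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(f"{path}:{git_blob_oid(files[path])}\n".encode())
    return h.hexdigest()
```

Because the OID is a pure function of the bytes, a touch-style re-save leaves the evidence hash unchanged, while any content edit flips it.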

Live, stale, and the auto-invalidation flow

A claim's status is computed from the current working tree. It's live if the evidence files hash the same as when the claim was recorded; stale otherwise.
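The status check is a recompute-and-compare against the working tree. A minimal sketch, assuming the claim stores its evidence hash and path list, and using the same hypothetical `path:oid` serialization as at record time:

```python
import hashlib
from pathlib import Path

def blob_oid(content: bytes) -> str:
    # Git's blob OID: SHA-1 over "blob <size>\0" + content.
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()

def claim_status(recorded_hash: str, evidence_paths: list[str]) -> str:
    # Recompute the evidence hash from the files on disk; any byte-level
    # difference from record time makes the claim stale.
    h = hashlib.sha256()
    for path in sorted(evidence_paths):
        h.update(f"{path}:{blob_oid(Path(path).read_bytes())}\n".encode())
    return "live" if h.hexdigest() == recorded_hash else "stale"
```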

~/my-project
$ h5i claims list

STATUS    ID            TEXT
● live    478be84c61e7  retry logic lives in HttpClient::send, not middleware
○ stale   9f02ab1e733c  FooError::Parse only constructed in parser.rs
          ↳  src/parser.rs changed — evidence no longer matches
● live    b40132d8e0c4  cache layer is single-tenant; no per-user keys

Stale claims are not deleted. They're held — the next agent session that reads the affected file gets a chance to re-confirm or replace the claim, often via a one-line check rather than a full re-derivation. Pruning is explicit:

~/my-project
$ h5i claims prune --stale-older-than 30d
  3 stale claims removed

Preamble injection

The other half of the mechanism is what happens at session start. Live claims render as a ## Known facts block in the system preamble:

SessionStart preamble
## Known facts (verified evidence at HEAD)

- retry logic lives in HttpClient::send, not middleware
  evidence: src/http.rs · src/middleware.rs

- cache layer is single-tenant; no per-user keys
  evidence: src/cache/mod.rs

# Treat these as ground truth. Re-verify only if a downstream
# change requires it.

The agent treats this block the way it treats CLAUDE.md instructions — high-trust context to use, not to re-derive. In practice, Claude reads it, accepts it, and skips the exploration phase entirely. The 1.0 ± 0 read-tool-call result above is the direct consequence: the model has nothing left to look up.
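Rendering the block is mechanical: filter to live claims, emit the header and one bullet per claim. A sketch, assuming claims are dicts with `text`, `paths`, and `status` fields (h5i's real storage format isn't shown here):

```python
def render_preamble(claims: list[dict]) -> str:
    # Only live claims reach the session preamble; stale ones are held
    # for re-confirmation instead of being injected as facts.
    lines = ["## Known facts (verified evidence at HEAD)", ""]
    for c in claims:
        if c["status"] != "live":
            continue
        lines.append(f"- {c['text']}")
        lines.append(f"  evidence: {' · '.join(c['paths'])}")
        lines.append("")
    return "\n".join(lines)
```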

How claims compose with prompt caching

Anthropic prompt caching and h5i claims solve different parts of the cost equation: caching makes re-sending unchanged context cheap, while claims make re-deriving conclusions from that context unnecessary. The two compose.

The 77% number above is reduction in cache-read tokens — meaning, even after caching is already doing its job, claims cut the remaining bill by another factor of ~4×. The two are multiplicative. If you're using one without the other, you're leaving cost on the table.
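The factor falls straight out of the benchmark table; a two-line check:

```python
# Cache-read tokens from the benchmark table: control vs. treatment arm.
control, treatment = 510_284, 117_433

reduction = 1 - treatment / control   # fraction of the post-caching bill removed
factor = control / treatment          # how much cheaper the treatment arm reads

print(f"{reduction:.0%} reduction, ~{factor:.1f}x")  # → 77% reduction, ~4.3x
```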

Single-path claims as per-file orientations

A claim with exactly one path doubles as a per-file briefing:

~/my-project
$ h5i claims add "all DB writes go through Repository, never raw sqlx" \
    --path src/db/mod.rs

$ h5i claims list --group-by-path

src/db/mod.rs
  ● live  all DB writes go through Repository, never raw sqlx
  ● live  connection pool capped at 32; do not raise without ops sign-off

src/api/handlers/checkout.rs
  ● live  idempotency keyed on (user_id, request_id) — never timestamps

Useful when an agent is about to edit a file: h5i context relevant src/db/mod.rs surfaces the per-file claims plus any reasoning trace mentioning the file, so the agent loads exactly the constraints it needs and nothing else.
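The grouping itself is an inverted index from evidence path to claim. A sketch under the same assumed claim shape (dicts with `text`, `paths`, `status`):

```python
from collections import defaultdict

def claims_by_path(claims: list[dict]) -> dict[str, list[dict]]:
    # Invert the claim -> paths relation so a session about to edit a file
    # can pull exactly that file's live constraints and nothing else.
    index: dict[str, list[dict]] = defaultdict(list)
    for claim in claims:
        if claim["status"] != "live":
            continue
        for path in claim["paths"]:
            index[path].append(claim)
    return dict(index)
```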

When to record claims

A heuristic that produces high-value claims with low overhead:

Don't record claims for things obvious from the file (function signatures, exported names) — the agent can recover those with a single grep anyway. The valuable claims are the ones that take five tool calls to arrive at.

Run the benchmark on your own repo

The harness is in the h5i repo under scripts/bench/claims_ab.sh. It picks a task from a configurable list, runs N trials in each arm against your codebase, and writes a CSV. Reproducing the numbers above on your own corpus is a 30-minute job:

~/your-project
$ h5i init
$ h5i claims add "<a fact you verified once>" --path <evidence>
$ scripts/bench/claims_ab.sh --task tasks/retry-jitter.md --n 10
  arm=control     trials=10  done
  arm=treatment   trials=10  done
  results: bench/results-2026-05-06.csv

The delta you measure on your own tasks is a better guide than the numbers above: the magnitude depends on task complexity, codebase size, and how aggressively your team records claims. The direction is robust.
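Once the CSV lands, the per-arm delta is a few lines of aggregation. The column names below (`arm`, `cache_read_tokens`) are assumptions about the harness's output; adjust them to your header row:

```python
import csv
from statistics import mean

def arm_means(path: str, metric: str = "cache_read_tokens") -> dict[str, float]:
    # Mean of one metric per arm ("control" / "treatment") from the results CSV.
    by_arm: dict[str, list[float]] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            by_arm.setdefault(row["arm"], []).append(float(row[metric]))
    return {arm: mean(vals) for arm, vals in by_arm.items()}

def reduction(means: dict[str, float]) -> float:
    # Treatment-vs-control reduction as a fraction (0.77 == -77%).
    return 1 - means["treatment"] / means["control"]
```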

Stop paying tokens to re-derive what the agent already knows

h5i claims are open source, evidence is a git hash, and there's no service to subscribe to.
