Benchmark · 2026-05-06

Cutting Claude Coding-Agent Costs 51% with Content-Addressed Claims

Prompt caching solves the cost of re-sending. It does not solve the cost of re-deriving. Here's the missing layer — and the A/B benchmark that shows what happens when you stop paying tokens for facts the agent already figured out.

By Koukyosyumei · Reading time: 9 min · Tags: Claude API, Cost, Prompt Caching

Every Claude Code session that touches a non-trivial codebase begins with the same expensive ceremony: the agent re-discovers the codebase. It greps for the entry point. It reads the module that probably owns the logic. It opens three more files to verify. By the time it starts answering your actual question, it may have spent a substantial part of the session touring files whose structure has not changed since yesterday.

Anthropic's prompt caching is the right tool for half this problem. It lets the agent re-send the same system prompt and tool definitions at a fraction of the cost. But it does nothing about the second half: the agent still has to look at the same files in the same order to arrive at the same conclusion. The cache makes that traversal cheaper per token. It does not make the traversal unnecessary.

h5i adds the missing layer. Claims are short, content-addressed facts the agent records once — "the retry logic lives in HttpClient::send, not the middleware" — pinned to a Merkle hash of the files they depend on. As long as those files don't change, the claim stays live. The next session reads it as a pre-verified fact and skips the entire re-derivation.

The A/B benchmark

The checked-in experiment harness seeds the same synthetic 28-file Python service for both arms and gives claude-opus-4-7 the same HTTP-logging task. The control arm starts cold. The treatment arm starts with up to five claims curated by one claude-haiku-4-5 setup call. The published run contains N = 5 trials per arm.

Metric	Control (no claims)	Treatment (with claims)	Δ
Read tool calls	6.2 ± 0.4	4.0 ± 0	−35%
Cache-read tokens	793,425 ± 177,592	511,824 ± 106,863	−35%
Assistant turns	25.6 ± 3.8	16.6 ± 3.1	−35%
Wall time (s)	68 ± 16	57 ± 5	−16%
Estimated session cost	~$4.35	~$2.13	−51%
Task fidelity	5/5	5/5	✓

The headline is the estimated 51% reduction in per-session cost, including the Haiku setup call. The read-tool-calls result is also useful: every claims-seeded trial read exactly four files, the four HTTP modules relevant to the task. The cold arm read six or seven files while confirming the boundary. See the full results, per-trial data, and caveats.
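The Δ column can be rechecked directly from the published means. The sketch below is only that arithmetic; the dictionary keys are illustrative names, not fields from the harness output.

```python
# Means copied from the results table: (control, treatment).
TABLE = {
    "read_tool_calls": (6.2, 4.0),
    "cache_read_tokens": (793_425, 511_824),
    "assistant_turns": (25.6, 16.6),
    "wall_time_s": (68, 57),
    "est_session_cost_usd": (4.35, 2.13),
}


def relative_change(control: float, treatment: float) -> float:
    """Signed change of treatment relative to control, in percent."""
    return (treatment - control) / control * 100.0


def delta_column(table: dict) -> dict:
    """Recompute the delta column, rounded to one decimal place."""
    return {name: round(relative_change(c, t), 1) for name, (c, t) in table.items()}


if __name__ == "__main__":
    for name, pct in delta_column(TABLE).items():
        print(f"{name:22s} {pct:+.1f}%")
```

The recomputed values agree with the rounded figures in the table, which confirms that the 51% headline is a ratio of the two estimated session costs rather than a separately measured quantity.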

Caveat. Claims help most for the second-and-later sessions on a stable codebase. A first session on a new repo still needs exploration — the claims aren't there yet. The effect compounds: the more sessions a project accumulates, the more claims pile up, the more sessions skip ahead. Greenfield projects see less benefit; long-running ones see more.

What a claim is, mechanically

A claim has three parts: a text, a list of evidence paths, and a Merkle hash:

~/my-project

$ h5i claims add "retry logic lives in HttpClient::send, not middleware" \
    --path src/http.rs --path src/middleware.rs
✔  Recorded claim 478be84c61e7
   evidence: sha256(("src/http.rs", <blob_oid>), ("src/middleware.rs", <blob_oid>))

The evidence hash is the SHA-256 of the (path, git-blob-OID) pairs at HEAD. It's stable under re-saves that leave the bytes unchanged (a touched mtime, an editor rewriting identical content) but changes the moment the file's content changes. Because a blob OID covers every byte, that includes whitespace-only edits. There is no TTL, no heuristic, no "we think this is probably still right": git tells you whether the evidence is intact, byte for byte.
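To make the hashing concrete, here is a minimal Python sketch. The blob OID computation follows git's object format for SHA-1 repositories. How h5i serializes the (path, OID) pairs before hashing is not specified in this article, so the separator bytes and the sorting in `evidence_hash` are assumptions, and both function names are hypothetical.

```python
import hashlib


def git_blob_oid(content: bytes) -> str:
    # Git's object ID for a blob in a SHA-1 repository:
    # sha1 over the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def evidence_hash(files: dict) -> str:
    # ASSUMPTION: pairs are sorted by path and joined with NUL/newline
    # separators. This is illustrative, not h5i's actual serialization.
    digest = hashlib.sha256()
    for path in sorted(files):
        oid = git_blob_oid(files[path])
        digest.update(path.encode("utf-8") + b"\0" + oid.encode("ascii") + b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    recorded = {
        "src/http.rs": b"fn send() { retry(); }\n",
        "src/middleware.rs": b"fn wrap() {}\n",
    }
    resaved = dict(recorded)  # same bytes written again
    edited = {**recorded, "src/http.rs": b"fn send() {}\n"}

    print("unchanged re-save keeps hash:", evidence_hash(recorded) == evidence_hash(resaved))
    print("content edit changes hash:   ", evidence_hash(recorded) != evidence_hash(edited))
```

Sorting by path keeps the hash independent of the order in which `--path` flags were given, which is a design choice of this sketch rather than a documented property of h5i.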

Live, stale, and the auto-invalidation flow

A claim's status is computed from the current working tree. It's live if the evidence files hash the same as when the claim was recorded; stale otherwise.

~/my-project

$ h5i claims list

STATUS    ID            TEXT
● live    478be84c61e7  retry logic lives in HttpClient::send, not middleware
○ stale   9f02ab1e733c  FooError::Parse only constructed in parser.rs
          ↳  src/parser.rs changed — evidence no longer matches
● live    b40132d8e0c4  cache layer is single-tenant; no per-user keys
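The live/stale decision shown in the listing can be sketched as a pure comparison. This version assumes a hypothetical `Claim` record that keeps the blob OID of each evidence path as it was at record time, so that a stale result can name the file that changed. h5i's real storage layout is not described here, and the OIDs in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    recorded: dict[str, str]  # path -> blob OID when the claim was recorded


def status(claim: Claim, current: dict[str, str]) -> tuple[str, list[str]]:
    """Return ("live", []) or ("stale", [paths whose OID no longer matches]).

    `current` maps path -> blob OID in the present tree. A path that is
    missing from `current` (deleted or renamed) counts as changed.
    """
    changed = [p for p, oid in claim.recorded.items() if current.get(p) != oid]
    return ("stale", changed) if changed else ("live", [])


if __name__ == "__main__":
    claim = Claim(
        claim_id="9f02ab1e733c",
        text="FooError::Parse only constructed in parser.rs",
        recorded={"src/parser.rs": "placeholder-oid-1"},
    )
    print(status(claim, {"src/parser.rs": "placeholder-oid-1"}))
    print(status(claim, {"src/parser.rs": "placeholder-oid-2"}))
```

No timestamps or heuristics enter the decision: the only inputs are the recorded OIDs and the current ones.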

Stale claims are not deleted. They're held — the next agent session that reads the affected file gets a chance to re-confirm or replace the claim, often via a one-line check rather than a full re-derivation. Pruning is explicit:

~/my-project

$ h5i claims prune --stale-older-than 30d
  3 stale claims removed
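The pruning rule amounts to a filter on status and age. The sketch below assumes claims are plain dictionaries with hypothetical `status` and `recorded_at` fields, and that age is measured from the time of recording. Whether h5i measures from recording or from the moment a claim went stale is not stated in this article.

```python
from datetime import datetime, timedelta, timezone


def prune_stale(claims: list, now: datetime, older_than: timedelta) -> tuple:
    """Split claims into (kept, removed).

    A claim is removed only if it is stale AND older than the threshold.
    Live claims are always kept, whatever their age.
    """
    kept, removed = [], []
    for claim in claims:
        too_old = now - claim["recorded_at"] > older_than
        if claim["status"] == "stale" and too_old:
            removed.append(claim)
        else:
            kept.append(claim)
    return kept, removed


if __name__ == "__main__":
    now = datetime(2026, 5, 6, tzinfo=timezone.utc)
    claims = [
        {"id": "a", "status": "stale", "recorded_at": now - timedelta(days=45)},
        {"id": "b", "status": "stale", "recorded_at": now - timedelta(days=3)},
        {"id": "c", "status": "live", "recorded_at": now - timedelta(days=90)},
    ]
    kept, removed = prune_stale(claims, now, timedelta(days=30))
    print("kept:", [c["id"] for c in kept])
    print("removed:", [c["id"] for c in removed])
```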

Preamble injection

The other half of the mechanism is what happens at session start. Live claims render as a ## Known facts block in the system preamble:

SessionStart preamble

## Known facts (verified evidence at HEAD)

- retry logic lives in HttpClient::send, not middleware
  evidence: src/http.rs · src/middleware.rs

- cache layer is single-tenant; no per-user keys
  evidence: src/cache/mod.rs

# Treat these as ground truth. Re-verify only if a downstream
# change requires it.

The agent treats this block the way it treats CLAUDE.md instructions: as high-trust context to use, not to re-derive. In practice, Claude reads it, accepts it, and skips only the exploration the facts already cover. The 4.0 ± 0 read-tool-call result above is the direct consequence: the model reads the relevant modules without exploring decoy files.
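Rendering the block is a small formatting step. The sketch below reproduces the layout of the preamble shown above from a list of claim dictionaries. The `status`, `text`, and `paths` keys and the function name are hypothetical, and stale claims are filtered out on the assumption that only live claims are injected, as the section states.

```python
HEADER = "## Known facts (verified evidence at HEAD)"
FOOTER = (
    "# Treat these as ground truth. Re-verify only if a downstream\n"
    "# change requires it."
)


def render_known_facts(claims: list) -> str:
    """Render live claims as a preamble block; stale claims are omitted."""
    lines = [HEADER, ""]
    for claim in claims:
        if claim["status"] != "live":
            continue
        lines.append(f"- {claim['text']}")
        lines.append(f"  evidence: {' · '.join(claim['paths'])}")
        lines.append("")
    lines.append(FOOTER)
    return "\n".join(lines)


if __name__ == "__main__":
    claims = [
        {
            "status": "live",
            "text": "retry logic lives in HttpClient::send, not middleware",
            "paths": ["src/http.rs", "src/middleware.rs"],
        },
        {
            "status": "stale",
            "text": "FooError::Parse only constructed in parser.rs",
            "paths": ["src/parser.rs"],
        },
    ]
    print(render_known_facts(claims))
```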

How claims compose with prompt caching

Anthropic prompt caching and h5i claims solve different parts of the cost equation, and they compose:

Caching reduces the per-token cost of system-prompt re-sends.
Claims reduce the number of tokens the agent needs to read in order to answer.

In the checked-in run, claims reduced cache-read tokens by 35.5% and estimated total per-session cost by 51%, including the Haiku setup call. The exact magnitude depends on the task and codebase; the mechanism complements prompt caching by reducing re-derivation work.
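One way to see why the two mechanisms compose is a toy model in which cost is tokens times price times a cache discount: caching lowers the discount factor, claims lower the token count, and the two factors multiply. The token counts below are the cache-read means from the table. The price and the cache multiplier are hypothetical placeholders, not Anthropic's rates, and the model covers cache reads only, so it does not reproduce the 51% total.

```python
def cache_read_cost(tokens: int, base_price: float, cache_multiplier: float) -> float:
    """Cost of reading `tokens` input tokens at a discounted cache-read rate."""
    return tokens * base_price * cache_multiplier


# Cache-read token means from the results table.
CONTROL_TOKENS = 793_425
TREATMENT_TOKENS = 511_824

# HYPOTHETICAL pricing, for illustration only; substitute the real rates.
BASE_PRICE = 1.0        # arbitrary cost unit per token
CACHE_MULTIPLIER = 0.1  # assumed discount applied to a cache read

if __name__ == "__main__":
    uncached_cold = cache_read_cost(CONTROL_TOKENS, BASE_PRICE, 1.0)
    cached_cold = cache_read_cost(CONTROL_TOKENS, BASE_PRICE, CACHE_MULTIPLIER)
    cached_claims = cache_read_cost(TREATMENT_TOKENS, BASE_PRICE, CACHE_MULTIPLIER)

    print(f"caching alone keeps {cached_cold / uncached_cold:.1%} of the cost")
    print(f"claims alone keep   {cached_claims / cached_cold:.1%} of the cost")
    print(f"both together keep  {cached_claims / uncached_cold:.1%} of the cost")
```

Whatever multiplier is plugged in, the ratio between the two arms stays equal to the token ratio, which is why the claims saving survives any change in cache pricing.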

Single-path claims as per-file orientations

A claim with exactly one path doubles as a per-file briefing:

~/my-project

$ h5i claims add "all DB writes go through Repository, never raw sqlx" \
    --path src/db/mod.rs

$ h5i claims list --group-by-path

src/db/mod.rs
  ● live  all DB writes go through Repository, never raw sqlx
  ● live  connection pool capped at 32; do not raise without ops sign-off

src/api/handlers/checkout.rs
  ● live  idempotency keyed on (user_id, request_id) — never timestamps

Useful when an agent is about to edit a file: h5i context relevant src/db/mod.rs surfaces the per-file claims plus any reasoning trace mentioning the file, so the agent loads exactly the constraints it needs and nothing else.
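The per-file lookup can be sketched as a filter over single-path claims. This is an illustration of the idea, not the behavior of `h5i context relevant`, which also surfaces reasoning traces that are not modelled here. The dictionary keys and the function name are hypothetical.

```python
def briefing_for(path: str, claims: list) -> list:
    """Texts of live claims whose only evidence path is `path`."""
    return [
        claim["text"]
        for claim in claims
        if claim["status"] == "live" and claim["paths"] == [path]
    ]


if __name__ == "__main__":
    claims = [
        {
            "status": "live",
            "text": "all DB writes go through Repository, never raw sqlx",
            "paths": ["src/db/mod.rs"],
        },
        {
            "status": "live",
            "text": "connection pool capped at 32; do not raise without ops sign-off",
            "paths": ["src/db/mod.rs"],
        },
        {
            "status": "live",
            "text": "retry logic lives in HttpClient::send, not middleware",
            "paths": ["src/http.rs", "src/middleware.rs"],
        },
    ]
    for text in briefing_for("src/db/mod.rs", claims):
        print("-", text)
```

Multi-path claims are deliberately excluded in this sketch: they describe a relationship between files rather than a constraint on one file.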

When to record claims

A heuristic that produces high-value claims with low overhead:

After exploration that paid off. If you spent five reads to find a fact, record it.
After a non-obvious negative result. "X is not in module Y" saves another agent the same wild-goose chase.
After a constraint discovery. Limits, contracts, invariants — the things that don't show up in the type signature.

Don't record claims for things obvious from the file (function signatures, exported names). The agent can find those with a single grep anyway. The valuable claims are the ones that take five tool calls to arrive at.

Run the benchmark on your own repo

The reproducible harness is scripts/experiment_claims.sh. It seeds its test project, runs both arms, checks fidelity, and writes raw result records:

~/your-project

$ cargo build
$ H5I_BIN=$PWD/target/debug/h5i N_TRIALS=5 ./scripts/experiment_claims.sh
  CONTROL vs AUTO_HAIKU_CLAIMS
  raw records: /tmp/h5i-claims-exp-*-results.jsonl.filtered

Treat this as a reproducible directional result, not a universal constant. The magnitude depends on task complexity, codebase size, model behavior, and claim quality.
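When comparing your own run against the published table, each cell is a mean and a spread over the trials in one arm. The sketch below assumes the ± is a standard deviation, which the article does not state. The control-arm read counts are not copied from the raw records: they are the only set of five counts, each six or seven, that averages 6.2. The schema of the JSONL records is not shown here, so the sketch takes plain lists of numbers.

```python
import statistics


def summarize(values: list) -> tuple:
    """Mean and sample standard deviation of one metric across trials."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, spread


def format_cell(values: list, digits: int = 1) -> str:
    """Format a metric the way the results table does: mean ± spread."""
    mean, spread = summarize(values)
    return f"{mean:.{digits}f} ± {spread:.{digits}f}"


if __name__ == "__main__":
    control_reads = [6, 6, 6, 6, 7]    # inferred from the text, see above
    treatment_reads = [4, 4, 4, 4, 4]  # "every claims-seeded trial read exactly four files"
    print("control:  ", format_cell(control_reads))
    print("treatment:", format_cell(treatment_reads))
```

With only five trials per arm, the spread is itself a rough estimate, which is one more reason to read the percentages as directional.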


Stop paying tokens to re-derive what the agent already knows

h5i claims are open source, evidence is a git hash, and there's no service to subscribe to.
