One Schema for Every Tool: Structured Output for AI Agents
Filtering shrinks tool output. It doesn't make it machine-actionable: the agent still has to parse a different free-text shape for pytest, cargo, tsc, eslint, mypy… h5i adds a unified JSON/YAML result schema on top of reduction — one shape across every tool — so the agent can branch on a status, iterate findings, dedupe by fingerprint, and query captures. That schema is what sets h5i apart from text-only reducers.
Token reduction is a real win (see the object-store post): keep the raw output out-of-band, hand the agent a small summary. Tools like rtk and headroom (both Apache-2.0, and the prior art h5i's text filters build on) do this well. But a filtered summary is still text — and every tool's text is shaped differently. The agent that wants to know "did it pass, and if not, which test, on what line?" must re-learn pytest's layout, then cargo's, then tsc's. There's nothing to branch on, nothing to dedupe, nothing to query.
The idea: one result, every tool
h5i parses each tool into a single typed `ToolResult`. The key realization is that a
failing test, a compile error, and a lint diagnostic are the same shape: a
thing, somewhere, with a message and a severity. So they all become a unified
`Finding`, under one envelope:
```yaml
tool: pytest
kind: test                      # test | lint | typecheck | build | vcs | generic
status: failed                  # passed | ok | failed | error | unknown
exit_code: 1
counts: { failed: 1, passed: 120 }
parser_confidence: parsed       # parsed | heuristic | generic
raw_oid: sha256:934f…           # the full output, always recoverable
findings:
  - kind: test_failure          # test_failure | diagnostic | build_error | panic | generic
    severity: failure
    id: tests/test_auth.py::test_refresh
    message: assert 0 == 100
    location: tests/test_auth.py:42
    fingerprint: 0bb827e4e61a   # stable across line shifts → dedupe / track
```
Swap `pytest` for `cargo test`, `tsc`, `eslint`, `ruff`, `mypy`, or `go test` and you get the
same fields, with a `rule` for a linter, `expected`/`actual` for an assertion, and a
`build_error` kind for a compile failure. An agent learns the schema once.
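To make the "one shape" idea concrete, here is a minimal Python sketch of the envelope as two dataclasses. The field names follow the YAML example above; the Python types, the defaults, the `locations` helper, and the whole lint result (its rule, path, message, and `error` severity) are illustrative assumptions, not h5i's actual definitions or output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    kind: str                        # test_failure | diagnostic | build_error | panic | generic
    severity: str
    message: str
    id: Optional[str] = None         # e.g. a pytest node id
    location: Optional[str] = None   # "path:line"
    rule: Optional[str] = None       # set by linters
    fingerprint: Optional[str] = None


@dataclass
class ToolResult:
    tool: str
    kind: str                # test | lint | typecheck | build | vcs | generic
    status: str              # passed | ok | failed | error | unknown
    exit_code: int
    parser_confidence: str   # parsed | heuristic | generic
    raw_oid: str
    counts: dict = field(default_factory=dict)
    findings: list = field(default_factory=list)


def locations(result: ToolResult) -> list:
    """Hypothetical helper: one code path for every tool."""
    return [f.location for f in result.findings if f.location]


# The pytest capture from the example above.
pytest_result = ToolResult(
    tool="pytest", kind="test", status="failed", exit_code=1,
    parser_confidence="parsed", raw_oid="sha256:934f…",
    counts={"failed": 1, "passed": 120},
    findings=[Finding(kind="test_failure", severity="failure",
                      id="tests/test_auth.py::test_refresh",
                      message="assert 0 == 100",
                      location="tests/test_auth.py:42",
                      fingerprint="0bb827e4e61a")],
)

# A made-up lint capture, for illustration only: same envelope, different tool.
lint_result = ToolResult(
    tool="eslint", kind="lint", status="failed", exit_code=1,
    parser_confidence="parsed", raw_oid="sha256:…",
    findings=[Finding(kind="diagnostic", severity="error",
                      message="'x' is defined but never used",
                      location="src/app.js:3", rule="no-unused-vars")],
)

for result in (pytest_result, lint_result):
    print(result.tool, result.status, locations(result))
```

The helper never inspects `tool`, which is the point: code written against the envelope should not need per-tool branches.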
Three renders, one source of truth
The typed result drives every output format, so they never drift:
| Format | Shape | For |
|---|---|---|
| `compact` (default) | one line per finding | token-minimal agent reading |
| `structured` | full YAML | inspection |
| `json` | canonical JSON | programmatic / the `h5i_capture_run` MCP tool |
JSON is canonical — it's what's stored in the git-tracked manifest and what the MCP tool returns — and the compact text and YAML are renders of the same struct. So a capture is a record, not a one-shot log line.
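A rough sketch of why renders derived from one struct cannot drift: each renderer is a pure function of the same object, so re-rendering from the stored JSON gives the same text as rendering the live result. The compact line layout below is an assumption made for illustration; the source only says it is one line per finding.

```python
import json

# A trimmed version of the example capture, as a plain dict.
result = {
    "tool": "pytest",
    "status": "failed",
    "exit_code": 1,
    "findings": [
        {
            "kind": "test_failure",
            "severity": "failure",
            "id": "tests/test_auth.py::test_refresh",
            "message": "assert 0 == 100",
            "location": "tests/test_auth.py:42",
        }
    ],
}


def render_json(result: dict) -> str:
    """Canonical form: every other render can be re-derived from this."""
    return json.dumps(result, sort_keys=True)


def render_compact(result: dict) -> str:
    """A header line plus one line per finding (hypothetical layout)."""
    lines = [f"{result['tool']}: {result['status']} (exit {result['exit_code']})"]
    for finding in result["findings"]:
        lines.append(f"{finding['severity']} {finding['location']} {finding['message']}")
    return "\n".join(lines)


# Round-trip through the stored JSON, then render the compact text from it.
stored = render_json(result)
compact_text = render_compact(json.loads(stored))
print(compact_text)
```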
What the schema buys an agent
Because every capture is the same typed object, an agent (or a human) can act on it:
- Branch on a field. `status: failed` is a boolean decision, not a regex over prose. And it's honest: `status` is derived from the exit code, never guessed from text, so a passing-looking log on a nonzero exit is still `failed`.
- Iterate findings. Each carries a precise `location` and `message`, so you can jump straight to `tests/test_auth.py:42`.
- Dedupe / track by fingerprint. Each finding has a stable `fingerprint` (a hash of tool + rule + digit-normalized location + message), so "the same failure" is recognizable across runs even as line numbers shift.
- Query the store. The result lives in the manifest, so `h5i recall objects --status failed --tool pytest` works across every capture.
- Know how much to trust it. `parser_confidence` says whether the result was `parsed` by a dedicated adapter or is a `generic` fallback.
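
The fingerprint recipe in the list above (tool + rule + digit-normalized location + message) can be sketched as follows. The hash function, the field separator, the `#` placeholder, and the 12-character truncation are all assumptions chosen for illustration, so this will not reproduce h5i's actual fingerprint values.

```python
import hashlib
import re


def fingerprint(tool: str, rule: str, location: str, message: str) -> str:
    # Replace every run of digits in the location, so "file.py:42" and
    # "file.py:57" normalize to the same string.
    normalized = re.sub(r"\d+", "#", location)
    # Join with a separator that cannot appear in ordinary text.
    key = "\x00".join([tool, rule, normalized, message])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


# Same failure before and after the test moved down 15 lines.
before = fingerprint("pytest", "", "tests/test_auth.py:42", "assert 0 == 100")
after = fingerprint("pytest", "", "tests/test_auth.py:57", "assert 0 == 100")
# A different assertion at the original line.
other = fingerprint("pytest", "", "tests/test_auth.py:42", "assert 1 == 100")

print(before == after)   # line shift does not change the fingerprint
print(before == other)   # a different message does
```

One trade-off of this naive normalization: digits in the path itself are also collapsed, so two files that differ only by a number would share a fingerprint for otherwise identical findings.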
```
$ h5i capture run --format json -- pytest -q | jq '.status, .findings[0].location'
"failed"
"tests/test_auth.py:42"
$ h5i recall objects --status failed --tool mypy   # every mypy failure, ever
```
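On the agent side, branching, iterating, and deduping come down to a few lines once the JSON is parsed. The sketch below embeds a sample payload instead of invoking h5i; it assumes the JSON keys match the YAML example earlier in this post, and `new_failures` is a hypothetical helper, not part of h5i.

```python
import json

# Sample payload mirroring the example envelope; in practice this would be the
# stdout of `h5i capture run --format json -- pytest -q`.
captured = """
{
  "tool": "pytest",
  "status": "failed",
  "exit_code": 1,
  "parser_confidence": "parsed",
  "findings": [
    {
      "kind": "test_failure",
      "id": "tests/test_auth.py::test_refresh",
      "message": "assert 0 == 100",
      "location": "tests/test_auth.py:42",
      "fingerprint": "0bb827e4e61a"
    }
  ]
}
"""


def new_failures(result: dict, seen: set) -> list:
    """Return findings whose fingerprint was not seen in an earlier run."""
    # Branch on a field instead of scanning prose.
    if result["status"] not in ("failed", "error"):
        return []
    fresh = []
    for finding in result["findings"]:
        if finding["fingerprint"] not in seen:
            seen.add(finding["fingerprint"])
            fresh.append(finding)
    return fresh


seen_fingerprints = set()
result = json.loads(captured)
first_run = new_failures(result, seen_fingerprints)
second_run = new_failures(result, seen_fingerprints)  # same failure again

for finding in first_run:
    print(f"{finding['location']}: {finding['message']}")
```

The second call returns nothing because the fingerprint is already known, which is how an agent could avoid re-reporting a failure it is already working on.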
Honest by construction
A schema that lies is worse than text. Two rules keep it honest. First, `status` comes
from the exit code, never from scraping words like "passed". Second, a parser that
can't find its anchors falls back to a `generic` result (`status` from
the exit code, the reduced text in `body`) rather than inventing structure, and
says so via `parser_confidence: generic`. The raw bytes are always one
`h5i recall object` away, so the structure is a view, never the only copy.
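The two rules can be sketched in a toy parser. Everything here is a simplified assumption: the regex stands in for a real adapter's anchors (it matches pytest's `FAILED <id> - <message>` summary lines), the exit-code mapping ignores the `ok`, `error`, and `unknown` statuses, and `body` holds the text unreduced.

```python
import re

# Anchor for the toy parser: pytest's short summary lines.
FAILED_LINE = re.compile(r"^FAILED (?P<id>\S+) - (?P<message>.+)$", re.MULTILINE)


def parse(tool: str, exit_code: int, text: str) -> dict:
    # Rule 1: status comes from the exit code only, never from the text.
    status = "passed" if exit_code == 0 else "failed"

    findings = [
        {"kind": "test_failure", "id": m["id"], "message": m["message"]}
        for m in FAILED_LINE.finditer(text)
    ]

    # Rule 2: a failing run with no recognizable anchors declines to a
    # generic result instead of inventing findings.
    if exit_code != 0 and not findings:
        return {"tool": tool, "status": status, "parser_confidence": "generic",
                "body": text, "findings": []}

    return {"tool": tool, "status": status, "parser_confidence": "parsed",
            "findings": findings}


structured = parse("pytest", 1, "FAILED tests/test_auth.py::test_refresh - assert 0 == 100")
misleading = parse("pytest", 1, "all tests passed")  # passing-looking log, nonzero exit

print(structured["parser_confidence"], structured["status"])
print(misleading["parser_confidence"], misleading["status"])
```

The `misleading` case is the important one: the log says "passed", but the result is still `failed` because only the exit code is consulted, and it is labeled `generic` because no anchors were found.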
Where this sits relative to rtk / headroom
Credit where it's due: rtk's declarative per-command filters and headroom's log line-folding
are excellent at the reduction problem, and h5i reuses both for its text path (see the
NOTICE). The unified `ToolResult` / `Finding` schema (typed, fingerprinted, queryable, honest
about confidence, and identical across tools) is the layer h5i adds on top. Reduction makes
the output small; the schema makes it actionable.
Coverage
Dedicated parsers (rich findings) ship for pytest, cargo test, go test, tsc, eslint,
ruff, and mypy; every other command gets a valid generic result (honest status +
reduced body). Each parser carries golden tests so the schema stays faithful, and
adding a tool is just another parser feeding the same shape.
Give your agent one shape for every tool
h5i is open source — the schema, the parsers, and the object store are all in the repo, no service to subscribe to.