Sandbox Series · Part 3 · 2026-06-12

Sandboxing AI Agents, Part 3: How Popular Sandboxes Differ

The sandbox landscape is not one ladder. Tools optimize for different axes: edit isolation, execution isolation, cloud scale, network control, provenance, and review workflow.

By Koukyosyumei · Reading time: 17 min · Tags: Comparison, Containers, microVMs

This comparison is current as of June 12, 2026 and is intentionally architectural. Specific flags, SDK names, and product limits change. The important question is more stable: where is the enforcement boundary, and does the tool also solve the agent workflow around that boundary?

Series map. This comparison builds on the AI agent sandbox threat model and implementation guide; finish with h5i's env design.

AI sandbox comparison dimensions

Agent sandboxes differ on at least six dimensions. Isolation boundary asks whether the workload shares the host kernel, runs behind a user-space kernel, or runs in a microVM. Workspace model asks where code changes live. Network policy asks whether egress is unrestricted, proxy-mediated, packet-filtered, or denied. Provenance asks what evidence survives. Review lifecycle asks how a human accepts or rejects the change. Footprint asks whether the tool is local, daemon-based, Kubernetes-based, or managed cloud.
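
As a reading aid, the six dimensions can be written down as one record per tool. The sketch below is a hypothetical schema, not any tool's actual data model: the class names, field names, and the example values for a bare Git worktree are assumptions chosen to mirror the wording above.

```python
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    SHARED_HOST_KERNEL = "shares the host kernel"
    USER_SPACE_KERNEL = "runs behind a user-space kernel"
    MICROVM = "runs in a microVM"


class Egress(Enum):
    UNRESTRICTED = "unrestricted"
    PROXY_MEDIATED = "proxy-mediated"
    PACKET_FILTERED = "packet-filtered"
    DENIED = "denied"


class Footprint(Enum):
    LOCAL = "local"
    DAEMON = "daemon-based"
    KUBERNETES = "Kubernetes-based"
    MANAGED_CLOUD = "managed cloud"


@dataclass(frozen=True)
class SandboxProfile:
    """One row of a comparison: a tool described on all six dimensions."""
    name: str
    boundary: Boundary      # isolation boundary
    workspace: str          # where code changes live
    egress: Egress          # network policy
    provenance: str         # what evidence survives
    review: str             # how a human accepts or rejects the change
    footprint: Footprint


# Illustrative entry for a bare Git worktree with nothing else confining it.
bare_worktree = SandboxProfile(
    name="git worktree",
    boundary=Boundary.SHARED_HOST_KERNEL,
    workspace="separate directory and branch",
    egress=Egress.UNRESTRICTED,
    provenance="Git history only",
    review="inspect, merge, or discard the branch",
    footprint=Footprint.LOCAL,
)
```

Filling in a record like this for each candidate tool makes gaps visible: a tool with a strong boundary but an empty provenance or review field solves only part of the problem.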

Tool family	Primary boundary	Best at	Main caution
Git worktree	checkout and branch only	parallel edits	no execution sandbox
container-use	container plus worktree	parallel coding-agent workflow	container boundary and runtime configuration matter
Anthropic sandbox-runtime	native OS process sandbox	lightweight local command confinement	not a Git review/provenance system
OpenSandbox	Docker/Kubernetes sandbox platform	unified APIs and runtime backends	operational control plane complexity
E2B	managed cloud sandbox, Firecracker-backed	hosted code execution for agents	remote service and SDK lifecycle
gVisor	user-space Linux-like kernel	reducing host-kernel syscall exposure	compatibility and performance tradeoffs
Kata Containers	lightweight VM per container/pod	VM-strength container runtime	requires VM-capable infrastructure
Firecracker	KVM microVM	strong isolation substrate	not an agent workflow by itself
h5i env	tiered worktree + process/supervised/container	local reviewable agent work with provenance	no shipped microVM tier today

Git worktree

A Git worktree is not usually marketed as a sandbox, but it is the base layer for many coding-agent systems. It gives each agent a separate directory, branch, index, and working state while sharing the repository object store. It is excellent for avoiding edit collisions and making experiments cheap.

It makes almost no security claim. A process launched in a worktree still runs with the user's host permissions: it can read the home directory, use the network, inspect other processes, and modify any writable path unless something else confines it. Treat a worktree as workspace isolation, not execution isolation.

container-use

Dagger's container-use and the Zed integration described in Zed's background-agents post combine containers with Git worktrees. That is a natural shape for coding agents: each agent gets isolated execution and a separate branch that a human can inspect, merge, or discard.

The key strength is workflow. You can run parallel agents without stashing, cloning repeatedly, or letting them overwrite the same checkout. The caution is that the isolation boundary is a container boundary. Containers share the host kernel unless paired with stronger runtimes. Exposed Docker sockets, privileged mode, broad bind mounts, shared PID namespaces, and retained capabilities can collapse the security claim.
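
The configuration hazards listed above lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions: ContainerSpec is a hypothetical, simplified launch description (not the container-use or Docker API), the set of "broad" mounts is illustrative rather than exhaustive, and the check assumes a hardened profile drops every capability.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerSpec:
    """Hypothetical, simplified description of how a container is launched."""
    privileged: bool = False
    pid_mode: str = "private"    # "host" means the PID namespace is shared
    bind_mounts: list[str] = field(default_factory=list)            # host paths
    retained_capabilities: list[str] = field(default_factory=list)


# Illustrative only; a real policy would be project-specific.
BROAD_MOUNTS = {"/", "/home", "/etc", "/var"}


def boundary_risks(spec: ContainerSpec) -> list[str]:
    """List settings that can collapse the container's security claim."""
    risks = []
    if spec.privileged:
        risks.append("privileged mode")
    if spec.pid_mode == "host":
        risks.append("shared PID namespace")
    for path in spec.bind_mounts:
        normalized = path.rstrip("/") or "/"
        if normalized.endswith("docker.sock"):
            risks.append(f"exposed Docker socket: {path}")
        elif normalized in BROAD_MOUNTS:
            risks.append(f"broad bind mount: {path}")
    if spec.retained_capabilities:
        risks.append("retained capabilities: " + ", ".join(spec.retained_capabilities))
    return risks
```

An empty result does not prove the container is safe; it only means none of the named hazards were found in the fields this sketch inspects.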

Anthropic sandbox-runtime

Anthropic's sandbox-runtime is a lightweight tool for filesystem and network restrictions around arbitrary processes. Anthropic's engineering writeup presents it as a way to run Claude Code bash commands, agents, local MCP servers, and other processes with defined directory and network access, without spinning up a container.

Architecturally, this sits in the process-sandbox family. It is attractive when you want low startup cost and local enforcement using native OS primitives. It is not, by itself, a branch lifecycle, review evidence, or Git provenance system. If your problem is "confine this command", it is close to the core. If your problem is "run five coding agents, compare their diffs, keep audit evidence, and merge one", you need additional workflow.

OpenSandbox

OpenSandbox positions itself as a general sandbox platform for AI applications, with SDKs and Docker/Kubernetes runtimes for code execution, GUI agents, evaluations, and training. This is a platform shape rather than a single-process wrapper. It cares about lifecycle APIs, runtime backends, and production integration.

The advantage is breadth. A platform can standardize create, execute, file, network, and lifecycle operations across backends. The tradeoff is footprint: Docker or Kubernetes infrastructure, runtime configuration, and integration complexity. Its center of gravity is a sandbox service for applications; Git-native review and reasoning provenance are outside the core model.

E2B

E2B provides isolated sandboxes for agents through SDKs, and E2B's public material describes sandboxes as Firecracker-powered microVM environments. This is the hosted API approach: developers ask for a sandbox, run code, manage files, and let the provider operate the isolation substrate.

The strength is fast integration and an isolation ceiling higher than ordinary shared-kernel containers. The tradeoff is that the environment is a remote service with account, SDK, lifecycle, and data-placement considerations. It is a strong fit for products that need code-interpreter-style execution. It does not provide a local Git workflow unless you build that layer around it.

gVisor and Kata

gVisor is not an agent sandbox product; it is a container isolation runtime. It implements a Linux-like interface in a user-space application kernel, reducing how much of the host kernel a workload reaches directly. Kata Containers uses lightweight virtual machines that feel like containers but add hardware virtualization as a second boundary.

These are important because they can sit under higher-level agent platforms. If a tool says "container", ask which runtime. A runc container, a gVisor sandbox, and a Kata VM-backed container are not the same security claim. Compatibility, startup time, kernel feature coverage, observability, and infrastructure support differ substantially.

Firecracker

Firecracker is a virtual machine monitor that runs microVMs on KVM, developed for serverless-style workloads. Its security value comes from the class of boundary it provides: the workload runs behind a guest kernel and a KVM boundary, not merely inside host namespaces. This is why microVM-backed systems are the usual answer when the workload may be hostile rather than merely risky.

Firecracker is a substrate, not a full agent workflow. You still need root filesystems, networking, file transfer, snapshotting, policy, logging, identity, merge workflow, and cleanup. Teams that need the top isolation class should be willing to build or adopt that platform layer.

h5i env

h5i's env feature sits in a different part of the design space: local Git-native agent work where isolation, provenance, and review are one unit. It creates a worktree-backed environment, resolves a sandbox policy, records command captures and policy evidence, and lets a reviewer inspect or apply the resulting diff. Its tiers range from workspace isolation to rootless process confinement, supervised egress control, and a rootless Podman container backend.

The key distinction is not that h5i has the strongest raw boundary. It does not ship a microVM tier today. The distinction is that the sandbox is tied to the code branch, reasoning/context branch, policy digest, captures, denials, and review lifecycle. h5i is trying to make sandboxed work auditable in Git, not just executable somewhere else.

Boundary strength

A rough isolation ordering looks like this, with many configuration caveats. It is not a total order, but it is a useful map:

worktree only
  -> process sandbox with namespaces/seccomp/Landlock
  -> hardened container runtime
  -> user-space kernel such as gVisor
  -> lightweight VM runtime such as Kata
  -> microVM substrate such as Firecracker

This order is about escape resistance, not developer experience. The strongest boundary may be the wrong tool for a quick local refactor. The lightest tool may be irresponsible for hostile code. Choose based on the adversary and the operational need.
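
One way to use the ordering above is as a minimum bar: decide the weakest boundary class you will accept for a workload, then filter candidates against it. The sketch below encodes the ordering as an integer enum. The numeric ranks and the helper name meeting_floor are assumptions for illustration, and the ranks inherit every caveat of the rough ordering itself.

```python
from enum import IntEnum


class BoundaryClass(IntEnum):
    """Rough escape-resistance ranking; a higher value is harder to escape."""
    WORKTREE_ONLY = 0
    PROCESS_SANDBOX = 1       # namespaces/seccomp/Landlock
    HARDENED_CONTAINER = 2
    USER_SPACE_KERNEL = 3     # such as gVisor
    LIGHTWEIGHT_VM = 4        # such as Kata
    MICROVM = 5               # such as Firecracker


def meeting_floor(options: dict[str, BoundaryClass], floor: BoundaryClass) -> list[str]:
    """Return, sorted by name, the options whose class is at least `floor`."""
    return sorted(name for name, cls in options.items() if cls >= floor)
```

Calling meeting_floor with a floor of BoundaryClass.USER_SPACE_KERNEL keeps gVisor-, Kata-, and microVM-class options and drops worktrees, process sandboxes, and containers. The floor expresses the adversary; developer experience still has to be weighed separately.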

Network control is the differentiator

Most tools can claim some filesystem isolation. Network policy separates mature sandboxes from directory wrappers. Ask whether egress is unrestricted, denied, proxy-filtered, DNS-filtered, packet-filtered, or VM-network-filtered. Then ask what happens with raw IP literals, DNS rebinding, IPv6, Unix sockets, package-manager mirrors, and tools that ignore HTTP proxy variables.

The strongest practical designs combine name policy with packet enforcement. The weakest designs rely on a proxy while leaving direct sockets available. That can still be useful for cooperative workloads, but it should not be described as an un-bypassable egress allowlist.
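
To make the raw-IP-literal question concrete, here is a minimal sketch of the name-policy half of such a design. It is not taken from any tool discussed here; the function names and the suffix-allowlist model are assumptions. The check refuses any destination written as an IPv4 or IPv6 literal, because a name allowlist cannot vouch for it, and it matches names on whole DNS labels.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """True if `host` is an IPv4 or IPv6 literal, with or without brackets."""
    candidate = host.strip("[]")   # accept bracketed IPv6 such as [::1]
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def name_policy_allows(host: str, allowed_suffixes: set[str]) -> bool:
    """Allow only hostnames equal to, or subdomains of, an allowed suffix."""
    if is_ip_literal(host):
        return False               # raw IPs bypass name policy, so refuse them
    name = host.rstrip(".").lower()
    return any(
        name == suffix or name.endswith("." + suffix)
        for suffix in allowed_suffixes
    )
```

A check like this is advisory unless something below it enforces the result. If the process can still open a direct socket, it can ignore the decision entirely, which is why the name policy needs packet-level enforcement underneath it.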

Provenance is the missing column

The agent-sandbox landscape is strong on execution substrates and weaker on durable review evidence. A human reviewer needs to know not only that a command was isolated, but what branch it changed, what prompt or context caused the work, what command output mattered, what policy was enforced, which denied operations occurred, and why this diff is safe to merge.

That is why a Git-native provenance layer is not a luxury. Without it, the sandbox is a safe place to do work but not necessarily a good place to review work. The best agent systems will combine a clear boundary with a replayable audit trail.
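
The reviewer's questions above map naturally onto a small evidence record. The sketch below is a hypothetical schema for illustration only. It is not h5i's on-disk format, and the field names, the canonical-JSON digest, and the ready_for_review rule are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ReviewRecord:
    """Hypothetical evidence bundle attached to one sandboxed change."""
    branch: str                   # branch the agent changed
    context: str                  # prompt or context that caused the work
    policy: dict                  # sandbox policy that was enforced
    command_outputs: list = field(default_factory=list)    # output that mattered
    denied_operations: list = field(default_factory=list)  # what the sandbox blocked

    def policy_digest(self) -> str:
        # Canonical JSON so the digest does not depend on key order.
        canonical = json.dumps(self.policy, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def ready_for_review(self) -> bool:
        # Assumed minimum: a reviewer needs the branch, the context, and a policy.
        return bool(self.branch and self.context and self.policy)
```

Storing a digest of the policy alongside the diff lets a reviewer confirm later that the work ran under the policy they think it did, even if the policy file has since changed.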

How to choose

For parallel trusted edits: use Git worktrees.
For local commands that need quick confinement: use a process sandbox.
For multi-agent coding with familiar dev environments: use container-plus-worktree systems.
For hosted code execution in an application: use an API sandbox such as E2B or a platform such as OpenSandbox.
For hostile code or multi-tenant workloads: prefer gVisor, Kata, or microVM-backed isolation.
For local agent work where audit and merge evidence matter: use a Git-native environment model such as h5i env.

Next in the series, Part 4: How h5i Implements Sandbox Environments

The h5i design in detail: tiers, rootless egress, mediated commits, secrets, capture logs, and boundary-pressure review.

The right sandbox depends on the claim

h5i is strongest where isolated execution must become reviewable, shareable Git evidence.
