Sandboxing AI Agents, Part 3: How Popular Sandboxes Differ
The sandbox landscape is not one ladder. Tools optimize for different axes: edit isolation, execution isolation, cloud scale, network control, provenance, and review workflow.
This comparison is current as of June 12, 2026 and is intentionally architectural. Specific flags, SDK names, and product limits change. The important question is more stable: where is the enforcement boundary, and does the tool also solve the agent workflow around that boundary?
AI sandbox comparison dimensions
Agent sandboxes differ on at least six dimensions. Isolation boundary asks whether the workload shares the host kernel, runs behind a user-space kernel, or runs in a microVM. Workspace model asks where code changes live. Network policy asks whether egress is unrestricted, proxy-mediated, packet-filtered, or denied. Provenance asks what evidence survives. Review lifecycle asks how a human accepts or rejects the change. Footprint asks whether the tool is local, daemon-based, Kubernetes-based, or managed cloud.
| Tool family | Primary boundary | Best at | Main caution |
|---|---|---|---|
| Git worktree | checkout and branch only | parallel edits | no execution sandbox |
| container-use | container plus worktree | parallel coding-agent workflow | container boundary and runtime configuration matter |
| Anthropic sandbox-runtime | native OS process sandbox | lightweight local command confinement | not a Git review/provenance system |
| OpenSandbox | Docker/Kubernetes sandbox platform | unified APIs and runtime backends | operational control plane complexity |
| E2B | managed cloud sandbox, Firecracker-backed | hosted code execution for agents | remote service and SDK lifecycle |
| gVisor | user-space Linux-like kernel | reducing host-kernel syscall exposure | compatibility and performance tradeoffs |
| Kata Containers | lightweight VM per container/pod | VM-strength container runtime | requires VM-capable infrastructure |
| Firecracker | KVM microVM | strong isolation substrate | not an agent workflow by itself |
| h5i env | tiered worktree + process/supervised/container | local reviewable agent work with provenance | no shipped microVM tier today |
Git worktree
A Git worktree is not usually marketed as a sandbox, but it is the base layer for many coding-agent systems. It gives each agent a separate directory, branch, index, and working state while sharing the repository object store. It is excellent for avoiding edit collisions and making experiments cheap.
A worktree makes almost no security claim. A process launched in a worktree still has the user's host permissions. It can read the home directory, use the network, inspect processes, and modify any writable path unless something else confines it. Treat a worktree as workspace isolation, not execution isolation.
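As a minimal sketch of the per-agent layout described above, the helper below builds a `git worktree add -b <branch> <path> <commit-ish>` command for each agent. The `agent/<name>` branch scheme, the sibling-directory layout, and the `/src/app` path are illustrative assumptions, not a convention taken from any particular tool.

```python
from pathlib import PurePosixPath


def worktree_add_argv(repo: PurePosixPath, agent: str, base: str = "HEAD") -> list[str]:
    """Build the `git worktree add` command for one agent's private checkout."""
    branch = f"agent/{agent}"                     # hypothetical branch naming scheme
    path = repo.parent / f"{repo.name}-{agent}"   # sibling directory next to the main checkout
    # This only separates the checkout, branch, and index. Nothing here
    # restricts what a process started inside `path` can read or reach.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


if __name__ == "__main__":
    for name in ("alpha", "beta"):
        print(" ".join(worktree_add_argv(PurePosixPath("/src/app"), name)))
```

Each agent gets its own directory and branch on top of the shared object store, which is exactly the workspace isolation described above and nothing more.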
container-use
Dagger's container-use and the Zed integration described in Zed's background-agents post combine containers with Git worktrees. That is a natural shape for coding agents: each agent gets isolated execution and a separate branch that a human can inspect, merge, or discard.
The key strength is workflow. You can run parallel agents without stashing, cloning repeatedly, or letting them overwrite the same checkout. The caution is that the isolation boundary is a container boundary. Containers share the host kernel unless paired with stronger runtimes. Exposed Docker sockets, privileged mode, broad bind mounts, shared PID namespaces, and retained capabilities can collapse the security claim.
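The configuration hazards listed above can be turned into a simple pre-flight check. The sketch below assumes a hypothetical, runtime-agnostic `ContainerSpec` summary; it does not parse any real tool's configuration format, and the mount and capability lists are illustrative rather than exhaustive.

```python
from dataclasses import dataclass, field

DOCKER_SOCKETS = {"/var/run/docker.sock", "/run/docker.sock"}
BROAD_MOUNTS = {"/", "/home", "/root", "/etc", "/var"}               # illustrative, not exhaustive
RISKY_CAPS = {"SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE", "NET_ADMIN"}  # illustrative subset


@dataclass
class ContainerSpec:
    """Hypothetical summary of how an agent container is launched."""
    privileged: bool = False
    host_pid: bool = False
    bind_mounts: list[str] = field(default_factory=list)   # host paths mounted into the container
    capabilities: list[str] = field(default_factory=list)  # capabilities the container keeps


def audit(spec: ContainerSpec) -> list[str]:
    """Return the settings that can collapse a container's isolation claim."""
    findings = []
    if spec.privileged:
        findings.append("privileged mode")
    if spec.host_pid:
        findings.append("shared host PID namespace")
    for mount in spec.bind_mounts:
        path = mount.rstrip("/") or "/"   # normalize trailing slashes, keep "/" as root
        if path in DOCKER_SOCKETS:
            findings.append(f"Docker socket exposed: {path}")
        elif path in BROAD_MOUNTS:
            findings.append(f"broad bind mount: {path}")
    for cap in spec.capabilities:
        name = cap.upper().removeprefix("CAP_")
        if name in RISKY_CAPS:
            findings.append(f"retained capability: {name}")
    return findings


if __name__ == "__main__":
    spec = ContainerSpec(privileged=True, bind_mounts=["/var/run/docker.sock"])
    for finding in audit(spec):
        print("WARNING:", finding)
```

An empty result does not prove the container is safe; it only means none of these specific hazards were found.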
Anthropic sandbox-runtime
Anthropic's sandbox-runtime is a lightweight tool for filesystem and network restrictions around arbitrary processes. Anthropic's engineering writeup presents it as a way to run Claude Code bash commands, agents, local MCP servers, and other processes with defined directory and network access, without spinning up a container.
Architecturally, this sits in the process-sandbox family. It is attractive when you want low startup cost and local enforcement using native OS primitives. It is not, by itself, a branch lifecycle, review evidence, or Git provenance system. If your problem is "confine this command", it is close to the core. If your problem is "run five coding agents, compare their diffs, keep audit evidence, and merge one", you need additional workflow.
OpenSandbox
OpenSandbox positions itself as a general sandbox platform for AI applications, with SDKs and Docker/Kubernetes runtimes for code execution, GUI agents, evaluations, and training. This is a platform shape rather than a single-process wrapper. It cares about lifecycle APIs, runtime backends, and production integration.
The advantage is breadth. A platform can standardize create, execute, file, network, and lifecycle operations across backends. The tradeoff is footprint: Docker or Kubernetes infrastructure, runtime configuration, and integration complexity. Its center of gravity is a sandbox service for applications; Git-native review and reasoning provenance are outside the core model.
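To make the "standardize operations across backends" idea concrete, here is a hedged sketch of what such an abstraction could look like. `SandboxBackend`, `RecordingBackend`, and `run_job` are hypothetical names invented for illustration; this is not OpenSandbox's SDK or API, and the recording backend executes nothing.

```python
from typing import Protocol


class SandboxBackend(Protocol):
    """Hypothetical lifecycle interface a platform might expose over any runtime."""
    def create(self, image: str) -> str: ...
    def write_file(self, sandbox_id: str, path: str, data: bytes) -> None: ...
    def execute(self, sandbox_id: str, argv: list[str]) -> int: ...
    def destroy(self, sandbox_id: str) -> None: ...


class RecordingBackend:
    """In-memory stand-in that only records calls; it runs no code."""
    def __init__(self) -> None:
        self.calls: list[tuple] = []
        self._next_id = 0

    def create(self, image: str) -> str:
        self._next_id += 1
        self.calls.append(("create", image))
        return f"sbx-{self._next_id}"

    def write_file(self, sandbox_id: str, path: str, data: bytes) -> None:
        self.calls.append(("write_file", sandbox_id, path, len(data)))

    def execute(self, sandbox_id: str, argv: list[str]) -> int:
        self.calls.append(("execute", sandbox_id, tuple(argv)))
        return 0  # pretend the command succeeded

    def destroy(self, sandbox_id: str) -> None:
        self.calls.append(("destroy", sandbox_id))


def run_job(backend: SandboxBackend, image: str, files: dict[str, bytes], argv: list[str]) -> int:
    """Same call sequence regardless of which backend sits underneath."""
    sandbox_id = backend.create(image)
    try:
        for path, data in files.items():
            backend.write_file(sandbox_id, path, data)
        return backend.execute(sandbox_id, argv)
    finally:
        backend.destroy(sandbox_id)  # always clean up, even if execution raises
```

The caller's code stays the same whether the backend is Docker, Kubernetes, or something else, which is the breadth a platform buys at the cost of the footprint described above.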
E2B
E2B provides isolated sandboxes for agents through SDKs, and E2B's public material describes sandboxes as Firecracker-powered microVM environments. This is the hosted API approach: developers ask for a sandbox, run code, manage files, and let the provider operate the isolation substrate.
The strength is time to integration and an isolation ceiling higher than ordinary shared-kernel containers. The tradeoff is that the environment is a remote service with account, SDK, lifecycle, and data-placement considerations. It is a strong fit for products that need code-interpreter-style execution. It does not directly provide a local Git workflow; you have to build that layer around it.
gVisor and Kata
gVisor is not an agent sandbox product; it is a container isolation runtime. It implements a Linux-like interface in a user-space application kernel, reducing how much of the host kernel a workload reaches directly. Kata Containers uses lightweight virtual machines that feel like containers but add hardware virtualization as a second boundary.
These are important because they can sit under higher-level agent platforms. If a tool says "container", ask which runtime. A runc container, a gVisor sandbox, and a Kata VM-backed container are not the same security claim. Compatibility, startup time, kernel feature coverage, observability, and infrastructure support differ substantially.
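The "ask which runtime" question can be sketched as a lookup from runtime name to boundary class. The mapping below assumes common names: `runc` and `crun` for ordinary OCI runtimes, `runsc` for gVisor, names beginning with `kata` for Kata Containers, and containerd shim names of the form `io.containerd.<family>.<version>`. Actual runtime names depend on how a given host is configured, so treat the table as an assumption to verify.

```python
from enum import Enum


class Boundary(Enum):
    SHARED_KERNEL = "shared host kernel"
    USER_SPACE_KERNEL = "user-space kernel"
    VM = "hardware virtualization"
    UNKNOWN = "unknown, ask the vendor"


# Assumed runtime families; real deployments may register other names.
RUNTIME_BOUNDARY = {
    "runc": Boundary.SHARED_KERNEL,
    "crun": Boundary.SHARED_KERNEL,
    "runsc": Boundary.USER_SPACE_KERNEL,  # gVisor's OCI runtime binary
    "kata": Boundary.VM,                  # Kata Containers
}


def classify(runtime_name: str) -> Boundary:
    """Map a configured runtime name to the kind of boundary it provides."""
    name = runtime_name.strip().lower()
    if name.startswith("io.containerd."):
        name = name.split(".")[2]     # e.g. io.containerd.kata.v2 -> kata
    family = name.split("-")[0]       # e.g. kata-qemu -> kata
    return RUNTIME_BOUNDARY.get(family, Boundary.UNKNOWN)


if __name__ == "__main__":
    for rt in ("runc", "runsc", "io.containerd.kata.v2", "mystery-runtime"):
        print(f"{rt}: {classify(rt).value}")
```

Anything that resolves to `UNKNOWN` is the case where the word "container" in a product description tells you nothing about the security claim.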
Firecracker
Firecracker is a virtual machine monitor (VMM) that runs microVMs on KVM and was developed for serverless-style workloads. Its security value is category-level: the workload runs behind a guest kernel and KVM boundary, not merely inside host namespaces. This is why microVM-backed systems are the usual answer when the workload may be hostile rather than merely risky.
Firecracker is a substrate, not a full agent workflow. You still need root filesystems, networking, file transfer, snapshotting, policy, logging, identity, merge workflow, and cleanup. Teams that need the top isolation class should be willing to build or adopt that platform layer.
h5i env
h5i's env feature sits in a different part of the design space: local Git-native agent work where isolation, provenance, and review are one unit. It creates a worktree-backed environment, resolves a sandbox policy, records command captures and policy evidence, and lets a reviewer inspect or apply the resulting diff. Its tiers range from workspace isolation to rootless process confinement, supervised egress control, and a rootless Podman container backend.
The key distinction is not that h5i has the strongest raw boundary. It does not ship a microVM tier today. The distinction is that the sandbox is tied to the code branch, reasoning/context branch, policy digest, captures, denials, and review lifecycle. h5i is trying to make sandboxed work auditable in Git, not just executable somewhere else.
Boundary strength
A rough isolation ordering looks like this, with many configuration caveats:
worktree only -> process sandbox with namespaces/seccomp/Landlock -> hardened container runtime -> user-space kernel such as gVisor -> lightweight VM runtime such as Kata -> microVM substrate such as Firecracker
This order is about escape resistance, not developer experience. The strongest boundary may be the wrong tool for a quick local refactor. The lightest tool may be irresponsible for hostile code. Choose based on the adversary and the operational need.
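One way to make the ordering operational is an ordered enum plus a minimum tier per adversary. The tier names follow the ordering above; the `MIN_TIER` thresholds are illustrative assumptions, not a standard, and they ignore the configuration caveats already mentioned.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Rough escape-resistance ordering; higher is stronger."""
    WORKTREE = 0
    PROCESS_SANDBOX = 1
    HARDENED_CONTAINER = 2
    USER_SPACE_KERNEL = 3
    LIGHTWEIGHT_VM = 4
    MICROVM = 5


# Hypothetical policy: the weakest tier considered acceptable per adversary.
MIN_TIER = {
    "trusted": Tier.WORKTREE,
    "risky": Tier.PROCESS_SANDBOX,
    "hostile": Tier.USER_SPACE_KERNEL,
}


def meets(tier: Tier, adversary: str) -> bool:
    """True if `tier` is at least as strong as the minimum for this adversary."""
    return tier >= MIN_TIER[adversary]


if __name__ == "__main__":
    print(meets(Tier.WORKTREE, "hostile"))
    print(meets(Tier.MICROVM, "trusted"))
```

The check only answers whether a boundary is strong enough. It says nothing about whether a stronger tier is worth its cost for a quick local refactor.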
Network control is the differentiator
Most tools can claim some filesystem isolation. Network policy separates mature sandboxes from directory wrappers. Ask whether egress is unrestricted, denied, proxy-filtered, DNS-filtered, packet-filtered, or VM-network-filtered. Then ask what happens with raw IP literals, DNS rebinding, IPv6, Unix sockets, package-manager mirrors, and tools that ignore HTTP proxy variables.
The strongest practical designs combine name policy with packet enforcement. The weakest designs rely on a proxy while leaving direct sockets available. That can still be useful for cooperative workloads, but it should not be described as an un-bypassable egress allowlist.
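A minimal sketch of "name policy plus packet enforcement" is shown below. It assumes a hypothetical allowlist and a hypothetical enforcement point that sees both the requested hostname and the actual destination address; the addresses are from documentation ranges. Real systems enforce the packet half in the kernel or a VM network layer, not in application code like this.

```python
import ipaddress

ALLOWED_NAMES = {"pypi.org", "files.pythonhosted.org"}  # hypothetical allowlist


def is_ip_literal(host: str) -> bool:
    """True for raw IPv4/IPv6 literals, including bracketed IPv6 like [::1]."""
    try:
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False


def name_policy_allows(host: str) -> bool:
    host = host.rstrip(".").lower()   # normalize case and trailing dot
    if is_ip_literal(host):
        return False                  # raw IP literals sidestep name rules, so deny them
    return host in ALLOWED_NAMES


def packet_policy_allows(dest_ip: str, pinned_ips: set[str]) -> bool:
    """Allow only addresses that were resolved for an allowed name and pinned."""
    pinned = {ipaddress.ip_address(ip) for ip in pinned_ips}
    return ipaddress.ip_address(dest_ip) in pinned


def egress_allowed(host: str, dest_ip: str, pinned_ips: set[str]) -> bool:
    # Both layers must agree: the name is allowed AND the packet goes to a pinned address.
    return name_policy_allows(host) and packet_policy_allows(dest_ip, pinned_ips)


if __name__ == "__main__":
    pinned = {"192.0.2.10"}
    print(egress_allowed("pypi.org", "192.0.2.10", pinned))    # allowed name, pinned address
    print(egress_allowed("pypi.org", "198.51.100.7", pinned))  # name ok, address changed
    print(egress_allowed("192.0.2.10", "192.0.2.10", pinned))  # raw IP literal
```

The second case is the one a proxy-only design misses: the name still looks fine, but the connection is going somewhere the policy never approved.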
Provenance is the missing column
The agent-sandbox landscape is strong on execution substrates and weaker on durable review evidence. A human reviewer needs to know not only that a command was isolated, but what branch it changed, what prompt or context caused the work, what command output mattered, what policy was enforced, which denied operations occurred, and why this diff is safe to merge.
That is why a Git-native provenance layer is not a luxury. Without it, the sandbox is a safe place to do work but not necessarily a good place to review work. The best agent systems will combine a clear boundary with a replayable audit trail.
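The reviewer's questions above map naturally onto a small record that could be stored alongside a branch. The sketch below is a hypothetical shape, not h5i's or any other tool's actual format: the field names, the `digest_policy` helper, and the sample values are all assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def digest_policy(policy: dict) -> str:
    """SHA-256 over canonical JSON, so the same policy always yields the same digest."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ReviewRecord:
    """Hypothetical evidence bundle a reviewer reads before merging."""
    code_branch: str                  # which branch the agent changed
    context_ref: str                  # pointer to the prompt or context that caused the work
    policy_sha256: str                # which sandbox policy was enforced
    captures: list[dict] = field(default_factory=list)  # command output that mattered
    denials: list[str] = field(default_factory=list)    # operations the sandbox refused

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, indent=2)


if __name__ == "__main__":
    policy = {"network": "deny", "writable": ["./"]}
    record = ReviewRecord(
        code_branch="agent/alpha",
        context_ref="context/alpha",
        policy_sha256=digest_policy(policy),
        captures=[{"argv": ["pytest", "-q"], "exit_code": 0}],
        denials=["connect 198.51.100.7:443"],
    )
    print(record.to_json())
```

Because the record is plain JSON with a stable policy digest, it can be committed, diffed, and reviewed like any other file in the repository.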
How to choose
- For parallel trusted edits: use Git worktrees.
- For local commands that need quick confinement: use a process sandbox.
- For multi-agent coding with familiar dev environments: use container-plus-worktree systems.
- For hosted code execution in an application: use an API sandbox such as E2B or a platform such as OpenSandbox.
- For hostile code or multi-tenant workloads: prefer gVisor, Kata, or microVM-backed isolation.
- For local agent work where audit and merge evidence matter: use a Git-native environment model such as h5i env.
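The list above can be read as a lookup table. The sketch below encodes it directly; the need keys are hypothetical labels invented for this example, and the recommendations are copied from the list rather than derived from any scoring.

```python
# Keys are hypothetical labels; values restate the recommendations above.
RECOMMENDATIONS = {
    "parallel_trusted_edits": "Git worktrees",
    "quick_local_confinement": "process sandbox",
    "multi_agent_dev_environments": "container-plus-worktree system",
    "hosted_execution": "API sandbox such as E2B or a platform such as OpenSandbox",
    "hostile_or_multi_tenant": "gVisor, Kata, or microVM-backed isolation",
    "local_audit_and_merge_evidence": "Git-native environment model such as h5i env",
}


def recommend(needs: list[str]) -> list[str]:
    """Return one recommendation per stated need, in the order given."""
    unknown = [need for need in needs if need not in RECOMMENDATIONS]
    if unknown:
        raise KeyError(f"unrecognized needs: {unknown}")
    return [RECOMMENDATIONS[need] for need in needs]


if __name__ == "__main__":
    for choice in recommend(["parallel_trusted_edits", "hostile_or_multi_tenant"]):
        print(choice)
```

Passing more than one need returns more than one answer, which reflects the point that these tools sit on different axes and are often layered rather than substituted.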
The right sandbox depends on the claim
h5i is strongest where isolated execution must become reviewable, shareable Git evidence.