AI Engineering Curriculum
Phase 5: AI Security & Safety

Module 5.5

Sandboxing Autonomous Agents

When your agent executes code, the output of that code runs on a real machine. The model doesn't know that. It generates subprocess.run(["rm", "-rf", "/"]) the same way it generates print("hello") — they're both just tokens. The difference between them isn't in the model. It's in whether you let either of them actually run.

Sandboxing is the architectural answer to that problem. It means: the agent's code runs in an isolated environment where even the worst possible output can't escape to hurt the host system or other users.

Real-World Use Cases

  • Code interpreter agents (think: AI that writes and runs Python on your behalf) need full isolation — the user's code is untrusted by definition
  • Research agents that scrape the web, parse PDFs, and process arbitrary file formats need containment in case a malicious file exploits a parser vulnerability
  • Customer-facing agents where multiple users share infrastructure need isolation between sessions — one user's injected code can't affect another's environment
  • DevOps agents that run shell commands need strict limits on what commands can even be attempted, let alone succeed

Key Terms

Namespace — a Linux kernel feature that isolates process views of resources (filesystem, network, process IDs). Docker uses namespaces extensively.

cgroup (control group) — limits how much CPU, memory, and I/O a process can consume. The enforcement mechanism for resource quotas.

Syscall (system call) — how a process asks the OS kernel to do something: read a file, open a socket, allocate memory. The kernel is the gatekeeper. If you control which syscalls are allowed, you control what the process can do.

microVM — a minimal virtual machine. Runs its own kernel, isolated at the hardware virtualization layer. Much lighter than a full VM, heavier than a container.

Cold start — the time from "start the sandbox" to "code can run." Critical for interactive agent workflows.
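On a Linux host you can see the namespace machinery directly: each entry under /proc/self/ns names one isolated resource view the current process belongs to. A quick, Linux-specific illustration:

```python
import os

# Each entry in /proc/self/ns names one namespace this process lives in:
# mnt = filesystem view, pid = process IDs, net = network stack, and so on.
# A containerized process gets fresh copies of these. Linux-specific.
namespaces = sorted(os.listdir("/proc/self/ns"))
print(namespaces)
```

A process inside a Docker container lists the same entries, but they point at different namespace objects than the host's.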


The Isolation Stack

There's a clear hierarchy of isolation technologies, from fastest-but-weakest to slowest-but-strongest:

Docker  <  gVisor  <  Firecracker / Kata microVMs
(weakest)   (mid)          (strongest)

Each step up the stack adds stronger isolation and slightly higher overhead. The right choice depends on how much you trust what's running inside.


Docker — Fast, Good Enough for Trusted Code

Docker uses Linux namespaces and cgroups to isolate containers. Processes inside can't see each other's filesystems or process trees. Resource consumption is capped.

The critical limitation: all containers on the same host share the host kernel. There is no separate kernel per container. If a piece of code inside the container exploits a kernel vulnerability, it can potentially escape to the host.

This is acceptable when you trust the code. It's not acceptable when the code was generated by an LLM based on instructions from an untrusted user.

Python
import docker

client = docker.from_env()

def run_code_in_docker(code: str, timeout: int = 10) -> str:
    container = client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", code],
        mem_limit="128m",       # hard memory cap
        cpu_quota=50000,        # 50% of one CPU core
        network_disabled=True,  # no network access
        read_only=True,         # read-only filesystem
        detach=True,            # detach so we can enforce our own timeout
    )
    try:
        container.wait(timeout=timeout)  # run() itself has no timeout kwarg
        return container.logs().decode("utf-8")
    finally:
        container.remove(force=True)     # always clean up, even on timeout

Use Docker when: you control the code being run, cold start speed matters, or you're running in a trusted internal environment.


gVisor — Better Isolation, Moderate Overhead

gVisor intercepts syscalls in user space before they reach the host kernel. It implements a synthetic kernel layer: the sandboxed process thinks it's talking to a real kernel, but it's actually talking to gVisor's user-space implementation, which then decides what to pass through.

The result: the host kernel never directly receives syscalls from sandboxed code. A kernel exploit inside the sandbox fails because the real kernel never sees it.

gVisor is used in production at Modal and Northflank. It's a reasonable middle ground — meaningfully better than Docker's isolation, without the full overhead of booting a separate kernel per execution.

The trade-off: not all syscalls are implemented. Some programs that rely on obscure kernel interfaces break inside gVisor. You need to test your specific workload.

Use gVisor when: code is "probably fine" but not fully trusted, or you need better-than-Docker isolation without the complexity of microVMs.


Firecracker microVMs — Maximum Isolation

Firecracker is the strongest isolation available for production use. It's the technology behind AWS Lambda — each invocation gets its own dedicated Linux kernel running in a hardware-virtualized microVM.

The key property: kernel exploits inside one microVM cannot reach other microVMs or the host, because the isolation boundary is hardware virtualization, not OS-level namespacing. There's no shared kernel to exploit.

E2B and Sprites.dev are managed platforms built on Firecracker specifically for agent workloads. They handle the infrastructure complexity and expose a clean SDK.

Python
from e2b_code_interpreter import Sandbox

def run_code_isolated(code: str) -> dict:
    # Each Sandbox() call spins up a fresh Firecracker microVM:
    # ~150ms cold start, its own Linux kernel, complete isolation
    with Sandbox() as sandbox:
        execution = sandbox.run_code(code)
        result = {
            "stdout": execution.logs.stdout,
            "stderr": execution.logs.stderr,
            "error": execution.error,
            "results": execution.results,
        }
    # The microVM is automatically destroyed when the context manager exits;
    # nothing that ran inside can persist to the host
    return result

E2B sessions also support pause and resume — you can snapshot a microVM's state mid-session, pause it, and resume later. Useful for long-running agent workflows that span multiple interactions.

Use Firecracker when: executing untrusted LLM-generated code, operating in compliance environments (SOC2, HIPAA), building multi-tenant systems where user isolation is a hard requirement, or when a security breach would be catastrophic.


Platform Comparison

Platform             Isolation                Cold Start   Session Limit            Best For
E2B                  Firecracker microVM      ~150ms       24hr (Pro)               Untrusted code, compliance
Daytona              Docker (Kata optional)   <90ms        Long-running             Persistent workspaces, speed
Sprites.dev          Firecracker microVM      Fast         Unlimited + checkpoint   Stateful long sessions
Modal                gVisor                   Fast         Per-invocation           Serverless Python, scale
Self-hosted Docker   Docker                   <50ms        Unlimited                Internal trusted workloads

The right answer isn't always the strongest isolation. A research agent running in a controlled internal environment doesn't need Firecracker. A public-facing code execution product absolutely does.


Rate Limiting and Circuit Breakers

Sandboxing isolates the blast radius of bad code. Rate limiting and circuit breakers control the volume of agent actions — preventing runaway loops, cost explosions, and denial-of-wallet attacks.

The circuit breaker pattern is borrowed from electrical engineering: if the system starts failing repeatedly, trip a breaker and stop execution rather than letting it spiral.

Python
import time
from collections import deque
from threading import Lock

class AgentRateLimiter:
    def __init__(
        self,
        max_actions_per_minute: int = 20,
        max_spend_per_hour_usd: float = 1.00,
        circuit_breaker_threshold: int = 5,  # consecutive failures before open
    ):
        self.max_actions = max_actions_per_minute
        self.max_spend = max_spend_per_hour_usd
        self.cb_threshold = circuit_breaker_threshold
        self._action_times: deque = deque()
        self._spend_log: list = []
        self._consecutive_failures = 0
        self._lock = Lock()

    def check_action(self) -> tuple[bool, str]:
        with self._lock:
            # Circuit breaker: too many consecutive failures → stop everything
            if self._consecutive_failures >= self.cb_threshold:
                return False, f"Circuit breaker open: {self._consecutive_failures} failures"
            # Rate check: prune old entries, check current window
            cutoff = time.time() - 60
            while self._action_times and self._action_times[0] < cutoff:
                self._action_times.popleft()
            if len(self._action_times) >= self.max_actions:
                return False, f"Rate limit: {self.max_actions} actions/minute exceeded"
            self._action_times.append(time.time())
            return True, "ok"

    def check_spend(self, estimated_cost_usd: float) -> tuple[bool, str]:
        with self._lock:
            cutoff = time.time() - 3600
            self._spend_log = [(t, c) for t, c in self._spend_log if t > cutoff]
            total = sum(c for _, c in self._spend_log)
            if total + estimated_cost_usd > self.max_spend:
                return False, f"Spend limit: ${self.max_spend}/hr exceeded (current: ${total:.4f})"
            self._spend_log.append((time.time(), estimated_cost_usd))
            return True, "ok"

    def record_failure(self):
        with self._lock:
            self._consecutive_failures += 1

    def record_success(self):
        with self._lock:
            self._consecutive_failures = 0

Wrap every significant agent action with check_action() before it runs. If it returns False, stop and surface the reason. The circuit breaker catches runaway loops that a simple rate limit won't — an agent that fails 5 times in a row is probably broken, not just slow.
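The breaker's behavior is easiest to see with a stripped-down stand-in for the class above (a minimal sketch; names are illustrative):

```python
# A bare-bones circuit breaker: after `threshold` consecutive failures it
# opens and refuses further actions until reset. Illustrative only.
class CircuitBreaker:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def allow(self) -> bool:
        return self.failures < self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1

breaker = CircuitBreaker(threshold=3)
for ok in (True, False, False, False):  # one success, then a failing streak
    if breaker.allow():
        breaker.record(ok)
print(breaker.allow())  # False: the breaker opened after 3 straight failures
```

A plain rate limit would keep admitting these actions at 20/minute forever; the breaker stops the loop after the third consecutive failure.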


The Three-Layer Security Architecture

Sandboxing, rate limiting, and guardrails don't exist in isolation — they're part of a coordinated security architecture. In production, think of it as three layers:

Layer 1: Policy — what is allowed in principle

  • Data classification tiers (what data can this agent touch?)
  • Decision boundary rules (what categories of action are permitted?)
  • Compliance checkpoints (GDPR, HIPAA, SOC2 if relevant)

Layer 2: Configuration — enforced at startup, before any code runs

  • IAM / RBAC role binding (what credentials does the agent get?)
  • Prompt filtering rules (what topics are in-scope?)
  • Sandboxed execution environment (which isolation technology?)
  • Model registry (only approved models can be invoked)
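Concretely, Layer 2 can be captured in a single startup-time config object that is validated before the agent takes any action. A sketch; every name here is hypothetical, not a real framework's schema:

```python
# Hypothetical Layer 2 configuration: everything here is fixed at startup,
# before the agent executes anything. All names are illustrative.
AGENT_CONFIG = {
    "iam_role": "agent-readonly",             # least-privilege credentials
    "allowed_topics": ["billing", "orders"],  # prompt filtering scope
    "sandbox": "firecracker",                 # isolation technology
    "allowed_models": ["approved-model-v1"],  # model registry allowlist
}

def validate_model(model: str) -> None:
    # Enforced once, up front: an unapproved model never gets invoked
    if model not in AGENT_CONFIG["allowed_models"]:
        raise PermissionError(f"model {model!r} is not in the registry")
```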

Layer 3: Runtime — enforced continuously during execution

  • Rate limiters and circuit breakers
  • Continuous anomaly detection (unusual action patterns)
  • Immutable audit trail (every action logged, tamper-evident)
  • Kill switch / automated incident isolation

The failure mode to avoid: putting all your security at Layer 3. Runtime monitoring catches problems as they happen, but by then the agent may have already taken irreversible actions. Layers 1 and 2 are where you prevent the conditions for those actions to exist at all.


Choosing Your Sandbox

A simple decision tree:

  • The code was written by you or your team → Docker is fine
  • The code was written by an LLM but the user is internal/trusted → Docker with tight resource limits, or gVisor
  • The code was generated by an LLM for an untrusted external user → Firecracker (E2B, Sprites.dev)
  • You're building a multi-tenant product where users share infrastructure → Firecracker, mandatory
  • You have compliance requirements (SOC2, HIPAA) → Firecracker; E2B specifically has these certifications
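The decision tree above can be encoded as a small helper (a sketch; the inputs and return labels are illustrative, not a real API):

```python
# Encodes the decision tree above. Inputs and labels are illustrative.
def choose_sandbox(code_author: str, user_trusted: bool,
                   multi_tenant: bool = False, compliance: bool = False) -> str:
    if multi_tenant or compliance:
        return "firecracker"          # hard isolation requirement
    if code_author == "llm" and not user_trusted:
        return "firecracker"          # untrusted LLM-generated code
    if code_author == "llm":
        return "docker-or-gvisor"     # LLM code, trusted internal user
    return "docker"                   # code written by you or your team

print(choose_sandbox("llm", user_trusted=False))  # firecracker
```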

No sandbox is a substitute for least-privilege tool design. An agent that has no file delete tool can't delete files — regardless of what isolation technology surrounds it. Sandboxing is the last line of defense, not the first.
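In practice that principle looks like a tool registry that simply omits dangerous capabilities (a minimal sketch with hypothetical tool names):

```python
import pathlib

# Least-privilege tool design: the registry never contains a delete tool,
# so no injected instruction can invoke one. Names are hypothetical.
TOOLS = {
    "read_file": lambda path: pathlib.Path(path).read_text(),
    "list_dir": lambda path: [p.name for p in pathlib.Path(path).iterdir()],
}

def call_tool(name: str, *args):
    if name not in TOOLS:
        raise PermissionError(f"tool {name!r} is not registered")
    return TOOLS[name](*args)
```

An attacker who fully controls the model's output still can't reach a capability that was never wired in.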

Sources