Agent Sandboxing Boosts Security

Coding agents are becoming adept at executing terminal commands, but this power comes with significant risk. Unsupervised agents can corrupt data, deploy faulty code, or expose sensitive information. Requiring human approval gates these actions, but approval fatigue erodes the safeguard over time.

To address this, a secure agent sandbox has been rolled out across macOS, Linux, and Windows. Inside this controlled environment agents operate freely, and they prompt for approval only when they need to act outside its boundaries, such as accessing the internet. The result is 40% fewer interruptions, saving users considerable time.

Sandbox Goals

The primary objective is to reduce interruptions while enhancing security: agents should have enough operational freedom to get work done without exposing systems to undue risk. Striking this balance is challenging, as many development tasks require elevated privileges.

Making the sandbox usable means navigating trade-offs between security and functionality within each operating system's limitations. The implementation exposes a uniform API across platforms while relying on sandboxing primitives specific to macOS, Linux, and Windows.

Implementation Details

On macOS, the Seatbelt framework, despite being deprecated, was chosen for its robust subprocess tree containment and fine-grained permission control. Policies are dynamically generated based on user settings and workspace configurations.
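As a rough illustration of what dynamically generating a policy could look like, the sketch below assembles a Seatbelt profile (written in its Scheme-like profile language) from a workspace path and a network setting. The function name build_seatbelt_profile, its parameters, and the specific rules chosen are assumptions made for illustration, not the policy the sandbox actually uses.

```python
import json


def build_seatbelt_profile(workspace: str, allow_network: bool = False) -> str:
    """Return Seatbelt profile text (hypothetical policy, for illustration only)."""
    # json.dumps yields a double-quoted string with quotes and backslashes escaped.
    quoted_workspace = json.dumps(workspace, ensure_ascii=False)
    rules = [
        "(version 1)",
        "(deny default)",        # start from deny-all, then open up selectively
        "(allow process-fork)",  # let the command spawn child processes
        "(allow process-exec)",
        "(allow file-read*)",    # assumption: reads are allowed everywhere
        # Assumption: writes are confined to the workspace directory tree.
        f"(allow file-write* (subpath {quoted_workspace}))",
    ]
    # Network access stays denied unless the user setting enables it.
    rules.append("(allow network*)" if allow_network else "(deny network*)")
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    print(build_seatbelt_profile("/Users/dev/project"))
```

On macOS, profile text of this shape can be passed to the sandbox-exec command with its -p option. A profile this small would likely need further allowances before real development tools run under it.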

Linux employs Landlock and seccomp. Seccomp blocks risky system calls, while Landlock restricts filesystem access. Ignored files are made inaccessible by mapping them into an overlay filesystem and overwriting them with Landlocked copies.

Windows utilizes WSL2 to run the Linux sandbox. A native Windows sandbox is under development, as current primitives are largely browser-centric and unsuitable for general developer tools.
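The per-platform choices above can be pictured as a small dispatch table sitting behind the uniform API. This is only a sketch: the names BACKENDS and select_backend and the backend labels are hypothetical, and the real implementation is not described at this level of detail.

```python
import sys

# Maps sys.platform prefixes to the mechanism the article names for each OS.
# The string labels are illustrative only.
BACKENDS = {
    "darwin": "seatbelt",
    "linux": "landlock+seccomp",
    "win32": "linux sandbox via wsl2",
}


def select_backend(platform: str = sys.platform) -> str:
    """Pick the sandbox mechanism for a platform string such as sys.platform."""
    for prefix, backend in BACKENDS.items():
        if platform.startswith(prefix):
            return backend
    raise RuntimeError(f"no sandbox backend for platform {platform!r}")


if __name__ == "__main__":
    print(select_backend())
```

Callers depend only on select_backend, so a native Windows backend could later replace the WSL2 entry without changing them.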

Teaching Agents Sandbox Awareness

To use the sandbox effectively, agents must anticipate whether a command will succeed within its constraints and know when to request elevated permissions. This required updating the agent harness to inform agents about the sandbox's limits on filesystem, git, and network access, and about how to escalate privileges.

Initial testing revealed agents often retried commands that failed due to sandbox restrictions. To mitigate this, Shell tool results now explicitly state the sandbox constraint causing failure and suggest escalating permissions. This has led to more graceful failure recovery and improved offline evaluation performance.
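A minimal sketch of that kind of tool result follows, assuming the harness already knows which restriction blocked the command. The ShellResult type, its blocked_by field, and the message wording are hypothetical; how a sandbox failure is actually detected is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShellResult:
    command: str
    exit_code: int
    stderr: str = ""
    # Hypothetical field: "network", "filesystem", or "git" when the sandbox
    # blocked the command, None when the failure has some other cause.
    blocked_by: Optional[str] = None


def render_for_agent(result: ShellResult) -> str:
    """Format a shell result, naming the sandbox constraint if one caused the failure."""
    lines = [f"$ {result.command}", f"exit code: {result.exit_code}"]
    if result.stderr:
        lines.append(f"stderr: {result.stderr}")
    if result.exit_code != 0 and result.blocked_by is not None:
        lines.append(f"Sandbox: blocked by the {result.blocked_by} restriction.")
        lines.append(
            "Retrying unchanged will fail again; request elevated permissions "
            "to run this command outside the sandbox."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    failed = ShellResult("curl https://example.com", 6,
                         "Could not resolve host", blocked_by="network")
    print(render_for_agent(failed))
```

Stating the cause and the next step in the result itself gives the agent a reason to escalate instead of retrying the same command.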

About a third of requests on supported platforms now run in the sandbox, with significant adoption by enterprise clients. As agents increasingly interact with production systems, defining execution boundaries is critical. Future developments include agents trained specifically for sandbox environments, enabling them to write scripts and programs directly.

AI Daily Digest
