OpenAI's Safety Playbook for Codex

OpenAI details its robust safety measures for its Codex AI coding agent, emphasizing sandboxing, network controls, and detailed telemetry for secure deployment.

Image: OpenAI's approach to ensuring the secure operation of its AI coding agent, Codex. (OpenAI News)

OpenAI is detailing its approach to running its AI coding agent, Codex, safely within its own workflows. As AI systems increasingly act on behalf of users, performing tasks like code review and command execution, robust governance becomes critical. The company is emphasizing its strategy for controlling these agents, aiming to keep them within defined technical boundaries while enabling developer speed.

The core principle is to allow frictionless execution of low-risk actions and require explicit review for higher-risk operations. This is achieved through a multi-layered approach involving managed configuration, constrained execution, network policies, and detailed agent-native logs. The goal is to provide security teams with the necessary oversight to govern how agents operate, including access controls and approval workflows.

Controlling Codex Operations

OpenAI deploys Codex with a focus on productivity within a bounded environment. Low-risk, everyday actions are designed to be seamless, while more sensitive tasks trigger a mandatory stop for review.

Sandboxing and Approvals

Sandboxing defines the technical execution boundaries, specifying what Codex can access, write to, and whether it can connect to the network. Approval policies dictate when Codex must seek user permission, particularly for actions outside the sandbox. Users can grant one-time approvals or approve specific action types for a session.

To streamline routine tasks, OpenAI utilizes an 'Auto-review' mode. This feature allows a subagent to automatically approve certain low-risk actions, preventing constant user interruption while still flagging higher-risk or potentially unintended actions.


Configuration examples include `approvals_reviewer = "auto_review"` to enable the auto-review subagent, and `sandbox_workspace_write.writable_roots = ["~/development"]` to restrict where Codex can write.
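Taken together, these settings might appear in a Codex configuration file roughly as follows. This is a minimal sketch: the TOML layout is an assumption, and only the keys quoted above come from the source.

```toml
# Sketch: approval and sandbox settings as they might appear in a
# Codex config file. Layout assumed; keys are those quoted in the article.
approvals_reviewer = "auto_review"    # subagent auto-approves low-risk actions

[sandbox_workspace_write]
writable_roots = ["~/development"]    # limit writes to the development tree
```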

Network Access Limitations

Codex does not have open-ended outbound network access. A managed network policy permits connections to expected destinations, blocks unauthorized ones, and requires approval for unknown domains. This allows Codex to perform common workflows without broad network exposure.

Network configuration settings like `allowed_web_search_modes = ["cached"]` and `denied_domains = ["pastebin.com"]` illustrate these controls.
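A sketch of how these network settings might sit alongside each other in the same config file; again, the layout is assumed and only the two keys shown are from the source.

```toml
# Sketch: network policy settings. Layout assumed; keys from the article.
allowed_web_search_modes = ["cached"]   # restrict web search to cached results
denied_domains = ["pastebin.com"]       # block known exfiltration destinations
```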

Identity and Credentials Management

Authentication for Codex is tightly managed. Credentials are secured in the OS keyring, login is enforced via ChatGPT, and access is tied to specific ChatGPT enterprise workspaces. This ensures Codex activity is linked to workspace-level controls and logged within the ChatGPT Compliance Logs Platform.

Configuration options such as `cli_auth_credentials_store = "keyring"` and `forced_login_method = "chatgpt"` are employed.
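As a config fragment, these identity settings might look like the following (layout assumed; the two keys are those quoted in the article):

```toml
# Sketch: identity and credential settings. Layout assumed.
cli_auth_credentials_store = "keyring"   # store credentials in the OS keyring
forced_login_method = "chatgpt"          # require login via ChatGPT workspace
```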

Rule-Based Command Execution

A system of rules differentiates the safety of shell commands. Common, benign commands used in daily development are permitted without approval, while potentially dangerous commands can be blocked or require explicit review. This balances speed for ordinary tasks with necessary safeguards.

Example rules include `prefix_rule(pattern = ["gh", "pr", ["view", "list"]], decision = "allow")`, which permits read-only GitHub pull-request inspection commands without approval.
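To make the rule's apparent semantics concrete, here is a minimal Python sketch of prefix matching with alternation: each pattern element is either a literal token or a list of allowed alternatives. This is an illustration of the idea, not Codex's actual rule engine.

```python
# Illustrative prefix-rule matcher. Each pattern element is a literal
# string or a list of allowed alternatives for that position.
# A sketch of the apparent semantics, not Codex's implementation.

def matches_prefix(pattern, command):
    """Return True if the command's leading tokens match the pattern."""
    if len(command) < len(pattern):
        return False
    for expected, actual in zip(pattern, command):
        alternatives = expected if isinstance(expected, list) else [expected]
        if actual not in alternatives:
            return False
    return True

rule = {"pattern": ["gh", "pr", ["view", "list"]], "decision": "allow"}

print(matches_prefix(rule["pattern"], ["gh", "pr", "view", "123"]))   # True
print(matches_prefix(rule["pattern"], ["gh", "pr", "merge", "123"]))  # False
```

Under this reading, `gh pr view 123` and `gh pr list` would be allowed without approval, while `gh pr merge` would fall through to other rules or require review.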

Managed Configurations

These security postures are enforced through cloud-managed requirements, macOS managed preferences, and local configuration files. These admin-enforced controls provide a consistent baseline across local Codex surfaces, including desktop apps, CLIs, and IDE extensions.
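On macOS, admin-enforced settings like these are typically delivered as managed preferences through an MDM profile. The fragment below is purely illustrative: the payload domain and key names are hypothetical stand-ins, since the article does not publish Codex's managed-preferences schema.

```xml
<!-- Hypothetical managed-preferences payload fragment. The domain and
     keys below are illustrative, not Codex's actual schema. -->
<key>PayloadType</key>
<string>com.example.codex.managed</string>
<key>PayloadContent</key>
<dict>
    <key>forced_login_method</key>
    <string>chatgpt</string>
    <key>approvals_reviewer</key>
    <string>auto_review</string>
</dict>
```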

Agent-Native Telemetry and Audit Trails

Beyond control, visibility into agent behavior is crucial. While traditional logs show what happened, agent-native telemetry aims to clarify why. Codex supports OpenTelemetry for exporting events like user prompts, approval decisions, and tool execution results.
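As a sketch of what consuming such telemetry downstream could look like, the snippet below groups exported events by type and surfaces denied approvals for triage. The event names and fields are illustrative assumptions, not Codex's actual OpenTelemetry schema.

```python
# Sketch: triaging hypothetical Codex telemetry events exported as JSON.
# Event names and attribute fields here are assumptions for illustration.
from collections import Counter

events = [
    {"name": "codex.user_prompt", "attrs": {"session": "s1"}},
    {"name": "codex.approval_decision",
     "attrs": {"decision": "denied", "command": "curl http://unknown.example"}},
    {"name": "codex.tool_result", "attrs": {"tool": "shell", "exit_code": 0}},
    {"name": "codex.approval_decision",
     "attrs": {"decision": "approved", "command": "gh pr view 42"}},
]

def summarize(events):
    """Count events by type and collect denied-approval commands for review."""
    counts = Counter(e["name"] for e in events)
    denied = [e["attrs"]["command"] for e in events
              if e["name"] == "codex.approval_decision"
              and e["attrs"].get("decision") == "denied"]
    return counts, denied

counts, denied = summarize(events)
print(dict(counts))
print(denied)
```

A pipeline along these lines is one way a security team could separate routine approvals from the denials that merit a closer look.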

Codex activity logs are also accessible to enterprise customers via the OpenAI Compliance Platform. This detailed logging helps security teams distinguish expected agent behavior and benign errors from activity that genuinely warrants escalation, and provides critical context when the logs feed AI-powered security triage tools analyzing alerts.

These logs are also used operationally to track adoption, tool usage, and network sandbox interactions, helping OpenAI fine-tune the rollout and identify areas that need adjustment.

As coding agents become more integrated into development, specialized tools for management are essential. Codex provides the necessary control surfaces, configuration management, sandboxing, and detailed telemetry for safe adoption, balancing developer productivity with enterprise security needs.

© 2026 StartupHub.ai. All rights reserved.