OpenAI Codex: The Future of Agent Engineering

OpenAI's Codex is evolving into a powerful agent engineering platform with new features like Skills, Apps, and a scoring system for AI agents.

Mar 10 at 6:02 PM · 4 min read
Three OpenAI team members discussing Codex and agent engineering.

In a recent OpenAI Build Hours session, the team showcased the latest advancements in their Codex platform, highlighting its evolution into a powerful tool for agent engineering. The discussion, featuring Ryan Lopopolo, a member of OpenAI's technical staff, focused on how Codex is moving beyond basic code generation to enable more sophisticated AI-driven workflows.

Key Participants and Their Roles

The session featured three key individuals from OpenAI: Ryan Lopopolo, a Member of the Technical Staff, who provided a deep dive into the technical aspects and demos; Charlie Guo, from the Developer Experience team, who introduced the session and its goals; and Cristina Jones, who handled Startup Marketing and provided context from that perspective.

The full discussion can be found on OpenAI's YouTube channel.

Build Hour: API & Codex — from OpenAI's YouTube channel

The Evolution of Codex: From Code Generation to Agent Engineering

Ryan Lopopolo framed the discussion by outlining the progression of AI's role in software development, moving through distinct phases. Initially, AI models like Codex focused on auto-completion and generating code snippets. This evolved into 'pair programming' where AI acts as a collaborative partner. The current phase, however, is characterized by 'agentic delegation,' where AI agents can autonomously handle complex tasks and workflows.

Lopopolo emphasized that the latest advancements in the GPT-4.5 and GPT-5.3 models have significantly boosted Codex's capabilities. He highlighted that Codex is now 30% faster, with improved performance on large code changes and delegation, and that it now supports up to 1 million tokens of context. This expanded context window is vital for agents to understand and operate within complex codebases and project requirements.

New Features: Skills, Apps, and Windows Availability

A major focus of the session was the introduction of new features within the Codex app, specifically 'Skills' and 'Apps.' 'Skills' allow developers to give Codex new capabilities and expertise, enabling it to interact with external tools and services. These skills can be developed and shared, creating a richer ecosystem of AI-powered functionalities.
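As a hedged illustration of the idea (the exact on-disk format may differ from what was shown in the session), a skill can be sketched as a small instruction file that tells Codex when and how to apply a capability. The `release-notes` name and the steps below are invented for this example:

```markdown
---
name: release-notes
description: Draft release notes from merged pull requests
---

# Release Notes Skill

When asked to prepare release notes:
1. List the pull requests merged since the last release tag.
2. Group the changes by area (features, fixes, docs).
3. Output a Markdown summary for human review before publishing.
```

Because a skill is just structured text plus instructions, it can be versioned, reviewed, and shared like any other artifact in the repository.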

The 'Apps' feature, previously known as 'Connectors,' allows Codex to integrate with everyday tools like ChatGPT, GitHub, and Google Calendar. This integration aims to streamline workflows by bringing AI capabilities directly into the tools developers already use. The team also announced that the Codex app is now officially available on Windows, a significant milestone for broader accessibility.

Agent Legibility Scorecard: Measuring AI Performance

A key innovation discussed was the 'Agent Legibility Scorecard,' a tool designed to evaluate the quality and reliability of AI agents. This scorecard measures agents against seven distinct metrics, including Bootstrap Self-Sufficiency, Task Entrypoints, Validation Harness, Lint + Format Gates, Agent Repo Map, Structured Docs, and Decision Records. The system provides a grade and detailed feedback, allowing developers to understand an agent's strengths and weaknesses.
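The seven metrics can be thought of as pass/fail checks rolled up into a grade. The sketch below is a minimal, hypothetical model of that idea; the metric names follow the session, but the grading thresholds are invented for illustration:

```python
from dataclasses import dataclass, fields

@dataclass
class LegibilityScorecard:
    """Hypothetical model of the seven-metric Agent Legibility Scorecard."""
    bootstrap_self_sufficiency: bool
    task_entrypoints: bool
    validation_harness: bool
    lint_format_gates: bool
    agent_repo_map: bool
    structured_docs: bool
    decision_records: bool

    def score(self) -> int:
        """Number of metrics passed, out of seven."""
        return sum(getattr(self, f.name) for f in fields(self))

    def grade(self) -> str:
        """Map the raw score onto a coarse letter grade (thresholds invented)."""
        passed = self.score()
        if passed >= 6:
            return "A"
        if passed >= 4:
            return "B"
        if passed >= 2:
            return "C"
        return "D"

card = LegibilityScorecard(
    bootstrap_self_sufficiency=True,
    task_entrypoints=True,
    validation_harness=True,
    lint_format_gates=False,  # e.g. a missing linter, as in the demo below
    agent_repo_map=True,
    structured_docs=True,
    decision_records=False,
)
print(card.score(), card.grade())  # → 5 B
```

The value of a structure like this is less the letter grade than the itemized feedback: each failed check points at a concrete, fixable gap in how legible the repository is to an agent.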

Lopopolo demonstrated how this scorecard works by analyzing the 'symphony' repository, a project that was recently open-sourced. The agent was able to identify various aspects of the project's structure, detect potential issues like missing linters, and suggest improvements, showcasing the practical application of these legibility metrics.
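A check like "detect a missing linter" can be sketched as a simple file-presence probe. The marker file names below are common community conventions, not an official list from the session:

```python
from pathlib import Path

# Common linter/formatter config files an agent might look for at the
# repository root (illustrative, not exhaustive).
LINTER_MARKERS = {
    ".eslintrc.json", ".eslintrc.js", "ruff.toml", ".flake8",
    ".rubocop.yml", ".golangci.yml", "biome.json",
}

def has_lint_gate(repo: Path) -> bool:
    """True if the repo root carries a recognizable linter config.

    pyproject.toml counts when it declares a [tool.ruff] section.
    """
    if any((repo / name).is_file() for name in LINTER_MARKERS):
        return True
    pyproject = repo / "pyproject.toml"
    return pyproject.is_file() and "[tool.ruff]" in pyproject.read_text()
```

A repository that fails this probe would lose the Lint + Format Gates metric on the scorecard, and the agent could then suggest adding a config as the remediation.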

Harness Engineering: The Future of AI Collaboration

The concept of 'Harness Engineering' was presented as a crucial aspect of this evolution. It emphasizes the importance of providing AI agents with the right context and tools to perform tasks effectively. By leveraging skills and integrations, developers can guide AI agents to work on specific parts of the codebase or even entire workflows, freeing up human engineers to focus on higher-level strategic tasks.
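In practice, Codex picks up this kind of context from an `AGENTS.md` file at the repository root. A minimal example, with contents invented for illustration, might look like:

```markdown
# AGENTS.md

## Setup
- Install dependencies with `npm install`.
- Run the full test suite with `npm test` before proposing changes.

## Conventions
- New modules live under `src/`; keep public APIs documented.
- Never edit generated files in `dist/` directly.
```

The file acts as standing instructions: every task the agent takes on starts from the same setup steps and house rules, instead of re-deriving them from scratch.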

Lopopolo shared a personal anecdote about using Codex on his team's symphony project, noting the tool's ability to surface issues and suggest codebase improvements, ultimately leading to higher-quality and more reliable AI-driven development.

Key Takeaways and Future Outlook

The session underscored OpenAI's commitment to democratizing AI and empowering developers. The advancements in Codex, from improved model capabilities to the introduction of skills, apps, and the legibility scorecard, signal a significant shift towards AI as a collaborative partner in the software development lifecycle. The ability to integrate AI seamlessly into existing workflows and gain insights into agent performance is key to unlocking the next level of productivity and innovation in the field.