OpenAI is rolling out a new generation of real-time voice models designed to imbue voice applications with greater intelligence and responsiveness. This advancing voice intelligence initiative introduces three distinct models to its API, aiming to bridge the gap between human conversation and machine action.
The core of the update lies in three new models: GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper. These are intended to move voice interfaces beyond simple command-and-response to systems that can actively listen, reason, translate, and act as a conversation unfolds.
The New Voice AI Arsenal
GPT‑Realtime‑2 is positioned as OpenAI's first voice model with GPT‑5-class reasoning capabilities. It's built to handle complex requests, maintain conversational flow, and integrate with tools seamlessly. Developers can enable features like short preambles to signal processing, parallel tool calls for efficiency, and improved recovery mechanisms for errors.