OpenAI's latest "Build Hour" session, "GPT-Realtime-2", showcased advances in the company's voice AI capabilities, presenting new models and their potential applications across a range of industries. The session was hosted by Sarah Irbonos, who leads startup marketing at OpenAI, and featured Kaiya Chen and Erika Hethkott, both Solutions Engineers at OpenAI. It centered on the newly released GPT-Realtime-2, a model intended to help developers and companies scale their AI solutions.
Introducing the GPT-Realtime-2 Models
The session opened with an overview of the new models in the Realtime API: GPT-Realtime-Translate, GPT-Realtime-Whisper, and the star of the show, GPT-Realtime-2. GPT-Realtime-Translate performs live speech translation from 70 input languages into 13 output languages. GPT-Realtime-Whisper provides streaming speech-to-text, producing a transcript in real time while someone is still speaking. GPT-Realtime-2 was described as OpenAI's first voice model with GPT-5-class reasoning: it can follow complex instructions, call tools, recover gracefully from errors, and maintain context over longer conversations.
