
OpenAI Build Hour Dives into GPT-Realtime-2 Capabilities

OpenAI's Build Hour showcased GPT-Realtime-2, detailing advancements in voice AI for real-time translation, speech-to-text, and conversational agents, with demos and customer spotlights.

May 14 at 12:02 AM · 8 min read

Image: Three women presenting at OpenAI Build Hour with a "GPT-Realtime-2" title card. Image credit: OpenAI, via the OpenAI YouTube channel.

TL;DR

OpenAI Build Hour: showcased advancements in voice AI capabilities
GPT-Realtime-2 Models: new models for real-time voice AI, offered through the Realtime API
Realtime API: enables developers to scale AI solutions; includes GPT-Realtime-Translate and GPT-Realtime-Whisper
GPT-Realtime-Translate: live speech translation from 70 input languages into 13 output languages
GPT-Realtime-Whisper: streaming speech-to-text transcription in real time
Real-World Applications: demonstrations of voice AI in action
Customer Spotlight: Sierra: example of a successful implementation
Addressing Failure Modes: strategies for common issues
Future Potential: scaling AI solutions across industries


OpenAI's latest "Build Hour" session, titled "GPT-Realtime-2", showcased advances in the company's voice AI capabilities, highlighting new models and their potential applications across industries. Hosted by Sarah Irbonos, who leads startup marketing at OpenAI, the session featured Kaiya Chen and Erika Hethkott, both Solutions Engineers at OpenAI. The focus was the recently released GPT-Realtime-2, a model intended to help developers and companies scale their AI solutions.

Introducing the GPT-Realtime-2 Models

The session began with an overview of the new "Realtime API" models: GPT-Realtime-Translate, GPT-Realtime-Whisper, and the star of the show, GPT-Realtime-2. GPT-Realtime-Translate offers live speech translation from 70 input languages into 13 output languages. GPT-Realtime-Whisper provides streaming speech-to-text transcription in real time, as someone speaks. GPT-Realtime-2 was described as a first voice model with GPT-5-class reasoning, capable of handling complex instructions, using tools, recovering gracefully from errors, and maintaining longer conversational context.

The full discussion can be found on OpenAI's YouTube channel.


Demonstrating Real-World Applications

The team walked through several demonstrations to illustrate the power of these models. A live translation demo showcased the seamless conversion of spoken English to Spanish, highlighting the model's ability to maintain conversational flow. Following this, a voice-powered search agent demo illustrated how GPT-Realtime-2 can interact with a website, understand user intent, and perform actions like filtering search results based on price and ratings. The agent successfully navigated an e-commerce site, adding a tent and hiking boots to the cart based on user requests.
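The filtering step in the search agent demo can be pictured as a plain function that a voice agent calls as a tool. The following is a minimal sketch, not the code shown in the session: the `Product` type, the `filter_products` and `add_to_cart` helpers, and the catalog data are all hypothetical names and values invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical catalog entry; fields are assumptions for this sketch."""
    name: str
    price: float
    rating: float


def filter_products(products, max_price=None, min_rating=None):
    """Return products under the price cap and at or above the rating floor."""
    results = []
    for p in products:
        if max_price is not None and p.price > max_price:
            continue
        if min_rating is not None and p.rating < min_rating:
            continue
        results.append(p)
    # Best-rated first; cheaper first when ratings tie.
    return sorted(results, key=lambda p: (-p.rating, p.price))


# Made-up catalog data, standing in for a real e-commerce backend.
CATALOG = [
    Product("2-person tent", 149.0, 4.6),
    Product("4-person tent", 329.0, 4.8),
    Product("Hiking boots", 119.0, 4.4),
    Product("Budget boots", 59.0, 3.7),
]

cart = []


def add_to_cart(product):
    """Append a product to the in-memory cart and return the cart size."""
    cart.append(product)
    return len(cart)


if __name__ == "__main__":
    # A request like "under $200, at least four stars" maps to these arguments.
    for match in filter_products(CATALOG, max_price=200, min_rating=4.0):
        add_to_cart(match)
    print([p.name for p in cart])
```

In a real agent, the model would supply the `max_price` and `min_rating` arguments after interpreting the spoken request; here they are passed by hand.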

Key Features and Improvements

Several key features of GPT-Realtime-2 were emphasized, including preambles, a 128K context window, parallel tool calling, domain understanding, context-over-turns, and controllable tone. The expanded context window is particularly significant, allowing for longer and more nuanced conversations. The ability to control the agent's tone opens up new possibilities for more engaging and personalized user experiences. Benchmarks were also presented, showing GPT-Realtime-2 achieving 96.6% accuracy on Big Bench Audio (intelligence) and a 48.5% pass rate on Audio MultiChallenge (instruction following), which the presenters framed as substantial improvements over previous iterations.
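Parallel tool calling, as a general idea, means the application runs several independent tool requests at once instead of one after another. The sketch below illustrates that idea with Python's `asyncio.gather`; it does not use the Realtime API, and the two tool functions (`check_inventory`, `get_shipping_estimate`) and their return values are hypothetical stand-ins.

```python
import asyncio


async def check_inventory(sku):
    """Hypothetical tool: pretend to query a stock service."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"sku": sku, "in_stock": True}


async def get_shipping_estimate(zip_code):
    """Hypothetical tool: pretend to query a shipping service."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"zip": zip_code, "days": 3}


async def run_tools_in_parallel():
    # Both coroutines are scheduled together, so total wait time is
    # roughly that of the slowest call rather than the sum of both.
    inventory, shipping = await asyncio.gather(
        check_inventory("TENT-2P"),
        get_shipping_estimate("94103"),
    )
    return inventory, shipping


if __name__ == "__main__":
    print(asyncio.run(run_tools_in_parallel()))
```

For a voice agent, shaving off sequential waiting matters because every extra round trip is heard by the user as silence.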

Use Cases and Future Potential

The presentation also touched on the broad range of potential applications for this next wave of voice AI. These include smart devices, coding assistants, mobile apps, media and video games, coaches, note-taking, finance voice UX, and agentic video calls. The underlying technology can power voice interactions that are more natural, efficient, and context-aware, reducing the need for users to type. The presenters positioned this as a significant step towards more intuitive and human-like AI interactions.

Customer Spotlight: Sierra

In a customer spotlight segment, Ken Murphy and Soham Ray from Sierra shared how their company leverages AI to build better, more human customer experiences. Sierra works with large enterprises, including Fortune 100 companies, to develop AI agents that can handle complex tasks and adhere to specific business policies. Their agents are designed to be reliable, scalable, and trustworthy, ensuring that they can represent brands accurately and effectively in customer interactions. The demonstration of their work highlighted the practical application of these advanced voice models in real-world business scenarios.

Addressing Common Failure Modes

The session also acknowledged common failure modes in voice AI, such as tool hallucination, conversational confusion, logical missteps, and spell-out failures. These failures, which can range from agents hallucinating actions to misinterpreting signals or struggling with accents, underscore the complexity of building robust voice agents. The advancements in GPT-Realtime-2 aim to mitigate these issues by improving the model's understanding, reasoning, and ability to handle noisy or complex audio inputs.
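One application-side guard against tool hallucination is to check every tool call the model requests against a registry of tools that actually exist before executing it. The sketch below shows that check in isolation. It is an assumption about how a developer might handle this failure mode, not a technique described in the session, and the tool names and argument sets in `TOOL_REGISTRY` are hypothetical.

```python
# Hypothetical registry: tool name -> set of required argument names.
TOOL_REGISTRY = {
    "lookup_order": {"order_id"},
    "issue_refund": {"order_id", "amount"},
}


def validate_tool_call(name, arguments):
    """Return (ok, reason) for a tool call requested by the model."""
    if name not in TOOL_REGISTRY:
        return False, f"unknown tool: {name}"

    expected = TOOL_REGISTRY[name]
    provided = set(arguments)

    missing = expected - provided
    if missing:
        return False, f"missing arguments: {sorted(missing)}"

    unexpected = provided - expected
    if unexpected:
        return False, f"unexpected arguments: {sorted(unexpected)}"

    return True, "ok"


if __name__ == "__main__":
    # A call to a tool that was never registered is rejected, not executed.
    print(validate_tool_call("cancel_flight", {}))
    print(validate_tool_call("issue_refund", {"order_id": "A1"}))
    print(validate_tool_call("lookup_order", {"order_id": "A1"}))
```

A rejected call can be reported back to the model as an error so that it can retry or ask the user for clarification instead of claiming an action it never performed.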

The Future of Voice AI

The Build Hour concluded with a forward-looking perspective on the future of voice AI. The progress demonstrated with GPT-Realtime-2 suggests a future where voice interactions become increasingly seamless and integrated into daily life, powering everything from smart devices to complex enterprise applications. The ongoing development in this field promises to unlock new possibilities for human-AI interaction and business efficiency.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
