For conversational AI to feel natural, it must operate at the speed of human speech. Awkward pauses, clipped interruptions, and delayed barge-in all signal that the network is getting in the way. OpenAI's work on low-latency voice AI aims to eliminate these friction points for ChatGPT voice, for developers using its Realtime API, and for interactive workflows. Achieving this at OpenAI's scale, serving over 900 million weekly active users, demands global reach, rapid connection setup, and consistently low media latency. According to OpenAI News, the company re-engineered its WebRTC infrastructure to overcome the limitations of existing WebRTC deployment models at scale.
The core challenge was integrating WebRTC, a standard for real-time communication, with OpenAI's massive Kubernetes-based infrastructure. Traditional WebRTC deployments often rely on a one-port-per-session model, which clashes with the dynamic, port-constrained nature of modern cloud deployments. That model is prone to port exhaustion, and it requires stable ownership of per-session ICE and DTLS state. Both issues become critical when managing millions of concurrent connections.
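To make the port-exhaustion problem concrete, here is a minimal Python sketch of a one-port-per-session allocator. The class `PortPerSessionAllocator`, its method names, and the port range are hypothetical and chosen only for illustration; this is not OpenAI's implementation, and it opens no real sockets. It relies on one well-known fact: UDP port numbers are 16 bits wide, so a single IP address can offer at most 65,535 of them, and cloud platforms typically expose far fewer.

```python
class PortPerSessionAllocator:
    """Toy model of one-port-per-session allocation (hypothetical, no real sockets)."""

    def __init__(self, first_port: int, last_port: int) -> None:
        # The pool is the inclusive range of UDP ports this host may expose.
        self._free = list(range(first_port, last_port + 1))
        self._by_session: dict[str, int] = {}

    def open_session(self, session_id: str) -> int:
        """Reserve a dedicated port for a new session, or fail if none remain."""
        if not self._free:
            raise RuntimeError("port range exhausted")
        port = self._free.pop()
        self._by_session[session_id] = port
        return port

    def close_session(self, session_id: str) -> None:
        """Return the session's port to the pool."""
        self._free.append(self._by_session.pop(session_id))

    @property
    def capacity_left(self) -> int:
        """Number of sessions that can still be opened."""
        return len(self._free)
```

With a four-port range such as `PortPerSessionAllocator(50000, 50003)`, the fifth `open_session` call raises `RuntimeError` until an earlier session is closed. The ceiling on concurrent sessions per address is therefore the size of the port range, regardless of how much CPU or bandwidth the host has to spare.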