OpenAI has announced the release of three new audio models accessible via its API, promising significant advancements in how AI interacts with sound and language. The company showcased these models with demonstrations of real-time translation and intelligent voice agents capable of understanding and acting on instructions.
Real-Time Translation Capabilities
One of the key features highlighted is real-time translation. The presenter demonstrated how the model can listen to speech in one language, such as French, and translate it into another, such as English, while the speaker is still talking. The process appears seamless: the translated output tracks the spoken input with minimal delay. The model waits for a key word or phrase before it begins translating, which allows for a more natural conversational flow. The capability spans 70 languages, with the aim of bridging communication gaps on a global scale.
