Google is pushing its Gemini audio model further with the introduction of Gemini 3.1 Flash Live. This latest iteration aims to significantly enhance the naturalness and reliability of AI-powered voice interactions, a crucial step for the future of voice-first technology. The new model is detailed by Google DeepMind.
Gemini 3.1 Flash Live reportedly offers improved precision and reduced latency, paving the way for more fluid and intuitive conversations with AI. Developers can access the model through the Gemini Live API in Google AI Studio and use it to build more sophisticated voice agents capable of handling complex tasks at scale.
Enhanced Performance and Understanding
The model shows marked improvements in benchmarks designed to test multi-step function calling and complex instruction following. On the ComplexFuncBench Audio benchmark, it scored 90.8%, a significant leap from its predecessor. Furthermore, Gemini 3.1 Flash Live demonstrates a heightened ability to understand tone and nuance in speech, allowing for more dynamic responses that adapt to user emotions such as frustration or confusion.
This enhanced tonal understanding is particularly beneficial for enterprise applications. In Gemini Enterprise for Customer Experience, the model can better recognize acoustic details such as pitch and pace, leading to more empathetic and effective customer interactions.
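To make "multi-step function calling" more concrete, here is a minimal sketch of what such a task involves: a chain of tool calls in which a later call depends on the result of an earlier one. It does not use the Gemini Live API or any real library. The tool names (`find_order`, `schedule_callback`), the plan format, and the `$step.field` reference syntax are all hypothetical, and the hard-coded `PLAN` stands in for what a model would emit after hearing a spoken request.

```python
from typing import Any, Callable, Dict, List


# Hypothetical tools a voice agent might expose; names and data are illustrative only.
def find_order(customer: str) -> Dict[str, Any]:
    return {"order_id": "A-100", "customer": customer, "status": "delayed"}


def schedule_callback(order_id: str, minutes: int) -> str:
    return f"callback for {order_id} in {minutes} min"


TOOLS: Dict[str, Callable[..., Any]] = {
    "find_order": find_order,
    "schedule_callback": schedule_callback,
}


def resolve(value: Any, results: Dict[str, Any]) -> Any:
    """Replace a '$step.field' reference with output from an earlier step."""
    if isinstance(value, str) and value.startswith("$"):
        step, _, field = value[1:].partition(".")
        out = results[step]
        return out[field] if field else out
    return value


def run_plan(plan: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Execute calls in order, feeding earlier results into later arguments."""
    results: Dict[str, Any] = {}
    for call in plan:
        args = {k: resolve(v, results) for k, v in call["args"].items()}
        results[call["id"]] = TOOLS[call["name"]](**args)
    return results


# Stand-in for model output: step s2 needs the order_id produced by step s1.
PLAN = [
    {"id": "s1", "name": "find_order", "args": {"customer": "Dana"}},
    {"id": "s2", "name": "schedule_callback",
     "args": {"order_id": "$s1.order_id", "minutes": 15}},
]

if __name__ == "__main__":
    print(run_plan(PLAN)["s2"])
```

In this toy setup, a benchmark for multi-step function calling would in effect be checking whether the model produces a plan like `PLAN` correctly: the right tools, in the right order, with arguments correctly carried over from earlier results.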
Broader Availability and Features
For consumers, Gemini 3.1 Flash Live powers the latest updates to Gemini Live and Search Live, delivering faster responses and extended conversational memory. This means users can engage in longer, more coherent brainstorming sessions or get real-time assistance without losing the conversational thread.
The multilingual capabilities of Gemini 3.1 Flash Live are also enabling a global rollout of Search Live, now accessible in over 200 countries and territories. This expansion allows users worldwide to interact with Search in their preferred language through real-time, multimodal conversations.
To combat misinformation, all audio generated by Gemini 3.1 Flash Live is watermarked using SynthID, an imperceptible marker embedded directly into the audio output for reliable detection of AI-generated content.
