Google's Gemini 3.1 Flash TTS adds expressive AI voice

Google is rolling out Gemini 3.1 Flash TTS, its latest text-to-speech AI model, promising more natural and expressive synthesized voices. The update brings granular control over vocal performance, aiming to empower developers and enterprises building next-generation audio applications.

First detailed by DeepMind, the model achieves an Elo score of 1,211 on the Artificial Analysis TTS leaderboard, which indicates that human listeners strongly prefer its output quality.

Enhanced Control with Audio Tags

A key innovation is the introduction of audio tags. These allow users to embed natural language commands directly into text inputs to precisely direct vocal style, pacing, and delivery. This feature places developers in the "director's chair," enabling detailed scene direction and speaker-specific instructions.

Users can configure audio profiles for distinct characters and apply "Director's Notes" to adjust pace, tone, and accent. Inline tags can then change the expression mid-sentence.
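As a rough illustration of how a character profile, Director's Notes, and an inline tag might be combined into a single text input, consider the Python sketch below. It makes no API calls. The `AudioProfile` class, the `build_prompt` helper, the note layout, and the square-bracket tag syntax are all assumptions made for this example, not a format confirmed by Google.

```python
from dataclasses import dataclass, field


@dataclass
class AudioProfile:
    """Hypothetical container for one character's voice direction."""
    speaker: str
    notes: dict = field(default_factory=dict)  # e.g. {"pace": "slow"}


def build_prompt(profile: AudioProfile, line: str) -> str:
    """Join the Director's Notes and a tagged line into one text input.

    The layout (a notes list followed by "Speaker: line") is an assumed
    convention for illustration only.
    """
    notes = "\n".join(f"- {key}: {value}" for key, value in profile.notes.items())
    return (
        f"Director's Notes for {profile.speaker}:\n"
        f"{notes}\n\n"
        f"{profile.speaker}: {line}"
    )


narrator = AudioProfile(
    speaker="Narrator",
    notes={"pace": "slow", "tone": "warm", "accent": "Scottish"},
)

# "[whispers]" stands in for an inline audio tag; the bracket syntax is assumed.
prompt = build_prompt(narrator, "The door creaked open. [whispers] Is anyone there?")
print(prompt)
```

Keeping the notes in a small data structure rather than a hand-written string makes it easier to reuse the same direction for every line a character speaks.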

These settings can be exported as Gemini API code, so the same voice configuration can be reused consistently across projects.
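The article does not show what the exported code looks like. The sketch below only illustrates the idea of keeping voice settings in one place and building a request body from them. The field names follow the JSON request shape documented for earlier Gemini TTS preview models and are assumed, not confirmed, to carry over to this release. The `build_tts_request` helper, the `VOICE_SETTINGS` dictionary, and the voice name are illustrative choices, and nothing is sent over the network.

```python
import json

# Shared settings that several projects could import to keep one voice.
# The voice name is an assumption; check the model's documentation for
# the voices it actually offers.
VOICE_SETTINGS = {"voice_name": "Kore"}


def build_tts_request(text: str, voice_name: str) -> dict:
    """Build a JSON-serializable request body for speech generation.

    Field names mirror the shape used by earlier Gemini TTS previews
    (assumed here, not taken from the article).
    """
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


body = build_tts_request("Welcome back to the show.", VOICE_SETTINGS["voice_name"])
print(json.dumps(body, indent=2))
```

Separating the voice settings from the text means a change of voice is a one-line edit that applies everywhere the settings are imported.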

Developers can begin experimenting with these advanced controls in Google AI Studio.

Global Scale and Security

Gemini 3.1 Flash TTS supports over 70 languages, facilitating localized and expressive speech experiences worldwide. To combat misinformation, all audio generated by the model is watermarked using SynthID, an imperceptible digital signature that reliably identifies AI-generated content.

The model is available in preview via the Gemini API and Google AI Studio for developers, and on Vertex AI for enterprises. Google Vids will also integrate the technology for Workspace users.

© 2026 StartupHub.ai. All rights reserved.
