Google released Gemini 3.1 Flash TTS in developer preview. The text-to-speech model lets developers steer vocal pace, style, and delivery with natural-language audio tags embedded directly in the input text, so no separate API call is needed per speaker. It also supports native multi-speaker dialogue in 70+ languages and scored an Elo rating of 1,211 on the Artificial Analysis TTS leaderboard. All generated audio carries Google's SynthID watermark so that it can be identified as AI-generated. The model is available through Google AI Studio, the Gemini API, and Vertex AI; Google Workspace users can access it in Google Vids.
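The key idea is that delivery cues live inside the input text itself, so a whole multi-speaker exchange can go out as one prompt. The sketch below shows only how such a prompt might be assembled. It assumes a "Speaker: text" line format and writes each audio tag as `[tag]` before an utterance. The bracket syntax, the `Line` dataclass, and the `build_dialogue_prompt` helper are hypothetical and were chosen for illustration. Check Google's Gemini API speech-generation documentation for the tag format the model actually accepts. The request itself, including the SDK call, model ID, and the mapping of speaker names to voices, is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str   # speaker label; assumed to match a speaker-to-voice mapping set in the request
    text: str      # the words to be spoken
    tag: str = ""  # optional natural-language delivery cue (hypothetical [tag] syntax)


def build_dialogue_prompt(lines, instruction=""):
    """Join dialogue lines into one prompt string with inline delivery tags.

    Illustrative sketch only: the "Speaker: [tag] text" layout is an assumption,
    not a confirmed Gemini 3.1 Flash TTS input format.
    """
    parts = []
    if instruction:
        # Optional leading instruction, e.g. naming the speakers in the conversation.
        parts.append(instruction.strip())
    for line in lines:
        cue = f"[{line.tag}] " if line.tag else ""
        parts.append(f"{line.speaker}: {cue}{line.text}")
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_dialogue_prompt(
        [
            Line("Joe", "Did you hear the news?", tag="excited"),
            Line("Jane", "Tell me quietly.", tag="whispering"),
        ],
        instruction="TTS the following conversation between Joe and Jane:",
    )
    print(prompt)
```

Because every speaker's style cue travels with that speaker's line, one request carries the whole exchange. A caller does not need to split the dialogue and synthesize each speaker separately.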

Google Releases Gemini 3.1 Flash TTS with Prompt-Controlled Voice Style Across 70+ Languages

Citations