
Google has rolled out new audio features for its Gemini AI model, making it easier for developers and businesses to integrate lifelike voice interactions into their applications. Through the Gemini Live API now available on Vertex AI, companies are using these native audio tools to streamline operations like processing mortgages and handling customer service calls. Early adopters report impressive results, with the technology enabling more natural conversations that users often mistake for human ones.
David Wurtz, Shopify’s vice president of product, highlighted how the new capabilities in Gemini 2.5 Flash Native Audio have transformed their Sidekick tool. Users quickly forget they’re engaging with an AI, sometimes even expressing gratitude after extended discussions, which helps merchants succeed more effectively.
At United Wholesale Mortgage, chief technology officer Jason Bressler noted that incorporating the Gemini 2.5 Flash Native Audio model has boosted their AI assistant Mia since its May 2025 launch. This integration has already facilitated over 14,000 loans for broker partners, showcasing the practical impact on financial services.
Newo.ai’s co-founder David Yang praised the model’s performance in Vertex AI for powering their AI receptionists. According to Yang, these systems hold strong conversations: they pick out the primary speaker amid background noise, switch languages mid-conversation, and convey natural emotion through speech.
Building on these foundations, Gemini now includes advanced live speech-to-speech translation for seamless real-time communication. This supports both ongoing listening modes and bidirectional dialogues, breaking down language barriers in everyday scenarios.
In continuous listening, the AI translates surrounding speech from various languages into a single preferred language. Imagine wearing headphones and hearing your environment in your native language without interruption.
For two-way conversations, it translates between two languages on the fly, choosing the output language based on who is speaking. Picture conversing with someone who speaks Hindi while you use English: your headphones play their words in English, and when it’s your turn, your device plays your words in Hindi.
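The routing behind the two modes described above can be pictured with a small sketch. This is purely illustrative and is not the Gemini Live API: the function names and the use of short language codes such as "en" and "hi" are assumptions made for the example, and the real system performs detection and translation inside the model.

```python
from typing import Optional, Tuple


def continuous_target(detected: str, preferred: str) -> Optional[str]:
    """Continuous listening: translate everything into the listener's
    preferred language. Returns None when no translation is needed."""
    return None if detected == preferred else preferred


def two_way_target(detected: str, pair: Tuple[str, str]) -> str:
    """Two-way conversation: translate into whichever language of the
    configured pair the current speaker is NOT using."""
    first, second = pair
    if detected == first:
        return second
    if detected == second:
        return first
    # Hypothetical handling: speech outside the pair is rejected here.
    raise ValueError(f"{detected!r} is not in the configured pair {pair}")


if __name__ == "__main__":
    # English speaker talking with a Hindi speaker.
    print(two_way_target("hi", ("en", "hi")))  # Hindi speech goes to English
    print(two_way_target("en", ("en", "hi")))  # English speech goes to Hindi
    print(continuous_target("fr", "en"))       # French speech goes to English
```

In this sketch the only difference between the modes is the routing rule: continuous listening always targets one language, while a two-way conversation flips the target depending on the detected speaker language.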
These translation tools stand out with broad language support, covering more than 70 languages and around 2,000 combinations by drawing on Gemini’s extensive knowledge and audio processing strengths. They also maintain the original speaker’s tone, speed, and pitch to keep translations feeling authentic and human-like.
The system handles mixed-language input within a single session, so there is no need to keep adjusting settings during group discussions. Language detection is automatic, so you don’t have to specify which language is being spoken before translation begins.
Even in challenging conditions, like bustling streets, it suppresses background noise to ensure clear, reliable exchanges.
