Google is moving closer to its goal of a “universal AI assistant” that can understand context, plan and take action.
Today at Google I/O, the tech giant announced enhancements to Gemini 2.5 Flash, which is now better across nearly every dimension, including benchmarks for reasoning, code and long context, and to 2.5 Pro, including an experimental enhanced reasoning mode, ‘Deep Think,’ that allows Pro to consider multiple hypotheses before responding.
“This is our ultimate goal for the Gemini app: An AI that’s personal, proactive and powerful,” Demis Hassabis, CEO of Google DeepMind, said in a press pre-brief.
‘Deep Think’ scores impressively on top benchmarks
In March, Google announced Gemini 2.5 Pro, which it considers its most intelligent model yet, with a one-million-token context window. It released its “I/O” coding edition earlier this month (with Hassabis calling it “the best coding model we’ve ever built!”).
“We’ve been really impressed by what people have created, from turning sketches into interactive apps to simulating entire cities,” said Hassabis.
He noted that, based on Google’s experience with AlphaGo, AI model responses improve when they’re given more time to think. This led DeepMind scientists to develop Deep Think, which uses Google’s latest cutting-edge research in thinking and reasoning, including parallel techniques.
Deep Think has shown impressive scores on the hardest math and coding benchmarks, including the 2025 USA Mathematical Olympiad (USAMO). It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal understanding and reasoning.
Hassabis added, “We’re taking a bit of extra time to conduct more frontier safety evaluations and get further input from safety experts.” (Meaning: For now, it’s available to trusted testers via the API for feedback before the capability is made widely available.)
Overall, the new 2.5 Pro leads the popular coding leaderboard WebDev Arena with an Elo score of 1420 (intermediate to proficient). Elo measures the relative skill level of players in two-player games like chess. The model also leads across all categories of the LMArena leaderboard, which evaluates AI based on human preference.
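For readers unfamiliar with Elo, the standard chess-style formula turns a rating gap into an expected head-to-head score. The sketch below assumes that classic formula (base 10, scale 400); the leaderboards mentioned here may compute their ratings differently, and the rival rating of 1320 is a made-up number for illustration only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    Assumes the standard chess parameters (base 10, scale 400); a given
    leaderboard may use a different scale or fitting method.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    reported = 1420.0  # score cited in the article
    rival = 1320.0     # hypothetical competitor, not from the article
    share = elo_expected_score(reported, rival)
    print(f"Expected share of head-to-head wins: {share:.2f}")
```

Under these assumptions, a 100-point lead corresponds to being preferred in roughly 64% of pairwise comparisons, and equal ratings give an even 50%.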
Important updates to Gemini 2.5 Pro, Flash
Also today, Google announced an enhanced 2.5 Flash, considered its workhorse model designed for speed, efficiency and low cost. 2.5 Flash has been improved across the board in benchmarks for reasoning, multimodality, code and long context; Hassabis noted that it’s “second only” to 2.5 Pro on the LMArena leaderboard. The model is also more efficient, using 20 to 30% fewer tokens.
Google is making final adjustments to 2.5 Flash based on developer feedback; it’s now available for preview in Google AI Studio, Vertex AI and in the Gemini app. It will be generally available for production in early June.
Google is bringing more capabilities to both Gemini 2.5 Pro and 2.5 Flash, including native audio output to create more natural conversational experiences, text-to-speech to support multiple speakers, thought summaries and thinking budgets.
With native audio input (in preview), users can steer Gemini’s tone, accent and style of speaking (think: directing the model to be melodramatic or maudlin when telling a story). Like Project Mariner, the model is also equipped with tool use, allowing it to search on users’ behalf.
Other experimental early voice features include affective dialogue, which gives the model the ability to detect emotion in a user’s voice and respond appropriately; proactive audio, which allows it to tune out background conversations; and thinking in the Live API to support more complex tasks.
New multiple-speaker features in both Pro and Flash support more than 24 languages, and the models can quickly switch from one dialect to another. “Text-to-speech is expressive and can capture subtle nuances, such as whispers,” Koray Kavukcuoglu, CTO of Google DeepMind, and Tulsee Doshi, senior director for product management at Google DeepMind, wrote in a blog posted today.
Further, 2.5 Pro and Flash now include thought summaries in the Gemini API and Vertex AI. These “take the model’s raw thoughts and organize them into a clear format with headers, key details, and information about model actions, like when they use tools,” Kavukcuoglu and Doshi explain. The goal is to provide a more structured, streamlined format for the model’s thinking process and give users interactions with Gemini that are simpler to understand and debug.
Like 2.5 Flash, Pro is also now equipped with ‘thinking budgets,’ which give developers the ability to control the number of tokens a model uses to think before it responds, or, if they prefer, turn its thinking capabilities off altogether. This capability will be generally available in coming weeks.
Finally, Google has added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API so that models can more easily integrate with open-source tools.
As Hassabis put it: “We’re living through a remarkable moment in history where AI is making possible an amazing new future. It’s been relentless progress.”

