This vision was realized with the launch of Gemini 1.0 last December. Gemini 1.0, the first model designed to be natively multimodal, and its successor Gemini 1.5 made significant strides in multimodality and long context, enabling a comprehensive understanding of information across text, video, images, audio, and code, while processing far greater volumes of data.
Today, millions of developers are building with Gemini, which is transforming all of Google's products, seven of which serve 2 billion users, and enabling the creation of new ones. NotebookLM is a prime example of what multimodality and long context make possible, and of why it has garnered such widespread appreciation.
Over the past year, Google has focused on developing more agentic models: models that can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.
Today, Google is thrilled to announce the next generation of models built for this new agentic era: Gemini 2.0, its most capable model yet. With new advances in multimodality, such as native image and audio output, and with native tool use, Gemini 2.0 will enable Google to build new AI agents that bring it closer to its vision of a universal assistant.
Google is making Gemini 2.0 available to developers and trusted testers today, and is quickly bringing it to its products, starting with Gemini and Search. Beginning today, the experimental Gemini 2.0 Flash model is available to all Gemini users. Google is also launching Deep Research, a new feature that uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf; it is available in Gemini Advanced today.
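For developers, access runs through the Gemini API. The sketch below is a minimal, illustrative example, assuming the google-genai Python SDK and the experimental model name gemini-2.0-flash-exp (both assumptions here, and subject to change as the release evolves):

```python
# Minimal sketch of calling Gemini 2.0 Flash through the Gemini API.
# Assumes the `google-genai` Python SDK (pip install google-genai) and an
# API key from Google AI Studio; "gemini-2.0-flash-exp" reflects the
# experimental release and may change.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Explain what 'long context' means for a language model.",
)
print(response.text)
```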
"No product has been transformed more by AI than Search. Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions, quickly becoming one of our most popular Search features ever. As a next step, we’re bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding. We started limited testing this week and will be rolling it out more broadly early next year. And we’ll continue to bring AI Overviews to more countries and languages over the next year," said Sundar Pichai, CEO of Google and Alphabet.
The advancements in Gemini 2.0 are backed by a decade of investment in a unique, full-stack approach to AI innovation. Gemini 2.0 is built on custom hardware like Trillium, Google's sixth-generation TPUs, which powered 100% of its training and inference; Trillium is now generally available for customers to use in their own projects.
Where Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful. Pichai says he can't wait to see what this next era brings.
Enhancing Agentic Experiences with Gemini 2.0
Gemini 2.0 Flash's native user interface action capabilities, together with other improvements such as multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use, and improved latency, all work together to enable a new class of agentic experiences.
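To make function calling and native tool use concrete, here is a minimal, hypothetical sketch, again assuming the google-genai Python SDK; get_weather is an invented placeholder function, not part of any Google API:

```python
# Minimal sketch of function calling with Gemini 2.0 Flash via the
# `google-genai` Python SDK. `get_weather` is a hypothetical stand-in for a
# real service; the SDK converts the annotated function into a tool
# declaration and handles the call-and-respond loop automatically.
from google import genai
from google.genai import types

def get_weather(city: str) -> str:
    """Return a short weather summary for a city (placeholder data)."""
    return f"Sunny, 22 degrees Celsius in {city}."

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What's the weather in Zurich? Should I pack an umbrella?",
    config=types.GenerateContentConfig(tools=[get_weather]),
)
print(response.text)  # Final answer, informed by the tool's result
```

Passing the function itself, rather than a hand-written schema, is the design choice that makes composing several tools in one request straightforward.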
The practical application of AI agents is a research area full of exciting possibilities. Google is exploring this new frontier with a series of prototypes designed to help people accomplish tasks and get things done. These include an update to Project Astra, Google's research prototype exploring the future capabilities of a universal AI assistant; the newly launched Project Mariner, which explores the future of human-agent interaction, starting with your web browser; and Jules, an AI-powered coding agent built to help developers.
"We’re still in the early stages of development, but we’re excited to see how trusted testers use these new capabilities and what lessons we can learn, so we can make them more widely available in products in the future."
Project Astra: Agents Using Multimodal Understanding in the Real World
Since the launch of Project Astra at I/O, Google has been gathering insights from a select group of trusted testers using it on Android devices. Their feedback has been instrumental in understanding how a universal AI assistant could work in practice, including its implications for safety and ethics. The latest version, built with Gemini 2.0, features several key improvements:
Enhanced dialogue: Project Astra now supports conversations in multiple languages, including mixed-language interactions, and demonstrates a better grasp of various accents and less common vocabulary.
Expanded tool integration: With the capabilities of Gemini 2.0, Project Astra can now leverage Google Search, Lens, and Maps, significantly enhancing its utility as a daily assistant.
Refined memory: We have advanced Project Astra’s memory capabilities, allowing it to retain information while ensuring user control. It now offers up to 10 minutes of in-session memory and can recall more past interactions, providing a more personalized experience.
Reduced latency: Thanks to new streaming features and improved audio comprehension, the assistant can process language with a latency comparable to that of human conversation.
"We’re working to bring these types of capabilities to Google products like Gemini app, our AI assistant, and to other form factors like glasses. And we’re starting to expand our trusted tester program to more people, including a small group that will soon begin testing Project Astra on prototype glasses."