Google launches a Browser Agent and Gemini 2.0

Created on December 11 | Last edited on December 11

Google DeepMind has introduced Gemini 2.0, its latest AI model, designed to handle complex multimodal tasks and to support agent-style applications that act on a user's behalf. As part of this release, Project Mariner demonstrates Gemini 2.0’s capabilities by showing how an AI agent can assist users with multi-step workflows directly in the browser.

Understanding Gemini 2.0

Gemini 2.0 builds on previous models with advances in multimodal functionality: it accepts text, images, video, and audio as input and can natively produce text, images, and audio as output. It also supports long-context understanding and improved reasoning, enabling the model to assist in more detailed and interactive tasks. These developments aim to make AI more useful in practical applications while keeping humans in control.

Project Mariner

Project Mariner, powered by Gemini 2.0, is a research prototype developed as an experimental Chrome extension. It is designed to help users complete complex tasks that require navigating and gathering information from the web. In one example, the agent was tasked with finding contact information for a list of companies. It read the data from a Google Sheet, searched for company websites, navigated pages, and recorded email addresses.

The prototype operates only within the user’s active browser tab, ensuring transparency and user control. Real-time reasoning is displayed through the interface, allowing users to understand and guide the agent’s actions. Users can pause or stop the process at any time, maintaining oversight and reducing risks.
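
The workflow in the contact-lookup example, together with the ability to stop the agent at any time, can be sketched roughly as follows. This is a minimal illustration, not Project Mariner's implementation: `collect_contacts`, `find_site`, `find_email`, and `should_stop` are hypothetical names, and the lookup functions are supplied by the caller rather than driven by a real browser.

```python
from typing import Callable, Dict, Iterable, Optional


def collect_contacts(
    companies: Iterable[str],
    find_site: Callable[[str], Optional[str]],   # hypothetical: company name -> website URL
    find_email: Callable[[str], Optional[str]],  # hypothetical: website URL -> contact email
    should_stop: Callable[[], bool],             # hypothetical: True once the user presses stop
    log: Callable[[str], None] = print,          # each step is reported so the user can follow along
) -> Dict[str, Optional[str]]:
    """Look up one contact email per company, checking for a stop request before each company."""
    contacts: Dict[str, Optional[str]] = {}
    for name in companies:
        # The user stays in control: a stop request ends the run before the next company.
        if should_stop():
            log("Stopped by user.")
            break
        log(f"Searching for the website of {name}")
        site = find_site(name)
        if site is None:
            log(f"No website found for {name}")
            contacts[name] = None
            continue
        log(f"Looking for a contact email on {site}")
        contacts[name] = find_email(site)
    return contacts
```

Passing the lookup steps in as functions keeps the sketch independent of any particular browser automation tool, and the `log` callback stands in for the reasoning the prototype displays in its interface.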

Availability

Google is releasing Project Mariner cautiously, limiting it to trusted testers who will evaluate its functionality and limitations. Their feedback will shape further improvements to reliability and usability and help address known weaknesses such as speed and accuracy.

Deep Research: Introducing Enhanced Research Capabilities in Gemini

As part of the Gemini 2.0 rollout, Google DeepMind has introduced Deep Research, a new feature available to Gemini Advanced subscribers. Designed for time-intensive research tasks, Deep Research uses Gemini to explore complex topics and deliver organized reports. The feature is a step toward more agentic AI: it gathers, analyzes, and summarizes information on its own, while the user supervises the process.

Deep Research operates by creating a multi-step research plan based on the user’s question. Once the plan is approved, the AI emulates a detailed online search process, iteratively refining its understanding as it navigates through web sources. The result is a comprehensive report containing key findings, complete with source links for further exploration. Whether you're a student delving into autonomous vehicle technology, an entrepreneur conducting a competitive analysis, or a marketer benchmarking campaigns, Deep Research simplifies the process of sifting through large amounts of information.
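
The plan, approve, search, and report sequence described above can be sketched as a small loop. This is an illustrative sketch under stated assumptions, not Google's implementation: `make_plan`, `run_research`, `Finding`, and `Report` are hypothetical names, the planner is a fixed stub where a real system would ask the model, and the `search` function is supplied by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    step: str
    summary: str
    source: str  # link kept so the reader can explore further


@dataclass
class Report:
    question: str
    findings: List[Finding] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"Report: {self.question}"]
        for item in self.findings:
            lines.append(f"- {item.step}: {item.summary} (source: {item.source})")
        return "\n".join(lines)


def make_plan(question: str) -> List[str]:
    # Hypothetical planner stub: a real system would have the model draft these steps.
    return [
        f"Define key terms in: {question}",
        f"Collect recent sources on: {question}",
        f"Compare viewpoints on: {question}",
    ]


def run_research(
    question: str,
    approve: Callable[[List[str]], bool],                  # the user reviews the plan first
    search: Callable[[str, List[str]], Tuple[str, str]],   # (step, notes so far) -> (summary, source)
) -> Optional[Report]:
    plan = make_plan(question)
    if not approve(plan):
        return None  # nothing runs without the user's approval
    report = Report(question)
    notes: List[str] = []
    for step in plan:
        # Earlier notes are passed back in so each search can build on what was already learned.
        summary, source = search(step, notes)
        notes.append(summary)
        report.findings.append(Finding(step, summary, source))
    return report
```

Calling `run_research` with an `approve` function that returns `False` yields `None`, mirroring the approval gate; otherwise `Report.render()` produces a plain-text summary with one sourced finding per plan step.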

Gemini 2.0 as a Foundation for Broader Applications

While Project Mariner is a focused example, Gemini 2.0 is intended as a foundation for applications across many fields. Its multimodal capabilities and its emphasis on contextual understanding open the way for AI to assist in a range of domains, provided its capabilities are not overstated. This approach aims to balance innovation with practical use and accountability.