Google DeepMind Unveils Gemini 2.5 Computer Use Model
Created on October 8|Last edited on October 8
Google DeepMind has introduced the Gemini 2.5 Computer Use model, a specialized extension of Gemini 2.5 Pro designed for agents that interact directly with user interfaces. The model is available to developers through the Gemini API in Google AI Studio and Vertex AI. It lets AI systems control browsers and applications much as a human would, by looking at the screen and acting on it, which opens up digital workflows that were previously hard to automate.
Purpose of the Gemini 2.5 Computer Use Model
Traditional AI systems typically rely on structured APIs to interact with software, but many real-world tasks still depend on visual and interactive user interfaces. The Gemini 2.5 Computer Use model addresses this gap by giving AI agents the ability to perform on-screen actions such as clicking buttons, typing into forms, navigating pages, and selecting dropdown options. This makes it possible for developers to build general-purpose digital agents capable of completing complex workflows across web and mobile environments.
How the Model Works
The model is exposed through the new “computer_use” tool in the Gemini API and runs as a loop. On each turn the model receives the user’s request, a screenshot of the current interface, and a record of past actions. It analyzes this information and outputs a UI action to perform, such as click, scroll, or type. Some actions require explicit user confirmation, especially sensitive operations like purchases or account changes.
After each action is executed, the environment sends a new screenshot and updated context back to the model, which evaluates the new state and decides the next step. This iterative process continues until the requested task is complete or the loop is stopped. The model is optimized for web browsers, but it already performs well on mobile tasks, and support for broader interface types is expected over time.
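The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's client code: `FakeBrowser`, `stub_model`, and `run_agent` are hypothetical names, the "screenshot" is a text summary rather than an image, and the stub policy stands in for the real model call.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str                                 # "type", "click", or "done"
    args: dict = field(default_factory=dict)


@dataclass
class FakeBrowser:
    """Hypothetical stand-in environment; a real one would drive a browser."""
    typed: str = ""
    submitted: bool = False

    def screenshot(self) -> str:
        # A real environment would return image bytes, not a text summary.
        return f"typed={self.typed!r} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        if action.name == "type":
            self.typed += action.args["text"]
        elif action.name == "click" and action.args.get("target") == "submit":
            self.submitted = True


def stub_model(request: str, screenshot: str, history: list) -> Action:
    """Hypothetical policy standing in for the model: fill the form, then submit."""
    if "typed=''" in screenshot:
        return Action("type", {"text": request})
    if "submitted=False" in screenshot:
        return Action("click", {"target": "submit"})
    return Action("done")


def run_agent(request: str, env: FakeBrowser, model, max_steps: int = 10) -> list:
    """Observe, ask the model for one action, execute it, repeat."""
    history = []
    for _ in range(max_steps):                # hard cap so the loop always stops
        action = model(request, env.screenshot(), history)
        if action.name == "done":
            break
        env.execute(action)
        history.append(action)
    return history
```

Calling `run_agent("hello", FakeBrowser(), stub_model)` takes two steps, a type action followed by a click on the submit target. The `max_steps` cap reflects the "until the loop is stopped" condition: an agent that never reports completion should not run indefinitely.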
Performance and Benchmarks
The Gemini 2.5 Computer Use model has demonstrated strong results across multiple benchmarks. It leads on Online-Mind2Web, WebVoyager, and AndroidWorld, ahead of the other models it was compared against. According to the reported performance data, it reaches over 70 percent accuracy at lower latency than competing computer-control systems, which suggests it can complete browser tasks both more quickly and more reliably.
Safety and Responsible Design
Google DeepMind built safety directly into the model to handle risks unique to agentic systems. Because these agents can interact with live digital environments, safeguards are essential to prevent misuse, prompt injections, or harmful operations. The model integrates an inference-time safety layer that checks every proposed action before it runs, and developers can define custom system instructions to require confirmation for sensitive actions.
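On the developer side, a confirmation requirement can be enforced with a small gate in front of the code that executes actions. The sketch below is only an illustration of that idea, assuming a keyword list and function names (`SENSITIVE_TERMS`, `is_sensitive`, `guarded_execute`) that are hypothetical; it is not the model's built-in safety layer, and a real deployment would need a far more careful policy.

```python
from typing import Callable

# Hypothetical list of phrases that mark an action as sensitive.
SENSITIVE_TERMS = ("buy", "purchase", "pay", "delete account", "change password")


def is_sensitive(description: str) -> bool:
    """Return True if the action description mentions a sensitive term."""
    text = description.lower()
    return any(term in text for term in SENSITIVE_TERMS)


def guarded_execute(
    description: str,
    execute: Callable[[], None],
    confirm: Callable[[str], bool],
) -> str:
    """Run `execute` unless the action is sensitive and the user declines."""
    if is_sensitive(description) and not confirm(description):
        return "blocked"
    execute()
    return "executed"
```

Here `confirm` would typically prompt the end user and return their answer. Passing it in as a callable keeps the gate independent of any particular UI and makes the refusal path easy to test.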
These built-in controls help ensure responsible deployment, while Google encourages developers to further test and harden their implementations before integrating the model into production systems.
Real-World Applications and Early Use Cases
Internal Google teams and early partners are already using Gemini 2.5 Computer Use for a variety of automation and testing tasks. Within Google, it has been integrated into UI testing systems to reduce software development time and recover broken workflows automatically. External partners such as Poke.com and Autotab report that the model performs significantly faster and more reliably than previous alternatives. One major implementation improved complex data parsing accuracy by nearly 20 percent, while others saw faster task completion and fewer failures during web interactions.
Getting Started with Gemini 2.5 Computer Use
The model is available through the Gemini API for developers using Google AI Studio or Vertex AI. A demo environment hosted by Browserbase allows users to see the system in action. Developers can also build their own local or cloud-hosted agent loops using Playwright. Google invites developers to share feedback and contribute to future improvements through its developer forum.
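For a self-built loop on Playwright, the core piece is a dispatcher that turns each action the model proposes into a browser call. The following is a rough sketch under assumptions: the action dictionaries (`type`, `x`, `y`, `text`, `url`, `delta_y`) are a hypothetical schema invented for illustration, not the format the Gemini API returns, and coordinates are assumed to already be in page pixels. The `page` argument is expected to behave like a Playwright sync-API `Page`.

```python
def apply_action(page, action: dict) -> None:
    """Translate one hypothetical action dict into a Playwright-style page call."""
    kind = action["type"]
    if kind == "navigate":
        page.goto(action["url"])
    elif kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "scroll":
        # Vertical scroll only; horizontal delta is fixed at 0 in this sketch.
        page.mouse.wheel(0, action["delta_y"])
    else:
        # Fail loudly rather than silently ignoring an unknown action.
        raise ValueError(f"unsupported action: {kind}")
```

With Playwright installed, a page created through its sync API could be passed in directly, and `page.screenshot()` would supply the image to send back to the model after each step. Because the function only relies on a handful of methods, it can also be exercised against a simple fake page object in tests.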
Gemini 2.5 Computer Use represents a step toward a new generation of general-purpose digital agents capable of navigating and completing tasks within the same visual interfaces that humans use daily.