OpenAI unveils "ChatGPT Agent"
OpenAI has released a new capability within ChatGPT called the ChatGPT agent, marking a major step forward in AI task automation.
Created on July 18|Last edited on July 18
OpenAI has launched a new feature called ChatGPT agent, marking a major advancement in how AI interacts with the real world. Unlike previous versions that focused on generating text, the agent now uses its own virtual computer to carry out complex tasks from start to finish. This includes browsing the web, analyzing data, interacting with apps like Gmail or GitHub, and producing editable files like spreadsheets or slideshows based on user instructions.
Unified System with Real-World Actions
The ChatGPT agent combines the capabilities of two previous tools, Operator and Deep Research, into a unified system. Operator could interact with websites by clicking, scrolling, and typing, while Deep Research specialized in synthesizing and analyzing information. Now, both functions are merged, letting ChatGPT not only understand what needs to be done but also take action to accomplish it. The model shifts fluidly between reasoning and execution, allowing users to delegate tasks and receive fully completed results.
New Capabilities and Tools
The agent has access to a range of built-in tools including a visual browser, a text-based browser, a terminal, and APIs. It can also use ChatGPT connectors to interact with services like calendars and email. These tools run on a virtual computer that maintains task context across multiple steps. For example, it can gather information from a website, download a file, run code to analyze it, and summarize the results in a spreadsheet or slideshow. Users can also log in securely during tasks without exposing sensitive data to the model.
Human-Level Benchmarking and Performance
ChatGPT agent has set new high marks on several evaluations. On Humanity’s Last Exam, it scored 41.6 percent, rising to 44.4 percent when allowed to select from multiple parallel outputs. On FrontierMath, a challenging math benchmark designed for expert problem solvers, it reached 27.4 percent accuracy when given tools like the terminal. On other benchmarks such as SpreadsheetBench and DSBench, the model outperformed previous versions and other commercial models. In some cases, its output matched or surpassed that of human experts in areas like financial modeling, competitor analysis, and complex scheduling.
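To make "selecting from multiple parallel outputs" concrete, here is a minimal sketch. The article does not describe the actual selection rule, so this assumes each attempt carries a self-reported confidence score and the most confident one is kept. The `Attempt` class, the `select_best` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    confidence: float  # hypothetical self-reported score in [0, 1]


def select_best(attempts):
    """Return the attempt with the highest confidence (assumed selection rule)."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return max(attempts, key=lambda a: a.confidence)


# Three made-up parallel attempts at the same question.
attempts = [Attempt("A", 0.55), Attempt("B", 0.81), Attempt("A", 0.62)]
best = select_best(attempts)
print(best.answer)
```

Other rules, such as a majority vote across attempts, would fit the same description; the point is only that running several attempts and choosing among them can raise the score over a single attempt.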
Collaborative and Flexible Task Execution
The system is built for interactive use. Users can pause tasks, change instructions, or redirect focus at any point. The model will pick up with new instructions while preserving prior work. It can also ask users for clarification mid-task if more detail is needed. Mobile users receive notifications when tasks are complete. Tasks like weekly reports can also be scheduled to recur automatically. These features make the agent more than just a tool for isolated requests. It functions more like a partner that works alongside the user across time and changing needs.
Security, Safety, and Risk Mitigation
Because the agent takes real-world actions and can access personal data, OpenAI has introduced strong safeguards. ChatGPT always asks for permission before making purchases or taking other consequential actions. Certain high-risk actions, like wire transfers, are automatically rejected. To combat prompt injection attacks, in which malicious content tries to manipulate the model through hidden text, OpenAI has trained the model to detect and resist such manipulation. Private inputs during browser sessions, like passwords, are never stored or seen by the model. Users also control data retention and can delete browsing sessions with a single click.
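The permission rules described above can be pictured as a small policy gate. This is an illustrative sketch only, not OpenAI's implementation: the action names, the `Decision` enum, and the `gate` function are hypothetical, and `send_email` is an assumed example of a consequential action that the article does not name.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    REJECT = "reject"


# Hypothetical action categories, loosely based on the article's examples.
BLOCKED_ACTIONS = {"wire_transfer"}           # high-risk: always rejected
CONFIRM_ACTIONS = {"purchase", "send_email"}  # consequential: ask the user first


def gate(action: str) -> Decision:
    """Decide how to handle an action before the agent carries it out."""
    if action in BLOCKED_ACTIONS:
        return Decision.REJECT
    if action in CONFIRM_ACTIONS:
        return Decision.ASK_USER
    return Decision.ALLOW


for action in ("wire_transfer", "purchase", "read_page"):
    print(action, gate(action).value)
```

Checking the blocked set first means a high-risk action is rejected even if it also appears in the confirmation set.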
Rollout and Availability
ChatGPT agent is now available to Pro users and will roll out to Plus and Team users within days. Enterprise and Education users will gain access in the coming weeks. Pro users get 400 messages per month, while other paid tiers get 40. Extra usage can be added with credits. The feature is currently unavailable in the European Economic Area and Switzerland. Operator will be sunset soon, while Deep Research remains available as a selectable mode for users who want longer, slower analytical outputs.
Limitations and Future Development
The agent is powerful but still in development. Features like slideshow creation are in beta and may produce basic formatting or inconsistencies between previews and exported files. Spreadsheet editing is more stable but still improving. Users can’t yet upload a slideshow for editing, though they can upload spreadsheets. OpenAI is training the next generation of this model to deliver more refined formatting and broader capabilities. Long term, the goal is to reduce the amount of user oversight required without compromising safety, making the system faster, smarter, and more useful across both personal and professional tasks.
Tags: ML News