
Anthropic Launches Claude Opus 4 and Sonnet 4 With Advanced Agent Capabilities and Best-in-Class Coding Performance

Created on May 23|Last edited on May 23
Anthropic has released the next generation of its Claude model family: Claude Opus 4 and Claude Sonnet 4. The new models aim to push the boundaries of what AI can accomplish in software engineering, reasoning, and autonomous agent applications. Anthropic positions Claude Opus 4 as the best coding model on the market, excelling at long-running, complex tasks. Claude Sonnet 4, the successor to Claude Sonnet 3.7, improves on code generation and instruction following while balancing capability against efficiency, making it a well-rounded choice across use cases.

Performance and Benchmarks

According to Anthropic, Claude Opus 4 leads competitors and earlier Claude versions on coding benchmarks such as SWE-bench (72.5%) and Terminal-bench (43.2%). It sustains coherent performance over hours-long tasks, which is critical for real-world agent use, and it has been tested in demanding environments by companies including Cursor, Replit, Block, Rakuten, and Cognition. Claude Sonnet 4 also delivers strong results, with a SWE-bench score of 72.7%, and is noted for better steerability, improved reasoning, and closer control over output style. Both models are hybrid in design: they can return near-instant responses or switch to extended thinking when a task calls for deeper reasoning.



Extended Thinking and Tool Use

Both Claude 4 models support tool use during reasoning, enabling them to fetch and process external data mid-task, a feature released in beta. This allows them to perform more accurate, dynamic, and grounded reasoning. The models also run tools in parallel, increasing efficiency in complex workflows. A notable addition is enhanced memory, allowing the models to extract and retain useful facts when given access to local files. This supports long-term coherence and decision-making, particularly in agent applications. For instance, Opus 4 demonstrated the creation of a navigation guide while autonomously playing Pokémon, using persistent memory files.
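The parallel tool execution described above can be pictured with a small client-side sketch. This is not Anthropic's implementation or SDK: the tool functions (`get_weather`, `get_time`), the `TOOLS` registry, and the shape of each call dictionary are all hypothetical, chosen only to show how independent tool requests might be dispatched concurrently and returned in request order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools; names and return values are illustrative only.
def get_weather(city: str) -> str:
    return f"weather for {city}: sunny"

def get_time(city: str) -> str:
    return f"time in {city}: 12:00"

# Hypothetical registry mapping a tool name to the function that implements it.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run independent tool calls concurrently, preserving request order."""
    def run_one(call: dict) -> dict:
        fn = TOOLS[call["name"]]
        return {"id": call["id"], "output": fn(**call["input"])}

    # One worker per call; fall back to a single worker for an empty batch.
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        # pool.map yields results in the same order as the input calls.
        return list(pool.map(run_one, calls))

# Assumed shape of a batch of tool requests issued in a single model turn.
calls = [
    {"id": "call_1", "name": "get_weather", "input": {"city": "Paris"}},
    {"id": "call_2", "name": "get_time", "input": {"city": "Paris"}},
]
results = run_tool_calls(calls)
```

Threads suit this sketch because real tools are usually I/O-bound (HTTP requests, file reads); each result keeps its call `id` so it can be matched back to the request that produced it.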

Memory and Instruction Following

Anthropic reports that both models are 65% less likely than Sonnet 3.7 to exploit shortcuts or loopholes on agentic tasks that are prone to such behavior. Opus 4 also outperforms earlier models on memory. When developers provide file access, the model creates and updates memory files autonomously, maintaining context across sessions and improving consistency in tasks that span hours or days. Sonnet 4, while not matching Opus 4 in depth, improves on its predecessor at following complex instructions and maintaining style or tone, giving users finer control over output.
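As a rough illustration of the memory-file pattern described above, the sketch below persists facts to a local JSON file so they survive across sessions. It is an assumption-laden toy, not how Claude manages memory internally: the helper names (`load_memory`, `remember`), the JSON format, and the file name used in the usage note are all hypothetical.

```python
import json
from pathlib import Path

def load_memory(path: Path) -> dict:
    """Return previously saved facts, or an empty dict on the first session."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}

def remember(path: Path, key: str, fact: str) -> dict:
    """Add or update one fact and write the whole memory back to disk."""
    memory = load_memory(path)
    memory[key] = fact
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return memory
```

A session might call `remember(Path("memory.json"), "build_command", "make test")` after learning something useful, and a later session would recover it with `load_memory(Path("memory.json"))` before starting work.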

Claude Code and Developer Integrations

Claude Code, now generally available, expands Anthropic’s offering for developers. It integrates natively with VS Code and JetBrains IDEs, inserting Claude-generated edits directly into files for smoother collaboration. The Claude Code SDK also allows developers to build custom agents or workflows powered by Claude’s backend. GitHub integration enables Claude to act on pull requests, address CI errors, and respond to reviewer feedback automatically. These integrations reflect a clear focus on deepening the model’s utility in professional development environments.

Broader Implications and Availability

The Claude 4 models, Opus 4 and Sonnet 4, are available via Anthropic’s API, Amazon Bedrock, and Google Cloud Vertex AI. Pricing remains consistent with prior models: Opus 4 costs $15 per million input tokens and $75 per million output tokens, and Sonnet 4 costs $3 and $15 respectively. Both models are included in Claude’s Pro, Max, Team, and Enterprise plans, and Sonnet 4 is also accessible to free users. Anthropic has deployed Opus 4 under its AI Safety Level 3 (ASL-3) protections as a precaution, while Sonnet 4 ships under ASL-2, signaling the company’s continued focus on responsible scaling.
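To make the pricing concrete, here is a minimal cost estimate, assuming the quoted figures are USD per million input tokens and per million output tokens respectively. The model keys (`opus-4`, `sonnet-4`), the `estimate_cost` helper, and the example token counts are made up for illustration; they are not API identifiers or measured usage.

```python
# USD per million tokens as (input, output), taken from the figures quoted above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
opus_cost = estimate_cost("opus-4", 200_000, 50_000)      # 3.00 + 3.75 = 6.75
sonnet_cost = estimate_cost("sonnet-4", 200_000, 50_000)  # 0.60 + 0.75 = 1.35
```

At these list prices the same workload costs five times as much on Opus 4 as on Sonnet 4, since both the input and output rates differ by that factor.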
With Claude 4, Anthropic is moving closer to delivering a dependable virtual collaborator that understands, remembers, and reasons across extended projects. These launches represent a substantial step forward in enterprise AI readiness and agentic task execution.