Model Watch · Reviewed

Anthropic Claude Opus 4

Announced May 22, 2025Released May 22, 2025Reviewed Jun 23, 2026

What they claimed

Anthropic billed Claude Opus 4 as "our most powerful model yet" and "the world's best coding model," leading SWE-bench Verified at 72.5% and emphasizing sustained, long-horizon agentic work rather than single-shot answers. Its headline proof point was a customer, Rakuten, running the model independently on a demanding open-source refactor for seven hours with sustained performance. Both Opus 4 and the cheaper Claude Sonnet 4 were pitched as hybrid models offering near-instant responses or deeper "extended thinking," with parallel tool use and improved memory when given file access. Anthropic also disclosed it was deploying Opus 4 under its stricter AI Safety Level 3 (ASL-3) protections.

What shipped

Both models were generally available at announcement across the Anthropic API, the Claude apps, Amazon Bedrock, Google Vertex AI, and Claude Code. Opus 4 launched at $15 per million input tokens and $75 per million output tokens (paid tiers only), while Sonnet 4 launched at $3/$15 and was also available to free users.

The verdict

The "best coding model" claim held up on the benchmark Anthropic chose to lead with, but it was a coding-specific crown, not across-the-board supremacy: contemporaneous coverage noted Opus 4 trailed OpenAI's o3 on some reasoning and knowledge tests. A telling nuance for buyers is that the cheaper Sonnet 4 actually edged out flagship Opus 4 on SWE-bench, so the Opus price premium bought long-horizon endurance and agentic reliability rather than a higher headline coding score. The agentic-coding reception proved strong and durable — Opus 4 anchored the rise of Claude Code and the broader 4.x line that competitors are still measured against. On price, the $15/$75 launch tag was steep relative to peers, and Anthropic later cut Opus pricing substantially, signaling the launch premium was not permanent. The ASL-3 framing was a genuine governance step, best read as precautionary risk-tiering rather than evidence of dangerous capability.

Why it matters

Claude Opus 4 is the model that made "an agent that codes for hours unattended" a credible enterprise pitch rather than a demo, pushing AI from chat assistant toward autonomous coworker. The franchise it launched now sets the agentic-coding benchmark that rival vendors are judged against.

Sources

All tracked models