Model Watch · Reviewed

Anthropic Claude 3.5 Sonnet

Announced Jun 20, 2024 · Reviewed Jun 23, 2026

What they claimed

Anthropic positioned Claude 3.5 Sonnet as a step-change in intelligence delivered at mid-tier speed and cost, claiming it outperformed both its own flagship, Claude 3 Opus, and competitor models on a range of evaluations while running at twice Opus's speed. The company claimed new industry-leading marks on graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval), and emphasized a leap in agentic coding, citing an internal evaluation in which the model solved 64% of problems versus Opus's 38%. The launch also debuted Artifacts, a side-panel workspace for generated code and documents, framing the model as a collaborator rather than a chatbot. All of this came at the same price ($3 per million input tokens, $15 per million output tokens) and the same 200K-token context window as the prior mid-tier model, Claude 3 Sonnet.

What shipped

It shipped immediately and broadly: free on Claude.ai and the iOS app (with higher limits for Pro and Team subscribers), and available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI at $3 per million input tokens and $15 per million output tokens with a 200K-token context window.
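To make the per-token pricing concrete, here is a minimal sketch of the arithmetic at the listed rates. The function name, the example token counts, and the decision to check only the input against the context window are illustrative assumptions, not part of any Anthropic SDK or billing tool.

```python
# Launch rates quoted above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
CONTEXT_WINDOW_TOKENS = 200_000


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request at the listed rates.

    Hypothetical helper for illustration only. It ignores any discounts,
    and as a simplification it only checks that the input fits in the
    context window.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("input exceeds the 200K-token context window")
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Example with assumed sizes: a 10,000-token prompt and a 2,000-token reply.
print(f"${request_cost_usd(10_000, 2_000):.2f}")  # prints "$0.06"
```

Because output tokens cost five times as much as input tokens, the 2,000-token reply in this example costs as much as the 10,000-token prompt.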

The verdict

This launch is widely regarded, in hindsight, as the moment Claude became the default model for serious software development. The coding and agentic-quality leap was real and held up under independent use, driving rapid adoption among developers and powering the first wave of AI coding tools and agents; an October 2024 update added computer-use capabilities and further extended that lead. The unusual move of beating the prior flagship at the mid tier's price quietly reset expectations that more capability must cost more. The main caveat for executives is that benchmark numbers were the marketing surface; the durable significance was qualitative reliability on real coding and reasoning tasks, which is harder to measure but is what actually drove the adoption.

Why it matters

It marks the point where AI moved from drafting text to credibly doing technical knowledge work, and where price and capability stopped moving in lockstep. For an executive, it is the reference point for why coding and agentic use cases became the first place AI delivered measurable productivity.
