Google Gemini 2.5 Pro
Google introduced Gemini 2.5 Pro as a "thinking model" designed to reason through a problem before responding, claiming it topped the LMArena human-preference leaderboard "by a significant margin" at launch. It claimed leadership on common coding, math, and science benchmarks, including a state-of-the-art result on Humanity's Last Exam without tool use, and shipped with a 1-million-token context window (with 2 million promised). It was positioned as a strongly multimodal, long-context model aimed at complex reasoning and coding workloads.
It launched on March 25, 2025 as "Gemini 2.5 Pro Experimental," moved to public preview in April, and reached stable general availability on June 17, 2025 across Google AI Studio, the Gemini app, and Vertex AI. API pricing for the GA model was tiered by context length, starting around $1.25 per million input tokens and $10 per million output tokens for prompts up to 200K tokens, with higher rates above that.
The capability claims largely held up: Gemini 2.5 Pro genuinely led LMArena at launch and was widely regarded as a top-tier reasoning and long-context model through most of 2025, and its 1M-token context was a real, usable differentiator for document-heavy work. The weaker part of the story was packaging: the drawn-out experimental to preview to GA progression produced a confusing trail of dated "snapshot" versions that made it hard for buyers to know which model they were actually running — a useful cautionary tale about versioning discipline. On coding it was competitive but not dominant, trailing some rivals on agentic software-engineering benchmarks. The most important caveat is recency: Google superseded this model within the same calendar year — Gemini 3 Pro launched in November 2025 and took the outright frontier lead — so "the 2025 Gemini flagship" is a moving label, and any standardization decision should assume rapid iteration.
Gemini 2.5 Pro was the model that put Google back in genuine contention at the frontier and made million-token context a practical enterprise capability rather than a spec-sheet number. For executives it also illustrates how quickly "the best model" turns over — and how confusing vendor versioning can quietly undermine reproducibility in real deployments.