Model Watch · Reviewed

xAI Grok 3

Announced Feb 17, 2025 · Released Feb 17, 2025 · Reviewed Jun 23, 2026

What they claimed

Musk called Grok 3 the 'smartest AI on Earth,' positioning it as xAI's first true frontier entry. xAI said it was trained on the Colossus supercomputer in Memphis — first ~100,000 and later ~200,000 Nvidia H100 GPUs — representing roughly 10x the compute of Grok 2. The launch graphs claimed leading scores on math (AIME), science (GPQA), and coding benchmarks, and an early checkpoint ('Chocolate') topped the Chatbot Arena leaderboard. xAI also debuted DeepSearch, an agentic web-research feature, and a Think reasoning mode.

What shipped

Grok 3 and Grok 3 mini (plus Reasoning variants) rolled out to X Premium+ subscribers, with that tier's price raised to about $40/month, and to a new standalone SuperGrok plan (~$30/month) that offered higher limits and DeepSearch. An API followed in the weeks after, and xAI briefly offered Grok 3 free to all users to drive trial.

The verdict

Grok 3 was a genuine arrival: it credibly placed xAI among the frontier labs, validated the speed of the Colossus build-out, and was competitive on public benchmarks with OpenAI's o-series and Google's Gemini within days of launch. But the launch was immediately dented by a benchmark-presentation dispute: OpenAI staff noted that xAI's AIME chart showed Grok 3's best-of-64 consensus score against rivals' single-attempt scores, an apples-to-oranges comparison that flattered Grok. The model was strong, but the framing overstated the margin, and independent rankings settled on it being roughly at parity with the leaders, not clearly ahead of them. For executives, the takeaway is that the capability was real while the 'smartest on Earth' claim was marketing.
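To make the apples-to-oranges point concrete, here is a minimal sketch of how a best-of-64 consensus score and a single-attempt score can diverge for the same model. Everything in it is an assumption for illustration: the function names are hypothetical, and the three toy problems and their sampled answers are invented, not Grok 3 results or data from any real benchmark.

```python
from collections import Counter

# Invented toy data: 3 problems, 64 sampled answers each (NOT real model output).
PROBLEMS = [
    {"correct": 42, "samples": [42] * 26 + [41] * 14 + [40] * 12 + [7] * 12},
    {"correct": 7, "samples": [7] * 20 + [8] * 24 + [9] * 20},
    {"correct": 3, "samples": [3] * 40 + [5] * 24},
]


def single_attempt_accuracy(problems):
    """Expected score when only one sampled answer per problem is graded."""
    per_problem = [
        sum(answer == p["correct"] for answer in p["samples"]) / len(p["samples"])
        for p in problems
    ]
    return sum(per_problem) / len(per_problem)


def consensus_accuracy(problems):
    """Score when the most common of all sampled answers is graded (majority vote)."""
    hits = 0
    for p in problems:
        majority_answer = Counter(p["samples"]).most_common(1)[0][0]
        hits += majority_answer == p["correct"]
    return hits / len(problems)


print(f"single attempt: {single_attempt_accuracy(PROBLEMS):.2f}")  # 0.45
print(f"consensus@64:   {consensus_accuracy(PROBLEMS):.2f}")       # 0.67
```

On this toy data the same set of samples scores 0.45 when graded one attempt at a time and 0.67 under majority vote, which is why a chart is only comparable when every model is scored under the same setting.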

Why it matters

Grok 3 is the moment a fourth serious frontier lab emerged, showing that raw capital and GPU scale can compress the gap to OpenAI and Google in under a year — and a textbook case of why launch-day benchmark charts deserve scrutiny.
