DeepSeek V3
DeepSeek positioned V3 as an open-weight Mixture-of-Experts model (671 billion total parameters, of which only 37 billion are active per token) trained on 14.8 trillion tokens that matched leading closed models such as GPT-4o and Claude 3.5 Sonnet on many benchmarks. The headline was economics, not just quality: the technical report put the full training run at roughly 2.788 million H800 GPU-hours, or about $5.6 million at an assumed rental rate of $2 per GPU-hour, a fraction of the $100M-plus widely associated with frontier training runs.
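The arithmetic behind the headline number is simple enough to check by hand. The sketch below reproduces it in Python, assuming the GPU-hour total and the $2 per GPU-hour rental rate stated in the technical report; the function and variable names are illustrative only.

```python
# Figures as reported in the V3 technical report. The rental rate is the
# report's own pricing assumption, not a measured expense.
GPU_HOURS = 2_788_000            # H800 GPU-hours for the full training run
USD_PER_GPU_HOUR = 2.00          # assumed rental price per H800 GPU-hour

TOTAL_PARAMS_B = 671             # total parameters, in billions
ACTIVE_PARAMS_B = 37             # parameters active per token, in billions


def training_compute_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Return the rental-equivalent compute cost in US dollars."""
    return gpu_hours * usd_per_gpu_hour


cost_usd = training_compute_cost(GPU_HOURS, USD_PER_GPU_HOUR)
active_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Training compute: ${cost_usd / 1e6:.3f}M")       # $5.576M
print(f"Active parameters per token: {active_share:.1%}")  # 5.5%
```

The same calculation shows how sensitive the figure is to the assumed rental rate: at $3 per GPU-hour the identical run would be quoted at about $8.4 million.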
V3 shipped on December 26, 2024, with downloadable Base and Chat weights (MIT-licensed code plus a separate model license permitting commercial use) and an OpenAI-compatible API. A roughly 45-day introductory price ended February 8, 2025, after which rates rose to $0.27 per million input tokens (cache miss) and $1.10 per million output tokens.
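To turn per-million-token rates into a budget line, a rough estimate like the following can help. It is a minimal sketch: the default rates are the post-promotion list prices as of February 2025 ($0.27 input on a cache miss, $1.10 output) and should be checked against current pricing, the monthly workload is entirely hypothetical, and cache-hit discounts are ignored.

```python
def api_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float = 0.27,   # USD per 1M input tokens (cache miss)
    output_rate_per_m: float = 1.10,  # USD per 1M output tokens
) -> float:
    """Estimate API spend in US dollars for a given token volume."""
    return (
        input_tokens * input_rate_per_m + output_tokens * output_rate_per_m
    ) / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
monthly_cost = api_cost_usd(500_000_000, 100_000_000)
print(f"Estimated monthly spend: ${monthly_cost:,.2f}")
```

Passing different rates as arguments makes it easy to compare a promotional price against the standing rate, or one vendor against another, for the same workload.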
V3 was the quieter base model that set up the moment everyone remembers: DeepSeek R1, a month later. The roughly $5.6M figure deserves a caveat executives should internalize: it covers the compute for the final training run only, not research, ablation experiments, salaries, prior runs, or hardware capital, so it understates true cost and should not be read as 'frontier AI now costs $5.6M.' What was real and durable: a genuinely capable open-weight model, released under a permissive license at very low API prices, from a Chinese lab. That combination compressed the assumed gap between open and closed frontier models and pressured incumbents on price. The introductory pricing expired about six weeks after launch, a reminder that launch-day economics are a promotion, not a permanent rate. The signal is the trend it confirmed: capable models are getting cheaper and more open, fast.
DeepSeek V3 showed that open-weight models can approach the closed frontier at a small fraction of the training budgets usually associated with it, reshaping how executives should think about AI pricing, vendor lock-in, and where capability actually comes from.