Mistral Mixtral 8x7B
Mistral AI, the Paris-based startup positioned as Europe's frontier AI contender, introduced Mixtral 8x7B in December 2023 as a high-quality sparse Mixture-of-Experts (SMoE) model with open weights. The model holds 46.7B total parameters but uses only about 12.9B per token, because a router sends each token to 2 of 8 expert networks at every layer. The result is the quality of a much larger model at roughly the per-token compute cost of a 12.9B dense model, although all 46.7B parameters must still be loaded in memory. Mistral claimed Mixtral outperformed Llama 2 70B on most benchmarks with roughly 6x faster inference, and matched or beat GPT-3.5 on most standard tests. It supported a 32k-token context and five languages: English, French, Italian, German, and Spanish.
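The routing idea and the parameter arithmetic can be made concrete with a small sketch. This is an illustration in plain Python under stated assumptions, not Mistral's implementation: the function names (top_k_route, moe_layer, split_params) and the toy experts are hypothetical, the gate weights are assumed to be a softmax over the two selected router scores, and the parameter split assumes every parameter is either shared by all tokens or belongs to one of eight equally sized experts.

```python
import math

NUM_EXPERTS = 8  # experts per layer, as stated in the text
TOP_K = 2        # experts activated per token, as stated in the text


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Assumption: weights are a softmax over the selected logits only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router_logits, experts):
    """Weighted sum of the chosen experts' outputs.

    The unselected experts are never called, which is where the
    per-token compute saving comes from.
    """
    out = [0.0] * len(token)
    for idx, weight in top_k_route(router_logits):
        expert_out = experts[idx](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


def split_params(total_b, active_b, n_experts=NUM_EXPERTS, k=TOP_K):
    """Rough split into shared and per-expert parameters (in billions).

    Solves: shared + n_experts * expert = total
            shared + k * expert         = active
    """
    per_expert = (total_b - active_b) / (n_experts - k)
    shared = active_b - k * per_expert
    return shared, per_expert


# Toy experts (hypothetical): expert i scales its input by i + 1.
toy_experts = [lambda x, s=i + 1: [s * v for v in x]
               for i in range(NUM_EXPERTS)]

if __name__ == "__main__":
    logits = [0.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
    print(top_k_route(logits))                        # experts 1 and 5 are selected
    print(moe_layer([1.0, 2.0], logits, toy_experts))
    print(split_params(46.7, 12.9))                   # (shared, per-expert) in billions
```

Under these assumptions the 46.7B and 12.9B figures imply roughly 5.6B parameters per expert and about 1.6B shared parameters, which is why the total is well below the 56B that the "8x7B" name might suggest.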
Mixtral 8x7B shipped with open weights under the permissive Apache 2.0 license, available for download and self-hosting, and Mistral contributed changes to open-source serving stacks (e.g., vLLM) so others could deploy it freely.
In hindsight Mixtral was a quietly pivotal release: it moved the sparse Mixture-of-Experts approach from research curiosity to a practical, downloadable open model, validating the architecture that later powered far larger systems. Its Apache 2.0 license made it genuinely usable in commercial products without the usage caveats attached to some 'open' rivals, which mattered for enterprise adoption. The performance claims against Llama 2 70B and GPT-3.5 held up reasonably well for its size and were not dogged by the kind of benchmark-presentation dispute that later hit competitors. The release established Mistral as the credible European alternative to U.S. labs and as a reference point for efficient open models. The chief caveat is simply age: by 2025 newer models had surpassed it, but its influence on how the field thinks about compute efficiency endures.
Mixtral proved that open, permissively licensed models could be both efficient and competitive, giving enterprises a non-U.S. supplier and making 'mixture of experts' a mainstream design choice rather than a lab experiment.