Model Watch · Reviewed

Google Gemini 1.5 Pro

Announced Feb 15, 2024 · Reviewed Jun 23, 2026

What they claimed

Google announced Gemini 1.5 Pro as a Mixture-of-Experts model that matched the performance of its prior top-end Gemini 1.0 Ultra while using far less compute. The headline claim was long context: a standard 128,000-token window with a limited preview running up to 1 million tokens, the longest of any large-scale foundation model at the time. Google said this let the model reason over roughly an hour of video, 11 hours of audio, codebases over 30,000 lines, or 700,000+ words in a single prompt, and showcased in-context learning by translating a low-resource language (Kalamang) from a grammar manual placed in the prompt.
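As a rough sizing aid, the figures above imply about 0.7 words per token (700,000 words in a 1-million-token prompt). The Python sketch below uses that implied ratio to guess which window a text would need. The ratio is inferred from the announcement figures rather than measured with a tokenizer, and the function names and the 2,000-token reserve are illustrative assumptions.

```python
# Rough sizing sketch. WORDS_PER_TOKEN is inferred from the announcement's
# "700,000+ words in 1 million tokens" figure; real tokenization varies by
# language and content, so treat the result as an estimate only.
WORDS_PER_TOKEN = 700_000 / 1_000_000

STANDARD_WINDOW = 128_000    # standard window, in tokens
PREVIEW_WINDOW = 1_000_000   # limited-preview window, in tokens


def estimate_tokens(text: str) -> int:
    """Estimate a token count from a whitespace-separated word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def smallest_window(text: str, reserve: int = 2_000) -> str:
    """Name the smallest window the text plausibly fits in.

    `reserve` is a hypothetical allowance for the question and the answer.
    """
    needed = estimate_tokens(text) + reserve
    if needed <= STANDARD_WINDOW:
        return "standard"
    if needed <= PREVIEW_WINDOW:
        return "preview"
    return "too large"
```

An estimate like this is only a first filter; an actual token count from the provider's tokenizer should decide borderline cases.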

What shipped

At announcement, the model was available only as a limited private preview for select developers and enterprise customers through AI Studio and Vertex AI, with the 1M-token window free during testing. Broader availability and paid pricing tiers (scaling from 128k up to 1M tokens) arrived later, not on day one.

The verdict

Gemini 1.5 Pro reframed the competitive axis from raw benchmark scores to usable context length. The 1M-token (later 2M) window proved genuinely useful for document-heavy, video, and large-codebase workloads that previously required brittle retrieval pipelines. Two caveats apply. Because the launch was preview-gated, real adoption lagged the announcement. And independent testing later showed that reasoning quality can degrade across very long contexts even when retrieval ("needle in a haystack") looks near-perfect, so the full window is not uniformly high-quality. The launch was overshadowed in news cycles by OpenAI's same-week Sora reveal, but it was strategically important: it pushed long context into the mainstream and pressured rivals to match. For executives, the durable takeaway is that long context reduces but does not eliminate the need for good information design.

Why it matters

Long context can collapse some retrieval and chunking engineering into a single prompt, but leaders should pilot it on their own documents rather than trust the headline token count, since quality is uneven across the full window.
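One way to structure such a pilot is to plant a known fact at different depths in your own material and record whether the model's answer reflects it. The Python sketch below is a minimal harness under stated assumptions: `ask_model` is a hypothetical callable that wraps whatever model API you use, and the other names are illustrative. A single planted fact only tests retrieval, so questions that require combining several passages would be a closer check on the reasoning concern noted in the verdict.

```python
from typing import Callable, Dict, List


def build_probe(filler: List[str], fact: str, depth: float) -> str:
    """Place `fact` at a relative depth (0.0 = start, 1.0 = end) among filler paragraphs."""
    index = round(depth * len(filler))
    return "\n\n".join(filler[:index] + [fact] + filler[index:])


def accuracy_by_depth(
    ask_model: Callable[[str], str],  # hypothetical wrapper around your model API
    filler: List[str],                # paragraphs taken from your own documents
    fact: str,                        # the planted statement
    question: str,                    # a question answerable only from `fact`
    expected: str,                    # substring a correct answer should contain
    depths: List[float],
) -> Dict[float, bool]:
    """Record, for each depth, whether the answer contains the expected string."""
    results: Dict[float, bool] = {}
    for depth in depths:
        prompt = build_probe(filler, fact, depth) + "\n\n" + question
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results
```

Running this across several documents and depths gives a per-position hit rate, which is more informative for a go/no-go decision than the advertised window size.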
