Model Watch · Reviewed

OpenAI GPT-4

Announced Mar 14, 2023 · Reviewed Jun 23, 2026

What they claimed

OpenAI billed GPT-4 as its most advanced system: a large multimodal model that accepts image and text input, emits text, and exhibits "human-level performance on various professional and academic benchmarks." The headline claim was a simulated bar exam score in roughly the top 10% of test-takers, versus the bottom 10% for GPT-3.5, alongside strong scores on the SAT, GRE, and AP exams. OpenAI said six months of iterative alignment produced its best-ever results on factuality, steerability, and staying within guardrails, with roughly 40% higher factuality scores than GPT-3.5 on internal evals. It also stressed that GPT-4 was more reliable and creative on complex, nuanced tasks, even where the difference was subtle in casual chat.

What shipped

At launch on March 14, 2023, GPT-4 was available in text-only form to paying ChatGPT Plus subscribers (with a message cap) and to developers via an API waitlist. API pricing for the 8K-token context version was $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens; the 32K-token context version cost twice that. The promoted image-input capability was not generally available at launch and reached the public only months later (GPT-4V, around September 2023). A GPT-4 Technical Report and System Card accompanied the release.
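
To make the launch pricing concrete, here is a minimal Python sketch of the per-request arithmetic. It takes the $0.03 and $0.06 per 1K token figures quoted above; the function name, the token counts, and the call volume are hypothetical values chosen for illustration, not figures from OpenAI.

```python
# Launch-era list prices quoted above, in USD per 1,000 tokens.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one API call at the launch prices."""
    prompt_cost = prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
    completion_cost = completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


# Hypothetical workload: 1,500 prompt tokens and 500 completion tokens per call.
per_call = request_cost(1500, 500)
per_10k_calls = per_call * 10_000  # assumed volume of 10,000 calls
print(f"per call: ${per_call:.4f}, per 10,000 calls: ${per_10k_calls:.2f}")
```

Under those assumed numbers a single call costs about 7.5 cents, so the bill scales quickly with volume. Completion tokens cost twice as much as prompt tokens, which is why long generated outputs dominated early GPT-4 budgets.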

The verdict

GPT-4 was a genuine capability step: more reliable, better at reasoning, and far more useful for real professional work than GPT-3.5. It became the default "serious" model underpinning a huge wave of enterprise deployment, including Microsoft Copilot. But several launch claims deserve an executive's skepticism in hindsight. The bar-exam and standardized-test results were later challenged as overstated and partly attributable to benchmark contamination and favorable scoring methodology, so the "top 10%" framing oversold practical competence. The multimodal positioning outran reality, since image input shipped to the public roughly half a year after the announcement. Most notably, the Technical Report disclosed essentially nothing about model size, architecture, training data, or compute. That was a sharp break from the openness of the GPT-3 era; it drew sustained criticism and confirmed OpenAI's shift to a closed, competitive posture. And the core limitation never went away: GPT-4 still hallucinated and made reasoning errors, so any high-stakes use required human oversight.

Why it matters

GPT-4 is the model that made generative AI good enough for real business workflows and set the capability bar competitors chased for over a year, so it is the baseline against which most enterprise AI decisions of 2023-2024 were made. It is also the clearest signal that frontier AI had become closed and opaque, meaning buyers must now evaluate these systems empirically rather than trust vendor benchmark claims.
