Curriculum concept

Last verified June 16, 2026

Embodied AI: when models leave the screen

Embodied AI puts models into things that sense and act in the physical world — robots, vehicles, devices — where mistakes have physical consequences and the reliability bar is far higher than for text.


Embodied AI is the name for what happens when a model stops living inside a chat window and starts controlling something that senses and moves in the physical world: a warehouse robot, a self-driving vehicle, a robotic arm on an assembly line, a drone, a humanoid in a pilot deployment. The phrase that matters is "leaves the screen." A text assistant that drafts a flawed paragraph wastes a few minutes; a system that misjudges a forklift's position, a pedestrian's path, or the grip on a heavy part can damage equipment, halt a line, or injure a person. That single difference reorders everything an executive needs to think about. The reliability bar is not a little higher than for office software; it is in a different regime, because the cost of a rare failure is paid in the physical world and cannot be undone with a click.

It helps to understand why this is genuinely hard, because the difficulty is the opposite of what intuition suggests. The tasks that look impressive to us — writing a legal summary, passing a professional exam, generating code — turned out to be relatively tractable for AI. The tasks a toddler masters — picking up an unfamiliar object without crushing or dropping it, walking across a cluttered room, recovering when something slips — have proven extraordinarily difficult to automate reliably. This inversion has a name in the field, Moravec's paradox, after the roboticist who observed it decades ago: the skills evolution spent the longest perfecting, perception and movement, are the ones hardest to recreate, while abstract reasoning, which is evolutionarily recent, came comparatively cheap. The practical lesson for a leader is to distrust any pitch that treats physical competence as a solved add-on to a capable language model. The language part may be solved; the hands are not.

The recent excitement comes from a real technical shift worth naming plainly. For most of robotics history, each robot was programmed for one narrow task in one controlled setting, and a change in the object, lighting, or layout could break it. Beginning around 2023, researchers started building what are called vision-language-action models: systems that take in what a camera sees and an instruction in plain words, and output the motor commands to act. Google DeepMind's RT-2, published in July 2023, was an influential early example: by training a vision-language model on robot demonstrations alongside web data, it could follow instructions involving objects that never appeared in its robot training data, something earlier robot systems generally could not do. In October 2024 a startup, Physical Intelligence, released a general robot model in the same spirit, which it called π0 (pi-zero). The ambition is to do for robots what large language models did for text: train one broadly capable model rather than hand-program every behavior. That ambition is real and the early results are promising, but it is an ambition in progress, not a shipped capability you can buy off the shelf for arbitrary tasks.

Two domains show how unevenly this plays out, and they are the ones most likely to touch a mid-market or enterprise organization. Self-driving is the most mature: Waymo reported that across more than 22 million driverless miles through mid-2024, its vehicles were involved in 84 percent fewer crashes with airbag deployment and 73 percent fewer injury-causing crashes than human drivers in the same cities. That is meaningful evidence that a narrow, heavily engineered embodied system can exceed human safety in its operating area. Note the qualifier, though: in its operating area, after enormous investment, in mapped cities, with remote human support standing by. Humanoid robots are the opposite end. Figure, Tesla, 1X, and Agility have all put units into pilots at manufacturers and warehouses, and the demonstrations are striking. But through 2025 these remained early pilots doing narrow, repetitive tasks under supervision, far from the autonomous general-purpose worker the marketing implies; Tesla, for instance, signaled actual production far below its public targets. The gap between an impressive demo video and a robot you can depend on for an unscripted shift is the whole story.

For an executive, the useful frame is that embodied AI inherits every limitation of the models covered elsewhere in this curriculum (the jagged frontier of uneven competence, the tendency to fail confidently, sensitivity to conditions outside its training) and then adds physical consequence, real-time constraints, and a much harder testing problem. You cannot fully test a physical system in a spreadsheet; the world supplies edge cases no simulation anticipated, which is why the field obsesses over the so-called sim-to-real gap: behavior that works in simulation often degrades on real hardware. The questions worth holding onto are concrete and durable. What is the precise operating envelope, meaning the conditions inside which the vendor claims reliability, and what happens at its edge? What is the human role: supervision, teleoperation fallback, or none? What is the evidence base, measured in the actual environment rather than in a staged demo? And who carries liability when a physical system causes physical harm, a question office-software contracts never had to answer? These are not reasons to dismiss the technology; they are the difference between evaluating it as an executive and being sold the demo.
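For readers who want the operating-envelope question made concrete, here is a minimal Python sketch. Everything in it is hypothetical: the condition names (speed, visibility, zone), the thresholds, and the functions `envelope_violations` and `decide` are illustrative assumptions, not any vendor's interface. It shows only the shape of the idea: the system acts on its own while every measured condition sits inside the declared envelope, and anything outside it hands control to a human fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operating envelope: the conditions a vendor claims reliability for."""
    max_speed_mps: float      # fastest speed the system was validated at
    min_visibility_m: float   # worst sensing range it was tested under
    allowed_zones: frozenset  # mapped areas it is approved to operate in


@dataclass(frozen=True)
class Conditions:
    """Hypothetical snapshot of what the system currently measures."""
    speed_mps: float
    visibility_m: float
    zone: str


def envelope_violations(env: Envelope, now: Conditions) -> list:
    """List every condition outside the envelope; an empty list means inside."""
    violations = []
    if now.speed_mps > env.max_speed_mps:
        violations.append("speed")
    if now.visibility_m < env.min_visibility_m:
        violations.append("visibility")
    if now.zone not in env.allowed_zones:
        violations.append("zone")
    return violations


def decide(env: Envelope, now: Conditions) -> str:
    """Act autonomously only inside the envelope; otherwise hand off to a human."""
    return "autonomous" if not envelope_violations(env, now) else "human_fallback"


if __name__ == "__main__":
    env = Envelope(max_speed_mps=2.0, min_visibility_m=10.0,
                   allowed_zones=frozenset({"aisle_1", "aisle_2"}))
    print(decide(env, Conditions(speed_mps=1.5, visibility_m=25.0, zone="aisle_1")))
    print(decide(env, Conditions(speed_mps=1.5, visibility_m=4.0, zone="loading_dock")))
```

Run as a script, this prints `autonomous` for the first snapshot and `human_fallback` for the second. The design choice worth noticing is that the check fails closed: any single out-of-envelope condition triggers the fallback, and the function reports which conditions were violated, which is the kind of evidence worth asking a vendor to show.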

The honest summary is that embodied AI is advancing quickly in narrow, well-defined settings and remains far from general physical competence, and that the distance between those two states is where the commercial and safety reality lives. The trajectory is genuine: foundation-model techniques are bleeding from text into robotics, costs of capable hardware are falling, and serious capital is flowing in. But the timelines that get quoted publicly are projections, frequently from parties with an interest in optimism, and they should be read as such and separated from what is actually deployed and measured today. An accountable leader does not need to predict when humanoids become routine. They need to tell the difference between a system operating safely inside a proven envelope and a system whose envelope nobody has yet established, and to ask for the evidence either way.
