What's Possible Now

What AI can credibly do now. For enterprise decisions.

A capability frontier for executives, buyers, and operators. This is not a vendor launch feed; each entry tracks a capability shift that can affect strategy, operating model, product work, risk, or cost.

Bounded end-to-end software implementation is plausible, but not autonomous SDLC

The current software frontier is no longer autocomplete or isolated functions. With stronger agentic models and harnesses, AI can sometimes take a small product request, inspect an existing codebase, plan the work, edit multiple files, run checks, repair failures, and produce a reviewable result. Anthropic framed Claude Fable 5 as a Mythos-class model with strong agentic capability, and OpenAI has shown Codex expanding from software development into role-specific, cloud-running work agents. The honest enterprise label is frontier-only: useful for prototypes, small features, internal tools, migrations with strong tests, and well-scoped repairs, but not yet reliable autonomous ownership of a large software lifecycle. Humans still need to choose the problem, shape architecture, judge product fit, verify security, own incidents, and decide when not to ship. The capability is moving quickly enough that engineering operating models should adapt, but governance should not pretend the tool is a staff engineer. The frontier is supervised end-to-end execution, not unattended accountability.

Why it matters

Executives can plan for AI-assisted delivery teams where humans increasingly orchestrate and verify bounded work, but should not budget as if production ownership has been automated.

Limits

Large-scale systems, ambiguous requirements, cross-team dependencies, security-sensitive code, incident response, and long-term maintenance still require accountable human engineering leadership.

Coding agents can participate in debugging, test repair, and code review

The software-development frontier widened from "write code" to "participate in the quality loop." By late 2025, coding agents were no longer only generating snippets; they were being positioned and used for test repair, bug investigation, pull-request review, security scanning, codebase Q&A, and iterative changes across editor, terminal, and cloud workflows. OpenAI described Codex becoming generally available across the places developers work, while Anthropic reported rapid growth in agentic coding activity and substantial weekly usage by Claude Code users. For enterprise engineering leaders, this matters because review and debugging are often the bottleneck after initial implementation. The maturity label is productized, but uneven: agents can find obvious defects, explain unfamiliar code, repair failing tests, and suggest review comments; they still miss architectural consequences, subtle security issues, and business rules that are not encoded in tests or docs. In production today, the best use is as another reviewer, not the final authority.

Why it matters

AI can now help across the delivery loop, not just the blank-page coding moment, making test quality and review standards central to AI leverage.

Limits

AI review is not a control substitute. It can miss high-severity issues, overfit to tests, or generate noisy comments that waste senior attention.

Cloud coding agents can make repo-aware changes and propose PRs

Software AI crossed from assistant to delegated contributor when cloud coding agents could load a repository, inspect the codebase, work in a sandbox, edit multiple files, run tests, answer code questions, fix bugs, and propose pull requests. OpenAI Codex and Anthropic Claude Code are reference products for this shift. This does not mean the model owns production. It means a senior engineer can delegate bounded implementation or repair tasks and review a concrete diff instead of starting from a blank editor. For enterprise leaders, the operating model changes: planning quality, test coverage, repository hygiene, and review discipline become leverage points for AI output. The capability is now productized for teams willing to supervise it. It is strongest on scoped tasks with clear acceptance criteria and weaker on ambiguous product judgment, cross-service architecture, security-sensitive changes, and large migrations without strong tests. The measurable unit becomes reviewed change, not generated code volume.

Why it matters

Engineering organizations can redesign parts of delivery around delegation, review, and verification rather than only individual typing speed.

Limits

Agents still need clear task framing, trustworthy tests, human code review, and production accountability. Poorly specified tasks can produce plausible but wrong changes at higher speed.

Browser-use agents can complete bounded web tasks

Agents became more concrete when systems could use a browser directly: looking at pages, clicking, typing, scrolling, and handing control back to a person for sensitive steps. OpenAI Operator and the underlying Computer-Using Agent made the capability legible to buyers. The enterprise meaning is not "AI can run the company." It is that some bounded web tasks can be delegated when the target system is familiar, the steps are reversible or reviewable, and the user remains in control. Examples include filling routine forms, collecting structured information, checking account states, or operating internal tools with guardrails. This remains frontier-only for most business-critical workflows because web interfaces change, authentication flows are brittle, prompt injection is real, and error recovery can be expensive. The capability is worth tracking because it points toward AI using existing software without every system needing a custom API integration. That could matter most in long-tail operations work across functions.

Why it matters

Browser-use agents create a path to automating low-volume, cross-system workflows that are too awkward to integrate formally but too repetitive to ignore.

Limits

Reliability, security, auditability, credential handling, and prompt-injection exposure remain serious constraints. High-risk actions still need human confirmation and logging.
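
To make the confirmation-and-logging constraint concrete, here is a minimal sketch of a gate that sits between an agent's proposed action and its execution. It is illustrative only: `Action`, `ActionGate`, `HIGH_RISK_KINDS`, and the `confirm` callback are hypothetical names invented for this example, not part of OpenAI Operator or any vendor product, and a real deployment would define its own risk policy.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical risk categories; a real policy would be set by the business.
HIGH_RISK_KINDS = {"submit_payment", "delete_record", "send_external_email"}


@dataclass
class Action:
    kind: str    # what the agent wants to do, e.g. "fill_form"
    target: str  # the page or system it wants to act on


@dataclass
class ActionGate:
    confirm: Callable[[Action], bool]  # asks a human; True means approved
    audit_log: list = field(default_factory=list)

    def review(self, action: Action) -> bool:
        """Return True if the action may run; every decision is logged."""
        if action.kind in HIGH_RISK_KINDS:
            approved = self.confirm(action)
            decision = "approved_by_human" if approved else "blocked_by_human"
        else:
            approved = True
            decision = "auto_allowed"
        self.audit_log.append((action.kind, action.target, decision))
        return approved


# Example: a reviewer who declines every high-risk request.
gate = ActionGate(confirm=lambda action: False)
allowed_low = gate.review(Action("fill_form", "expenses portal"))
allowed_high = gate.review(Action("submit_payment", "vendor invoice"))
```

In this sketch the routine form fill proceeds without interruption, the payment is blocked until a person approves it, and both decisions land in the audit log.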

Multimodal assistants can interpret documents, screenshots, and meetings

AI stopped being mostly a text box when frontier assistants could reason across text, images, audio, and eventually video in one interaction. For enterprises, the important capability is practical multimodal interpretation: ask about a chart, screenshot, slide, product mockup, form, call recording, or workflow capture, then turn that input into an explanation, checklist, draft, or next action. GPT-4o made the shift highly visible with real-time audio, vision, and text; other model families followed similar paths. The capability is productized, but still not a replacement for specialist review. A model may correctly read the visible structure of a chart while missing data provenance, legal nuance, or a hidden spreadsheet formula. The operating opportunity is to bring AI closer to how work actually appears: messy artifacts, meetings, diagrams, interfaces, and documents rather than clean prompt text alone. This expands who can use AI and where it fits in daily work across roles.

Why it matters

Teams can use AI on the artifacts they already work with: screenshots, decks, charts, calls, demos, design reviews, and operational documents.

Limits

Multimodal interpretation is still vulnerable to visual ambiguity, hidden context, poor source quality, and overconfident explanations of charts or screens it only partially understands.

Enterprise copilots can work over permissioned company data

The enterprise capability frontier moved when general AI assistants started operating inside existing work suites with tenant isolation, permission inheritance, and contractual data protections. That changed the buyer question from "can a model answer generic questions?" to "can an assistant help employees reason over email, documents, meetings, files, and workflows without exporting company data into an uncontrolled tool?" Microsoft 365 Copilot is the clearest reference point, but the broader capability now exists across major enterprise platforms. This is productized, not solved. It can improve search, synthesis, drafting, meeting follow-up, and lightweight workflow creation, but it also makes information architecture and access hygiene more important. If too many users can see too much, a copilot can surface that overexposure faster. The business value depends less on model magic than on clean permissions, useful connectors, change management, and security review. In practice, readiness is often an identity-and-data project that has to come before broad deployment.

Why it matters

AI adoption can move from isolated pilots into normal work systems, making governance, permissions, and data architecture board-level operating concerns.

Limits

These systems inherit messy permissions and can create new prompt-injection or data-exposure risks. They also need process redesign; adding a copilot to broken workflows rarely fixes them.
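
A toy example can show why inherited permissions cut both ways. The sketch below assumes a simple group-based access model; `Document`, `allowed_groups`, and `visible_to` are hypothetical names for illustration and do not describe how Microsoft 365 Copilot or any other product implements access control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    allowed_groups: frozenset  # groups that may read this document


def visible_to(user_groups: set, documents: list) -> list:
    """Keep only documents the user could already open directly."""
    return [d for d in documents if d.allowed_groups & user_groups]


docs = [
    Document("Q3 board pack", frozenset({"executives"})),
    Document("Travel policy", frozenset({"all-staff"})),
    # Overshared by mistake: the filter will faithfully pass it through.
    Document("Salary review draft", frozenset({"hr", "all-staff"})),
]

# What an assistant acting for an ordinary employee would be allowed to read.
analyst_view = [d.title for d in visible_to({"all-staff"}, docs)]
```

The filter correctly hides the board pack, but it also returns the overshared salary draft, because the assistant only enforces the permissions it is given. Cleaning up that access is the identity-and-data work, not something the model can fix.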

Long-context models can analyze full business documents

AI crossed an important enterprise threshold when leading models could ingest hundreds of pages, and later millions of tokens, in a single working context. That made practical workflows possible that were awkward with short chat windows: contract review, board-pack summarization, policy comparison, diligence packet triage, transcript synthesis, and codebase orientation. The capability is not the same as reliable legal or financial judgment. It means the model can keep far more source material visible while answering, reducing the need to pre-chunk every task into tiny prompts. For buyers and operators, this is a productized capability in frontier and mainstream business tools, but still process-dependent. Teams need source-grounding, citation expectations, permission controls, and review rules because large context expands what the model can see; it does not guarantee it will use every relevant detail correctly. The practical frontier is now workflow design around that larger working memory, including repeatable review checkpoints that verify the evidence behind each answer.

Why it matters

Large document analysis is now realistic for repeated executive workflows such as diligence, policy review, board prep, and customer/account research.

Limits

Long context can miss or misprioritize details, especially when instructions conflict or documents are noisy. Sensitive documents also require strict data-boundary controls.

A chat interface makes general business drafting and summarization usable

The first broadly useful enterprise capability was not autonomous work; it was conversational drafting, summarization, rewriting, and first-pass analysis. ChatGPT made a large language model usable by non-specialists through ordinary dialogue: ask for a summary, rewrite a note, compare options, turn rough bullets into a memo, or pressure-test a message. That shifted AI from a specialist model or embedded feature into a general office tool. For business operators, the capability is now mature enough for low-risk internal work where a human reviews the output and the source material is not sensitive. It is also cheap enough that the question is less "can AI do this?" and more "where does review cost exceed drafting cost?" The durable limitation is truthfulness: fluent output is not evidence that the model checked facts, understood policy, or used current information. This is why the winning workflow is supervised acceleration, not unsupervised delegation, especially for executive-facing work.

Why it matters

Executives can treat AI drafting and summarization as a standard productivity layer for internal communication, meeting prep, research triage, and first-pass analysis.

Limits

Outputs still need human review, especially for facts, legal/compliance claims, customer promises, and any work requiring current or proprietary data.

Code completion becomes a mainstream developer tool

AI became credible as a day-to-day developer productivity layer when code completion moved from novelty to editor-native workflow. The important shift was not that AI could write software independently; it was that a model could inspect the surrounding code, infer intent, and suggest whole lines, functions, tests, or API usage quickly enough to stay inside a developer flow. For enterprise engineering leaders, this made AI useful without changing delivery governance: humans still owned design, review, security, and deployment, while the model reduced local typing, lookup, and boilerplate cost. By 2026, this capability is no longer frontier. Multiple commercial tools and lower-cost models can handle routine completions and small snippets, though quality still depends heavily on local context, language, framework familiarity, and developer review. The executive takeaway is that autocomplete is now table stakes for engineering teams, not a transformation program by itself. It is the baseline layer on which more ambitious repo-aware and agentic software workflows now build.

Why it matters

Baseline developer productivity expectations changed: teams can reasonably assume AI-assisted completion for routine code, tests, and API exploration.

Limits

Autocomplete does not understand product intent, system architecture, security posture, or whether a suggested change is maintainable. It can also reinforce local mistakes when the surrounding code is weak.