Models • April 25, 2026

GPT-5.5 Bets on Autonomy, Not Raw Intelligence

By AI Daily Editorial • April 25, 2026

OpenAI released GPT-5.5 on April 23, and the company's pitch is notably different from the usual benchmark parade. Rather than leading with test scores, co-founder Greg Brockman put it plainly: "What is really special about this model is how much more it can do with less guidance." That framing is deliberate. GPT-5.5 is OpenAI's first model where the central selling point is autonomous execution rather than raw reasoning power, and the benchmark data actually supports that unusual claim.

On Terminal-Bench 2.0, a test designed to measure how well a model completes multi-step agentic tasks in real software environments, GPT-5.5 scores 82.7 percent, ahead of Anthropic's Claude Opus 4.7 at 69.4 percent. But flip to Humanity's Last Exam, which tests abstract reasoning without tool assistance, and the picture reverses: GPT-5.5 Pro scores 43.1 percent, behind Opus 4.7 at 46.9 percent. OpenAI is not building the most brilliant model in the room; it is building the one most willing to get things done without being told exactly how.
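To make the size of each gap concrete, here is a minimal Python sketch that computes the leader and margin on each benchmark, using only the scores quoted above. The `scores` dictionary and the `lead` helper are illustrative names, not part of any published tooling. Note that the Humanity's Last Exam figure is for GPT-5.5 Pro, so the two rows do not compare the same OpenAI variant.

```python
# Scores as reported in this article, in percent.
scores = {
    "Terminal-Bench 2.0": {"GPT-5.5": 82.7, "Claude Opus 4.7": 69.4},
    "Humanity's Last Exam": {"GPT-5.5 Pro": 43.1, "Claude Opus 4.7": 46.9},
}


def lead(results):
    """Return (leader, margin in percentage points) for one benchmark."""
    # Sort the models from highest to lowest score.
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_score), (_, second_score) = ranked[0], ranked[1]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return top_name, round(top_score - second_score, 1)


for bench, results in scores.items():
    name, margin = lead(results)
    print(f"{bench}: {name} leads by {margin} points")
```

On these figures, the agentic lead (13.3 points) is several times larger than the reasoning deficit (3.8 points).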

The mechanism behind this is a "Thinking" mode that lets the model validate its own reasoning before committing to a response, combined with hardware-level changes on NVIDIA GB200 and GB300 systems that deliver 20 percent faster token generation than GPT-5.4. The result is a model that is more decisive and faster, though it costs roughly twice as much as its predecessor.

The release also marks a meaningful step toward what OpenAI calls its "super app" vision: a unified service that brings together ChatGPT, the Codex coding agent, and an AI-powered browser under a single interface for enterprise customers. GPT-5.5 is the intelligence layer that makes that combination viable. Without a model capable of navigating loosely specified, multi-system tasks, a super app is just a complicated dashboard.

There are real tensions in this launch. GPT-5.5 is not yet available via API, which frustrates developers who build on top of OpenAI's models rather than using the ChatGPT interface. The model also carries a "High" risk classification in both biological and cybersecurity domains, which prompted OpenAI to introduce a parallel "cyber-permissive" access tier for verified security professionals. That two-track approach is novel and worth watching: it acknowledges the model's capabilities more honestly than most safety disclosures do, while carving out controlled access for the people who most need to probe those limits.

Whether the autonomy bet pays off depends on what enterprises actually need. If the next wave of AI adoption is about deploying agents that can run processes end-to-end with minimal supervision, GPT-5.5 is well-positioned. If it turns out that businesses still want a model they can interrogate, that reasons carefully and explains itself, Anthropic's stronger showing on abstract benchmarks may matter more than it currently appears. The honest answer is that both things are probably true in different contexts, and the more interesting race is now about which company figures out where the line sits.
