AI Daily
Models • Monday, March 16, 2026

Anthropic Ships Its Most Capable Model Yet — and Doubles Down on Agents

By AI Daily Editorial • Monday, March 16, 2026

It arrived in February with notably less noise than Anthropic's legal battles, but Claude Opus 4.6 may matter more to the company's near-term future than any courtroom ruling. Released across cloud platforms and now broadly deployed, Opus 4.6 is Anthropic's clearest statement yet about where it thinks AI is actually going — and it isn't waiting for the Pentagon fight to be settled before making it.

The headline capability is what Anthropic calls "agent teams." Opus 4.6 can break large tasks into independent subtasks and run them in parallel, using subagents and tools concurrently rather than sequentially. That's a meaningful architectural shift from how earlier models — including Opus 4.5 — handled complexity. Working through steps one at a time imposes real costs on long, multi-part tasks: speed, error compounding, and the inability to recover from a blocked branch without restarting the whole workflow. Parallelising changes the economics of ambitious agentic work considerably.
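The sequential-vs-parallel distinction can be sketched in a few lines. This is an illustrative asyncio example, not Anthropic's actual agent API; every name in it is hypothetical, and the subtasks are simulated with timed sleeps.

```python
import asyncio

async def run_subtask(name: str, delay: float) -> str:
    # Stand-in for a subagent working on one independent subtask.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def sequential(subtasks):
    # Earlier pattern: each subtask waits for the previous one,
    # so total time is the *sum* of all branches.
    results = []
    for name, delay in subtasks:
        results.append(await run_subtask(name, delay))
    return results

async def parallel(subtasks):
    # Agent-team pattern: independent subtasks run concurrently,
    # so total time is the *slowest* branch. A blocked branch no
    # longer stalls its siblings.
    return await asyncio.gather(
        *(run_subtask(name, delay) for name, delay in subtasks)
    )

subtasks = [("research", 0.03), ("draft", 0.02), ("review", 0.01)]
results = asyncio.run(parallel(subtasks))
```

The key property is the last comment: with independent branches, wall-clock time collapses from the sum of the subtasks to the longest one, which is the economic shift the article describes.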

The model also ships with a 1 million token context window in beta — a first for Anthropic's Opus tier. In practical terms, that means Opus 4.6 can hold an entire large codebase, a lengthy legal case file, or a multi-volume research corpus in working memory simultaneously. Combined with the agent teams architecture, you start to see the use cases Anthropic is targeting: not chat, but sustained, autonomous work on genuinely large problems.

On benchmarks, Anthropic reports the highest score on Terminal-Bench 2.0 (an agentic coding evaluation), the top position on Humanity's Last Exam (a broad multidisciplinary reasoning test), and a BigLaw Bench score of 90.2%. That last number is particularly notable — legal reasoning is a domain where accuracy and precision matter enormously, and a 90%+ score with 40% perfect completions suggests the model is doing something qualitatively different from its predecessors on structured professional tasks.

The commercial framing is striking for what it doesn't emphasise. Anthropic's launch materials for Opus 4.6 lean hard on usefulness over hours-long autonomous tasks, rather than leaderboard position for its own sake. In a market where every frontier lab publishes benchmark scores weekly, the differentiation play seems to be reliability and task completion depth — can it actually finish the job unsupervised? — rather than raw capability peaks.

Opus 4.6 is available on Claude Pro, Max, Team, and Enterprise plans, and through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing starts at $5 per million input tokens and $25 per million output tokens, with up to 90% discounts for cached prompts and 50% for batch processing.
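Those list prices make back-of-envelope cost estimates easy. The sketch below uses the figures quoted above ($5 per million input tokens, $25 per million output tokens); treating the cache discount (up to 90%, applied here to input only) and the batch discount (50%) as flat multipliers is a simplifying assumption, not Anthropic's exact billing rules.

```python
# List prices from the launch pricing, in USD per million tokens.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input: bool = False, batch: bool = False) -> float:
    """Rough cost estimate; discounts modelled as simple multipliers."""
    # Up to 90% off cached input reads -> pay 10% of the input rate.
    in_rate = INPUT_PER_M * (0.10 if cached_input else 1.0)
    total = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * OUTPUT_PER_M
    if batch:
        total *= 0.5  # 50% off for batch processing.
    return round(total, 4)

# One million input tokens and 100k output tokens at list price:
standard = estimate_cost(1_000_000, 100_000)       # $5.00 + $2.50 = $7.50
batched = estimate_cost(1_000_000, 100_000, batch=True)
cached = estimate_cost(1_000_000, 100_000, cached_input=True)
```

At these rates, a single full-context call against the 1M-token window costs around $5 in input alone, which is why the cache and batch discounts matter for the sustained agentic workloads the model is pitched at.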

The timing carries a pointed irony worth noting. Anthropic's most powerful model to date — explicitly designed to operate autonomously across complex tasks, to run tools and subagents in parallel, to sustain work on large problems without constant human supervision — dropped in the same weeks that the Pentagon argued the company was dangerously overcautious about autonomous AI. Anthropic appears to believe that building highly capable agents and insisting on safety guardrails for those agents aren't in conflict. Whether that position is commercially sustainable in the current political climate is a question the courts are now being asked to answer.
