Security • Thursday, 28 May 2026

The Toggle That Should Not Have Appeared: Mythos Heads for Public Release

By AI Daily Editorial • Thursday, 28 May 2026

A small switch appeared inside the public version of Claude Code this week. It was labelled "Mythos." Users who saw it before it vanished posted screenshots, and code strings around it referred to a model called claude-mythos-1-preview. The same identifier briefly turned up in Claude Security. None of this would matter, except that Mythos is the bug-hunting frontier model Anthropic spent the spring telling everyone was too dangerous to release. The toggle is the company quietly correcting the verb tense.

Cybernews, Yellow, Cryptobriefing, GovInfoSecurity and CXToday all picked up the same shift in language buried in Anthropic's latest Project Glasswing update. In April the company had written that it did "not plan to make Claude Mythos Preview generally available." On Friday it said, "in the near future, once we've developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release," and added that it expects equivalent models to be "widely available in the next 6 to 12 months." That is the polite phrasing. The blunt phrasing is that the period during which Mythos-class capability is restricted to a curated list of about fifty trusted partners is now measured in months, not years.

What Mythos can do has stopped being abstract. Anthropic says its Glasswing partners have, in the past month alone, found more than 10,000 high- or critical-severity vulnerabilities across the most systemically important software in the world. Mozilla fixed 271 in a single version of Firefox, more than ten times what its team caught with the previous Claude. Cloudflare's CSO says the false-positive rate is now better than human testers'. Most uncomfortable of all, vulnerability researchers at a firm called Calif used Mythos Preview to build, in five days, the first working public macOS kernel memory corruption exploit on Apple's M5 chip, defeating a hardware-assisted memory safety system that took Apple five years to build. Their own description, quoted in CXToday: "We're about to learn how the best mitigation technology on Earth holds up during the first AI bugmageddon."

The interesting tension in the Glasswing update, which the press coverage is picking up to varying degrees, is that Anthropic is now arguing two slightly different things at once. The first is that defenders need this tool, and giving it to a wider set of vetted users will harden global infrastructure faster than holding it back. The second is that finding bugs is no longer the bottleneck: fixing them is. Cloudflare's contribution to the conversation, that the right way to use Mythos is not as a chat interface but as many small, narrowly scoped agents whose findings are deduplicated afterwards, suggests the model has shifted from "research artefact" to "industrial scanner" inside the partners that have it. Anthropic notes that Mozilla has had to ask it to slow down because patch teams cannot keep pace with disclosure.

That has consequences the security trade press is taking seriously and the broader business press has not yet caught up to. Japan's finance ministry and its three largest banks are reportedly studying whether they would need to shut down financial systems in response to an AI-enabled attack, and are trying to obtain Mythos themselves to harden their defences first. The IMF has warned that emerging markets, with thinner defensive resources, are disproportionately exposed. GitHub has had to tighten its bug bounty rules because AI-generated submissions are drowning maintainers. The British AI Security Institute, allowed to test Mythos Preview directly, found it could "execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously, tasks that would take human professionals days of work," while noting that it had not tested the model against a well-defended target.

The deepest open question is whether Anthropic's bet that defenders will, over time, beat attackers actually holds. The argument rests on a particular assumption: that the average time-to-patch will compress faster than the average time-to-exploit. The evidence so far is ambiguous. The average Mythos-found bug takes about two weeks to patch. The Calif team built a working M5 exploit in five days. Anthropic's revised timeline, which has slid from "eventual" to "near future" in five weeks, suggests the company believes the calculus has tipped, or at least that the window in which restriction is meaningful has closed. The toggle in Claude Code was probably an accident. The thing it accidentally showed, that the gate is opening, was not.
