Anthropic released the first formal update on Project Glasswing this week, and the numbers are striking enough that the policy debate around the company's Mythos model now looks almost beside the point. After a single month of restricted access for roughly fifty partner organisations, Mythos has helped flag more than 10,000 high or critical severity vulnerabilities in widely used software. Cloudflare alone reported 2,000 bugs in its critical-path systems, with 400 of those classified as high or critical. Mozilla used the model to find and patch 271 vulnerabilities in Firefox 150. The company says several partners have seen their bug-finding rate increase by more than a factor of ten.
What is interesting is not the volume by itself. It is the way that volume rearranges the problem. For years the bottleneck in software security has been finding the bugs. Now, according to Anthropic, the bottleneck is everything that comes after: triage, verification, patch authoring, regression testing, deployment, and convincing end users to actually install the update. A model that can systematically surface tens of thousands of flaws does not eliminate that bottleneck. It exposes how narrow the human-side capacity has always been.
The accuracy figures help explain why this lands harder than past automated-scanner hype. Independent reviewers re-examined 1,752 of Mythos's critical flags and judged that 90.6 percent were legitimate vulnerabilities, with 62.4 percent confirmed as genuinely high or critical. Cloudflare went further and described the model's false positive rate as better than that of human testers. False positives are usually how security tools fail in practice: a noisy scanner with a 30 percent hit rate gets ignored by engineers, regardless of what its dashboard shows. A 90 percent hit rate forces a different conversation, because the findings cannot be triaged away as noise.
That is also where the new disclosure becomes uncomfortable for the rest of the industry. Anthropic has been careful to limit access, briefing only handpicked partners, and the White House blocked an expansion to a further 70 companies during the rollout. But the company is also explicit that models with similar capability will soon be more broadly available. Once any competent lab ships a Mythos-class system without the same gating, the asymmetry that Glasswing has created vanishes. Defenders inside Cloudflare-sized teams might cope with thousands of new high-severity tickets. Most software vendors cannot.
There is a second layer that the partners' numbers quietly imply. The model is not just finding more bugs. It is finding a different distribution of bugs than human researchers do, including one that had been sitting in OpenBSD for 27 years. That suggests the existing corpus of known vulnerabilities reflects what humans were able to look for, not what was actually there. If Mythos and its successors are systematically surfacing a wider set of weaknesses, then a lot of software currently in production is more exposed than its security history suggests. Patch backlogs are essentially a record of the bugs we could afford to acknowledge.
The Glasswing update is the technical backdrop to the political theatre of the week. Trump's postponed executive order would have built a voluntary federal review pipeline aimed at exactly this category of model, with the National Security Agency as lead reviewer. Anthropic's own findings make a strong implicit case for that kind of structure, since the company is now openly saying the software industry is not equipped to absorb what the new tooling produces. The harder question is whether anyone, in or out of government, can build the patch-and-deploy machinery fast enough to keep up. For the moment, the strongest finding from Glasswing is not about Mythos. It is about how much of modern software was held together by attackers being roughly as slow as defenders.