AI Daily
Opinion

"Deliberately Weaker" Is Not a Safety Strategy

By Peter Harrison • April 19, 2026

Anthropic shipped Opus 4.7 this week with deliberately reduced cyber capabilities. Bloomberg's headline frames this as Anthropic releasing a model with "weaker cyber skills than Mythos," a startup most people had not heard of until it became the benchmark against which the two largest AI labs in the world are measuring their decisions. The framing is almost universally positive: Anthropic is being responsible, pulling back from a dangerous capability, choosing safety over competition. I want to examine that claim, because I think it is doing more reassurance work than actual safety work, and because the company's recent history makes the "principled restraint" narrative harder to accept at face value.

Here is the problem with "deliberately weaker" as a concept: it assumes that a capability reduction on one benchmark corresponds to a meaningful reduction in real-world harm potential. That assumption does not hold up. Security researchers, penetration testers, and people with less constructive intentions do not evaluate AI tools by specialist benchmark scores. They use AI tools for their general reasoning, code generation, and ability to approach novel problems creatively. Opus 4.7 retains strong capabilities in all of those areas. The things that make AI genuinely useful in a hands-on attack context are general intelligence and code fluency, not a high score on a cyber-specific test. Reducing the latter while leaving the former intact is less meaningful than it sounds.

This is not speculation about future risks. The security community has been watching AI-assisted offensive capability grow for two years, and the bottleneck has rarely been specialist training. It has been prompt design, tool access, and what you point the model at. An operator with a capable general model, some patience, and API access to something worth compromising is not materially hindered by Anthropic's decision to reduce a benchmark score.

So what is the decision actually about? I can see three genuine drivers, and none of them are quite the same as safety.

The most straightforward is liability. If Anthropic's model is used in a documented cyber attack, and that model had been publicly marketed as having strong offensive cyber capabilities, the legal exposure would be substantial. Reducing the advertised capability is pre-emptive legal insulation. This is rational corporate behaviour. It is not the same as risk reduction, and conflating the two is worth resisting.

The second is the Pentagon context, and this is where the story gets more complicated than the "principled stand" framing allows. Anthropic spent months fighting a supply chain risk designation from the Department of War. But the dispute did not begin with Anthropic refusing military contracts. It began after Claude, deployed via Palantir under an existing Pentagon agreement, was used in the January 2026 operation to capture Venezuelan President Nicolás Maduro. That operation is the subject of an ongoing legal challenge and is widely described as having targeted individuals in ways that may have violated US law. Anthropic's response, publicly at least, was to say it had not discussed specific operations with the Pentagon and could not permit its technology to be used for autonomous weapons or domestic surveillance. What it did not say was that its technology had already been used for something arguably outside those limits, through a contractor relationship it apparently did not have full visibility into. Shipping a flagship model with reduced offensive cyber capabilities is consistent with the company's stated position. But the Venezuela situation demonstrates the gap between stated policy and actual deployment, a gap that a benchmark adjustment does not close.

The third, and the one I take most seriously, is genuine discomfort. Anthropic's founders left OpenAI over disagreements about safety culture. The people making this decision are not cynics. There is real ethical concern here about what strong cyber AI means in practice, and I think that concern is well-founded even if the mechanism for addressing it is imprecise. The instinct to pull back is correct. The framing of the pullback as a completed safety measure is where the problem lives.

Compare Anthropic's approach to OpenAI's: a dedicated cyber model released under limited access to vetted partners, in explicit competition with Mythos. OpenAI is not framing this as a safety decision. It is framing it as capability development with controlled rollout. That framing is at least more honest about what is actually happening. The capability exists. Someone is going to build it. The question is whether you are at the table shaping how it gets used, or whether you are performing restraint while the ecosystem develops around you.

The deeper problem is structural, and this is where I think we need a harder conversation. Both approaches involve unilateral decisions by private companies about what AI should and should not be capable of, with no shared framework, no external verification, and no way for anyone outside the company to audit the claim. When Anthropic says Opus 4.7 has "weaker cyber skills," we are being asked to take the company's word for it, based on internal benchmark comparisons to a competitor's model. When OpenAI releases a cyber model to "vetted partners," we are trusting OpenAI's vetting process. Mythos, apparently ahead of both, is presumably making its own decisions about access and responsibility.

We are at the stage where the governance of AI cyber capability is entirely voluntary and entirely opaque. Every lab is making decisions that have significant security implications for the rest of the world, based on their own values, their own legal exposure, and their own competitive position. The "deliberately weaker" framing at least surfaces the conversation. But surfacing the conversation is not the same as having it. What would an actual framework for AI cyber capability governance look like? Who would verify it? How would you handle a company like Mythos that does not have Anthropic's reputational investment in being seen as responsible?

I am not arguing that Anthropic made the wrong call. I am arguing that calling it a safety strategy sets expectations it cannot meet, and that the harder work, building governance structures that go beyond individual company decisions, is not something any lab can do alone. Deliberately weaker is a starting position, not an answer.