AI Daily
[Photo: Two runners in a sprint race on a track, photographed from the side at dusk]
Opinion
April 16, 2026

The Safety Race Is Still a Race

By Peter Harrison

Last week, Anthropic announced that Claude Mythos could find high-severity vulnerabilities in every major operating system and web browser. This week, OpenAI released GPT-5.4-Cyber to a vetted group of security researchers, explicitly positioned as a response to Mythos. Seven days separated the two announcements. Whatever else you want to say about the AI safety movement, it apparently cannot stretch a product cycle past a single week.

I want to be fair to both companies here, because I think the usual critique misses the point. The people at Anthropic and OpenAI who work on safety are not cynics. The vetting programmes for these cyber models are real. The controlled access mechanisms are genuine attempts to prevent the obvious bad outcomes. Anthropic did not release Mythos to the general public and sit back to see what happened. Neither did OpenAI. The safeguards exist and they are not theatre.

But here is the uncomfortable thing: a safety-conscious company is still a company. And when a direct competitor releases a powerful model into a market you both want to be in, the pressure to respond does not consult your safety committee first. It shows up in the board meeting. It shows up in the questions from investors. It shows up in the conversations with enterprise customers who want to know why they should wait for you when the other company is already shipping.

The alignment researchers will tell you that Anthropic and OpenAI are aligned. I believe them. The models are trained to behave well. The deployment decisions are made with genuine care. None of that changes the fact that the structural incentive is to move quickly, and "quickly" in this case means deploying highly capable offensive cybersecurity AI to vetting programmes that, by design, expand over time. Today it is a hundred vetted researchers. Six months from now it is a thousand. A year from now the vetting programme has become a commercial product with enterprise pricing.

This is not a prediction about bad faith. It is a prediction about how markets work. Once you have built the capability and established that controlled deployment is acceptable, the pressure is always toward wider deployment, not narrower. The question is not whether GPT-5.4-Cyber will eventually be available to anyone with an enterprise account. The question is when. And the answer is: sooner than it otherwise would be, because Anthropic and OpenAI are racing.

The alignment researchers have an answer for this, and I have heard it many times. They say: the alternative is worse. If we do not build these tools and deploy them to defenders, the attackers will build them anyway, and the defenders will be behind. This is the logic that justified every step in the nuclear arms race, and I do not say that to be melodramatic. I say it because the logic is identical. Unilateral restraint does not make the threat go away; it just means you face the threat unarmed. Therefore: race.

I find this argument genuinely difficult to dismiss. I think it is mostly right on the object level. If you are a security team at a hospital trying to defend patient records against nation-state attackers who already have capable AI, you need tools. Anthropic and OpenAI giving you those tools is, on net, good. I am not arguing that the models should not exist.

What I am arguing is that the framing of this as a "safety" initiative obscures something important. Deploying models that can find every critical vulnerability in production software is not a safety decision that happens to have commercial implications. It is a commercial decision that happens to be defensible on safety grounds. The distinction matters because it determines whose logic is actually running the show.

There is a version of AI deployment that proceeds from the question: where can AI do the most good, and how do we get there while managing the risks? That version of the story is what Anthropic's Project Glasswing aspires to be. And then there is the version that actually happens: a competitor ships first, so you ship a week later. Both stories exist simultaneously. The question worth sitting with is which one is doing the driving.

I am not suggesting Anthropic or OpenAI are being dishonest. I think the genuine belief is that defensive deployment beats the alternative. What I am noting is that this belief is convenient. It aligns perfectly with what the market wants and what the investors expect. When your sincere values and your commercial interests point in exactly the same direction, it is worth at least asking whether the values are doing any independent work, or whether they are simply providing the ethical vocabulary for decisions that were going to be made anyway.

The AI safety movement has always had this problem. Alignment ensures the AI does not kill you in your sleep. It does not slow down the race. It does not change the incentive structures that produce races. It does not prevent two safety-conscious organisations from shipping capable cyber AI within seven days of each other because the market rewarded the first mover. Safety is necessary. It is not sufficient. It was never going to be sufficient. And nothing in the last week has made me think otherwise.