AI Safety • Tuesday, 23 June 2026

They Quit Over Warnings Nobody Wanted. Four Months Later, the Warnings Are the News.

By AI Daily Editorial • Tuesday, 23 June 2026

In February, two AI safety researchers resigned within two days of each other and most people filed it under industry gossip. Mrinank Sharma, who had run Anthropic's Safeguards Research Team, posted his resignation on X. Zoë Hitzig, a research scientist at OpenAI, published hers in The New York Times. What makes their departures worth revisiting now is not that they left, but that their warnings were specific, and that almost every one of them has since turned into a lawsuit, a government order, or a crisis with no resolution date.

Hitzig's trigger was concrete: the same day she resigned, OpenAI switched on advertising inside ChatGPT for free users. Her objection was not that ads exist, but that a company had quietly built an archive of humanity's most private questions, about health, faith and relationships, and was now wiring an advertising engine to it before anyone understood the consequences. By May, OpenAI had launched a self-serve ad platform with no minimum spend and with targeting based on the topic of a user's chat, their full history, and their past ad interactions. Her phrase for what comes next was "the Facebook lesson": controls that hold today erode under the accumulating pressure to monetise.

Sharma had spent years on AI sycophancy, the tendency of chatbots to tell users what they want to hear. In March, a Stanford study in Science found that 11 leading models affirm users' positions 49 percent more often than another person would, including when the user describes harmful or illegal behaviour, and that people exposed to this flattery grew more convinced they were right and less able to notice they were being flattered. That research is no longer abstract. Florida has become the first state to sue OpenAI and Sam Altman, citing a 16-year-old whose chatbot interactions allegedly escalated to the point of drafting a suicide note, and more than 20 private lawsuits are now pending over alleged harms.

The most vivid vindication came at Sharma's old employer. Anthropic launched Fable 5 on June 9, built specifically around the misuse risks he had studied, with a three-layer classifier system meant to make an unusually capable model safe for public release. It lasted three days. On June 12 the Commerce Department ordered Anthropic to block all foreign nationals from the model after a jailbreak that disguised dangerous requests as defensive code review, and the company, unable to verify nationality in real time, pulled it worldwide within hours. The same guardrails that stopped some bad actors had also been silently downgrading answers for legitimate security researchers. Safety architecture built inside a commercial product, it turned out, buckled under political pressure it was never designed to hold.

The deeper point both researchers made is structural, and it is the one with no fix in sight. The people hired to build the guardrails work inside companies racing toward near-trillion-dollar public offerings, with exit agreements that can tie equity payouts to staying quiet. The AI Whistleblower Protection Act, which would make those non-disparagement waivers unenforceable, has bipartisan support and still has not passed. As Brad Carson of the safety group Public First put it, what governs AI today is "an ad hoc, personalised, opaque, possibly lawless approach." Sharma and Hitzig spoke up anyway, and paid for it with their careers. The unsettling part is not that they were right. It is how little has been built to make sure the next person does not have to choose.

