AI Daily
Legal • March 19, 2026

The Dictionary Sues OpenAI — and the Copyright Map Is More Contradictory Than Ever

By AI Daily Editorial • March 19, 2026

Merriam-Webster and Encyclopaedia Britannica filed suit against OpenAI last week, alleging that nearly 100,000 of their articles were used without permission to train its language models, and that outputs from those models sometimes reproduce their content verbatim. The lawsuit is not especially surprising — reference publishers produce exactly the kind of carefully curated, authoritative text that makes ideal training data — but its timing lands in the middle of a copyright landscape that has become almost impossible to read as a whole. Depending on which courthouse you look at, AI training on copyrighted data is either legal, illegal, or simply unresolved, and the answers are different for the inputs than for the outputs.

The output side got a definitive-sounding ruling late last year, when a federal appeals court confirmed that AI-generated art cannot be copyrighted under US law because it lacks a human creator. The Supreme Court declined to take up the question in March, letting that ruling stand. The practical consequence: if you generate an image or a piece of text entirely with AI, you own nothing. Competitors can copy it freely. This has already begun to reshape how companies think about AI-assisted creative work — human involvement in the final product is now a legal asset, not just an aesthetic preference.

The input side is murkier. Judge William Alsup ruled earlier this year that Anthropic's training on copyrighted content was legal — a significant win for the industry's "fair use" argument. But that ruling sits uneasily alongside the wave of active litigation: music publishers are suing Anthropic for $3 billion over lyrics, YouTubers have sued Snap, and a coalition of authors including John Carreyrou have named six AI companies as defendants. The Merriam-Webster/Britannica suit adds two of the most recognisable reference brands to the plaintiff list. Each individual lawsuit is argued on its specific facts, and the outcomes will vary, but together they represent a sustained legal challenge to the premise that scraping the web to train a model is categorically protected by fair use.

What makes the copyright wars particularly hard to resolve is that the two sides are not simply arguing about money — they are arguing about what transformation means. The AI companies' position is that training is transformative: the model does not store or reproduce the text, it learns statistical patterns from it, and outputs are new creations. Publishers' position is that the transformation argument is circular: the entire value of the model is built from their investment in creating high-quality text, and calling that training "transformative" is a lawyer's trick for legalising a form of commercial free-riding.

For reference publishers specifically, the stakes are acute. Merriam-Webster and Britannica are in the business of producing definitions and encyclopaedic content that AI models are now providing directly, in response to user queries, without attribution or payment. The harm is not hypothetical — it is measurable in declining traffic to their sites. The lawsuit is partly a legal argument and partly a statement of survival: if courts rule that training on reference content is fair use and outputs are transformative, the traditional reference publishing business model is essentially over.

How these cases resolve will shape the next phase of AI development more than almost any technical decision the labs make. A ruling that broad training is not fair use would force the industry to either license content at scale — as some companies have already begun doing with deals with the New York Times, News Corp, and others — or shift toward synthetic data and model-to-model distillation to reduce reliance on human-generated text. A ruling that it is fair use removes the legal pressure, but not the commercial pressure: publishers will continue to find ways to make their content less accessible to scrapers, and the legal clarity won't restore the audience AI has already diverted.