The AI model race has had a clear shape for the past three years: bigger models, more parameters, higher benchmark scores. DeepSeek's new V4, released in preview this week, suggests the race is changing shape. The Hangzhou startup isn't claiming to have built the most powerful model in the world. It's claiming to have built one of the most efficient, and it's making that argument with a technical report detailed enough to be taken seriously.
V4 comes in two flavours: V4-Pro, a Mixture-of-Experts model with 1.6 trillion total parameters but only 49 billion activated per token, and V4-Flash, a smaller variant with 284 billion total parameters and 13 billion activated. Both support a context window of one million tokens, putting them on par with Google's Gemini in terms of how much material they can process in a single pass. That context length, and what DeepSeek does to make it affordable, is the actual story here.
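That activated-parameter figure is the defining trick of Mixture-of-Experts designs: a router picks a handful of specialist sub-networks per token, so most of the weights sit idle on any given forward pass (49 of 1,600 billion works out to roughly 3% active). Here's a minimal sketch of the routing pattern; the dimensions, expert count, and gating details are illustrative assumptions, not DeepSeek's published architecture.

```python
import numpy as np

# Minimal sketch of Mixture-of-Experts routing: a router scores all
# experts for each token, but only the top-k run, so the bulk of the
# model's parameters stay idle on any given token. All sizes are
# illustrative toys, not DeepSeek's published configuration.
rng = np.random.default_rng(0)

d_model   = 64   # hidden size (toy value)
n_experts = 64   # total experts hold most of the parameter count
top_k     = 2    # experts actually activated per token

# Each expert is a small two-layer feed-forward block.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
     rng.standard_normal((4 * d_model, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector through its top-k experts."""
    scores = x @ router                              # [n_experts]
    chosen = np.argsort(scores)[-top_k:]             # top-k expert indices
    gate = np.exp(scores[chosen] - scores[chosen].max())
    gate /= gate.sum()                               # softmax over chosen
    out = np.zeros_like(x)
    for g, i in zip(gate, chosen):
        w_in, w_out = experts[i]
        out += g * (np.maximum(x @ w_in, 0.0) @ w_out)  # gated ReLU FFN
    return out

y = moe_forward(rng.standard_normal(d_model))
# Only top_k / n_experts of the expert weights ran for this token:
print(f"active expert fraction: {top_k / n_experts:.1%}")  # 3.1%
```

The practical upshot is that per-token compute scales with activated parameters rather than total parameters, which is how a 1.6-trillion-parameter model can be served closer to the cost of a far smaller dense one (though the memory footprint still scales with the full parameter count).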
Long-context AI has an uncomfortable economics problem. As context grows, so does the cost of attention: each new token must attend to every token before it, so compute scales with the square of the sequence length, and the key-value (KV) cache holding those past tokens in GPU memory grows linearly until it dominates serving costs. Conventional approaches become prohibitively expensive at the million-token scale, which is why most deployed systems either truncate inputs or charge a significant premium for extended contexts. DeepSeek's V4 attacks this through a hybrid attention design combining what the company calls Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA). CSA compresses groups of key-value entries and selects the most relevant blocks; HCA goes further, allowing dense attention over a dramatically shorter memory stream. The result, according to analysis by research firm SemiAnalysis, is a roughly 90% reduction in KV cache usage at the million-token scale, an improvement they describe as more impactful than Google's own TurboQuant paper published last month.
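To put numbers on why that 90% matters, a dense Transformer's KV cache at a million tokens runs into the hundreds of gigabytes for plausible model dimensions. The sketch below does that back-of-envelope arithmetic, then shows the generic compress-and-select pattern the CSA description suggests: pool each block of keys into a summary, score the summaries against the query, and attend densely only inside the winning blocks. Every number here is an illustrative assumption, and the mean-pooled compression stands in for whatever learned mechanism DeepSeek actually uses; this is the general technique, not a reconstruction of CSA or HCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- KV cache arithmetic (illustrative dimensions, fp16 weights) ---
n_layers, n_kv_heads, head_dim = 61, 8, 128
bytes_per, tokens = 2, 1_000_000
kv_bytes = tokens * n_layers * n_kv_heads * head_dim * 2 * bytes_per
print(f"dense KV cache @1M tokens: {kv_bytes / 2**30:.0f} GiB")   # ~233 GiB
print(f"after a ~90% reduction:    {0.1 * kv_bytes / 2**30:.0f} GiB")

# --- toy block-sparse attention for a single head ---
d, block, n_blocks_kept = 64, 32, 4
keys   = rng.standard_normal((4096, d))
values = rng.standard_normal((4096, d))
query  = rng.standard_normal(d)

# Compress each KV block into one summary vector. Mean-pooling is a
# placeholder; a real system would learn this compression.
k_blocks  = keys.reshape(-1, block, d)
summaries = k_blocks.mean(axis=1)                 # [n_blocks, d]

# Score blocks against the query, keep only the most relevant ones.
block_scores = summaries @ query
kept = np.argsort(block_scores)[-n_blocks_kept:]

# Dense attention restricted to tokens inside the selected blocks.
idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in kept])
logits = keys[idx] @ query / np.sqrt(d)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
out = probs @ values[idx]
print(f"attended to {len(idx)} of {len(keys)} tokens")
```

The design choice worth noticing is that the expensive dense computation touches only the selected blocks, so cost tracks how much context is relevant rather than how much context exists.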
The benchmark picture is more mixed. On the Artificial Analysis Intelligence Index, V4-Pro scores 52, behind Moonshot AI's Kimi K2.6 at 54, and well behind OpenAI's GPT-5.5 at 60 and both Anthropic's Claude Opus 4.7 and Google's Gemini 3.1 Pro at 57. Kyle Chan of the Brookings Institution describes the result as impressive for approaching state-of-the-art performance while maintaining efficiency, which is a generous framing: impressive for its cost, but not at the frontier. Market reaction this time was nothing like the shock of DeepSeek's R1 debut, which wiped hundreds of billions from US equities in a day. Nvidia's stock rose 4.3% on Friday, the opposite direction from the R1 panic.
What makes V4 worth watching isn't the benchmark position; it's what cheaper long-context inference unlocks. When it's affordable to feed an AI model a full codebase, a complete legal case record, or months of financial filings, the applications that make sense change. Research assistants that read primary literature rather than summaries. Coding tools that understand the whole project, not a snippet. Document review systems that don't require you to pre-select the relevant pages. The bottleneck has been cost rather than any in-principle lack of capability, and DeepSeek is explicitly targeting it.
There's an unanswered question about the training side. DeepSeek mentioned Huawei's Ascend chips for inference, and Huawei confirmed that its Ascend 950-based systems support V4. But the company said nothing about what hardware was used to train V4, a notable omission given the US export controls on advanced Nvidia chips. Chris McGuire of the Council on Foreign Relations notes that V4 may have depended on restricted Nvidia Blackwell GPUs for training, which would put the compute-efficiency story in a different light: efficient inference built on expensive training that China can't readily replicate at scale. The gap between the US and China in AI capability, McGuire estimates, remains around seven months.
The Forbes analysis of V4 frames it as a signal that "the next AI race is about efficiency." That's probably right as a trend, if slightly premature as a verdict on this specific release. DeepSeek V4 is genuinely interesting engineering. It also arrives at a moment when OpenAI's GPT-5.5 and Anthropic's Opus 4.7 have both launched in rapid succession, making the competitive field unusually crowded. Whether efficiency-first open models can force closed-source providers to justify their pricing premium is the real tension to watch over the next twelve months.