AI Daily
Technology • April 8, 2026

NVIDIA Open-Sources the Full AI Stack, Not Just the Weights

By AI Daily Editorial • April 8, 2026

When AI labs release an open-weight model, they typically release only the weights. NVIDIA's release of Nemotron 3 Super last week goes further: the company published the pre-training data, the post-training datasets and reinforcement learning environments, and the complete training recipe. The model, at 120 billion total parameters with 12 billion active per forward pass, is competitive in benchmark accuracy with GPT-OSS-120B and Qwen3.5-122B, while running 7.5 times faster than the Qwen model at long output lengths. The full stack is on HuggingFace.

The technical architecture is genuinely novel. Nemotron 3 Super is the first large model to combine three distinct efficiency techniques: LatentMoE, a new mixture-of-experts design that routes tokens through a compressed latent space before expert computation, recovering the saved bandwidth as additional expert capacity; hybrid Mamba-Attention, which uses linear-time Mamba sequence layers for most of the network and inserts standard attention layers as periodic anchors, dramatically reducing the memory overhead that limits long-context inference; and Multi-Token Prediction, which enables native speculative decoding without a separate draft model. The result is a model that supports one million token context and achieves 7.5x the inference throughput of its nearest comparable competitor on the benchmark configuration NVIDIA tested. The practical consequence is that running a frontier-class 120B model costs substantially less compute time than it did last quarter.
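To make the hybrid Mamba-Attention idea concrete, here is a minimal illustrative sketch, not NVIDIA's code: a layer schedule in which most blocks are linear-time sequence-mixing layers and a standard attention layer is inserted as a periodic anchor. The function name, the layer labels, and the specific interleaving ratio are all assumptions for illustration; NVIDIA has not stated the exact ratio used in Nemotron 3 Super.

```python
# Hypothetical sketch of a hybrid layer schedule: mostly linear-time
# "mamba" layers, with a full "attention" layer as a periodic anchor.
# Names and the every-4th ratio are illustrative assumptions.

def hybrid_schedule(n_layers: int, attention_every: int) -> list[str]:
    """Return a layer-type list with 'attention' at every
    `attention_every`-th position and 'mamba' elsewhere."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(n_layers)
    ]

# A 12-layer toy stack with an attention anchor every 4th layer:
schedule = hybrid_schedule(n_layers=12, attention_every=4)
print(schedule)
```

The memory intuition is that an attention layer's KV cache grows linearly with context length, while a Mamba-style layer carries a constant-size recurrent state; keeping only a few attention anchors is what makes million-token context affordable at inference time.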

What distinguishes this from a routine model drop is the completeness of the release. Most frontier lab releases are weight drops. Meta releases LLaMA weights. Mistral releases weights. The training methodology, the data curation decisions, the reinforcement learning environments that shaped the model's behaviour: these stay inside the lab. NVIDIA released all of it. The Nemotron-Super-Post-Training-Data package includes the full suite of agentic RL environments used to train multi-step tool-using behaviour, the area where the model's post-training emphasis is concentrated.

Why would NVIDIA do this? The answer is in the business model. NVIDIA sells the hardware that runs AI. OpenAI, Anthropic, and Google sell the intelligence. These are different incentive structures. An open, capable model running on open infrastructure drives more inference workload, which runs on NVIDIA GPUs. NVIDIA has no reason to keep models proprietary; a proprietary model confined to its creator's infrastructure generates no NVIDIA revenue. The company that makes the shovels benefits from as many people as possible digging. Making the ecosystem more capable by releasing both models and methodology costs NVIDIA nothing in competitive terms and potentially expands the market significantly.

This dynamic has been visible in NVIDIA's approach for some time, but Nemotron 3 Super is its clearest expression yet. The model is explicitly designed for agentic deployment: the post-training recipe substantially scaled the breadth of RL environments targeting multi-step tool use, software engineering, and terminal interaction. This is not a chat model. It is infrastructure for autonomous AI agents, openly available to anyone with hardware to run it.

The open frontier is now competitive with the closed frontier in capability terms, and in some deployment configurations superior in efficiency. The question this raises, and one the frontier labs have not answered, is what sustainable competitive advantage proprietary models retain when the open ecosystem has access not just to comparable weights but to the full methodology that produced them.
