NVIDIA has launched Nemotron 3 Super, a 120-billion-parameter open model designed to improve the performance and cost-effectiveness of complex autonomous agent systems. Addressing challenges with context explosion and intensive computational demands, the new model achieves up to five times higher throughput than its predecessor, Nemotron Super, while maintaining or improving accuracy. The architecture combines transformer and Mamba layers with a novel mixture-of-experts approach, activating only 12 billion of its 120 billion total parameters during inference. This design allows agents to retain a 1-million-token context window, preventing goal drift during extended tasks, and powers NVIDIA’s AI-Q research agent to the top of the DeepResearch Bench leaderboards. Nemotron 3 Super has also set new standards for efficiency and openness on Artificial Analysis. Developers can now deploy and customize the model with open weights on platforms ranging from workstations to cloud environments.
120-Billion-Parameter Nemotron 3 Super Model for Agentic AI
With 120 billion parameters, the newly released Nemotron 3 Super model addresses critical limitations hindering the development of truly autonomous agentic AI systems, specifically the escalating costs associated with lengthy reasoning processes and the problem of context explosion. Launched by NVIDIA, the model activates only 12 billion parameters at inference, powering complex AI agents at scale and offering a significant leap in efficiency for applications demanding extensive memory and computational resources. Several companies are already integrating the model, including Perplexity, which offers users access to Nemotron 3 Super for search, and CodeRabbit, Factory, and Greptile, which are incorporating it into their software development agents.
The challenge of context explosion arises from multi-agent workflows generating up to fifteen times more tokens than standard chat interactions, as each step requires resending complete histories. Nemotron 3 Super tackles this with a 1-million-token context window, enabling agents to maintain full workflow state and prevent goal drift. Beyond memory capacity, the model also mitigates the computational burden imposed by complex reasoning at every step, utilizing a hybrid architecture combining Mamba and transformer layers to deliver up to five times higher throughput and two times greater accuracy compared to its predecessor. Only 12 billion of its 120 billion parameters are active at inference, highlighting a key efficiency gain. The model’s capabilities extend to powering the NVIDIA AI-Q research agent, which now leads the DeepResearch Bench and DeepResearch Bench II leaderboards, benchmarks assessing an AI’s ability to conduct thorough, multi-step research.
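The arithmetic behind context explosion is easy to sketch: when every agent step resends the entire history accumulated so far, the total tokens processed grow quadratically with the number of steps. The figures below are illustrative, not NVIDIA's measurements:

```python
# Illustrative sketch (not NVIDIA's figures): why resending the full
# history at every agent step makes total processed tokens explode.

def tokens_processed(steps, tokens_per_step):
    """Each step resends the whole history so far, then appends new output."""
    history = 0
    total = 0
    for _ in range(steps):
        history += tokens_per_step  # new tool output / reasoning appended
        total += history            # entire history is resent to the model
    return total

single_turn = 2_000  # a single chat turn processes ~2,000 tokens once

# A 5-step agent workflow emitting ~2,000 tokens per step:
agent = tokens_processed(steps=5, tokens_per_step=2_000)

print(agent // single_turn)  # → 15, i.e. 15x the tokens of one chat turn
```

With only five steps, the workflow already processes fifteen times the tokens of a single chat interaction, matching the multiplier cited above; longer workflows grow worse quadratically.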
Hybrid Architecture Achieves 5x Throughput with Mamba & MoE Layers
The current push for more capable artificial intelligence agents is running into practical limitations. Simply scaling up model size isn’t enough to overcome issues with context length and computational cost. Existing autonomous agent workflows struggle with context explosion, where the sheer volume of data needed for each interaction, including tool outputs and reasoning steps, increases expenses and risks the agent losing focus on its initial goal. Simultaneously, the need for complex reasoning at every stage of a task creates a computational burden that slows down applications and drives up costs. NVIDIA’s recently launched Nemotron 3 Super addresses these challenges through a novel hybrid architecture. This design combines the strengths of transformer and Mamba layers, resulting in up to five times higher throughput compared to the previous Nemotron Super model. Mamba layers, specifically, deliver four times greater memory and compute efficiency, while transformer layers continue to handle advanced reasoning tasks.
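The sparse activation behind running only 12 billion of 120 billion parameters can be illustrated with a toy top-k mixture-of-experts layer. This is a generic sketch of the technique; the expert count, dimensions, and routing scheme below are invented for illustration and are not Nemotron's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 10 experts exist, but only the top-1 is
# run per token, so ~1/10 of the layer's parameters are active — analogous
# (in spirit only) to Nemotron 3 Super activating 12B of 120B parameters.
n_experts, d_model, k = 10, 16, 1

router_w = rng.normal(size=(d_model, n_experts))               # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router_w
    top = np.argsort(logits)[-k:]                              # chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()    # softmax over top-k
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top)), top

x = rng.normal(size=d_model)
y, chosen = moe_forward(x)
print(f"active experts: {len(chosen)} of {n_experts}")  # 1 of 10
```

The key design point is that the router's cost is negligible, so inference pays only for the selected experts while the full parameter count remains available for specialization.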
Beyond parameter efficiency, Nemotron 3 Super employs multi-token prediction, allowing it to predict multiple future words simultaneously and achieving three times faster inference speeds. Running on the NVIDIA Blackwell platform with NVFP4 precision, the model minimizes memory requirements and accelerates inference up to four times faster than FP8 on NVIDIA Hopper, all without sacrificing accuracy.
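Multi-token prediction can be sketched generically: auxiliary heads propose several future tokens in one forward pass, so fewer sequential passes are needed for the same output length. The toy "model" below always proposes three tokens; this is a simplified sketch of the general idea, not NVIDIA's implementation, and a real system would verify proposals and fall back when they are rejected:

```python
# Toy sketch of multi-token prediction (generic idea, not NVIDIA's
# implementation): extra heads guess the next k tokens per forward pass,
# and accepted guesses cut the number of sequential decoding steps.

def generate(step_fn, prompt, n_tokens, k=3):
    """step_fn(seq) -> list of up to k proposed next tokens (one forward pass)."""
    seq = list(prompt)
    forward_passes = 0
    while len(seq) - len(prompt) < n_tokens:
        proposals = step_fn(seq)[:k]
        seq.extend(proposals)   # a real verifier may reject some proposals
        forward_passes += 1
    return seq, forward_passes

# Dummy "model" that always proposes the next three integers.
dummy = lambda seq: [seq[-1] + i for i in (1, 2, 3)]

seq, passes = generate(dummy, [0], n_tokens=9)
print(passes)  # → 3 forward passes instead of 9, ~3x fewer sequential steps
```

Accepting three tokens per pass is where a roughly threefold speedup in sequential decoding comes from; in practice the gain depends on how often the extra predictions are accepted.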
Nemotron 3 Super also offers high-accuracy tool calling, enabling autonomous agents to reliably navigate large function libraries and avoid execution errors in high-stakes environments, such as autonomous security orchestration in cybersecurity.
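In practice, tool calling pairs a model that emits a structured call with a runtime that validates it before execution. The sketch below shows that guard pattern generically; the tool names, schemas, and JSON shape are invented for illustration and are not Nemotron's API:

```python
import json

# Generic sketch of guarded tool calling (not Nemotron's API): the model
# emits a JSON call, and the runtime validates the name and arguments
# against a registry before executing — rejecting bad calls instead of
# running them. Tool names and schemas here are invented for illustration.

TOOLS = {
    "isolate_host": {"required": {"host_id"},
                     "fn": lambda host_id: f"isolated {host_id}"},
    "open_ticket": {"required": {"summary"},
                    "fn": lambda summary: f"ticket: {summary}"},
}

def dispatch(model_output: str):
    call = json.loads(model_output)
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    missing = spec["required"] - set(call.get("args", {}))
    if missing:
        raise ValueError(f"missing args: {missing}")
    return spec["fn"](**call["args"])

print(dispatch('{"name": "isolate_host", "args": {"host_id": "web-42"}}'))
# → isolated web-42
```

The registry check is what keeps a misremembered function name or a dropped argument from becoming a live action in a security workflow; the model's accuracy determines how rarely that guard has to fire.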
Nemotron 3 Super Tops Benchmarks for Efficiency and Reasoning
Several companies are rapidly integrating the newly released Nemotron 3 Super into their AI systems, indicating a swift move toward more capable autonomous agents. Perplexity is offering its users access to the 120-billion-parameter model for search functionality, incorporating it as one of twenty orchestrated models within its Computer application. Simultaneously, developers specializing in AI-powered software agents, including CodeRabbit, Factory, and Greptile, are embedding Nemotron 3 Super alongside their proprietary models to enhance accuracy while reducing computational costs. Life sciences organizations, such as Edison Scientific and Lila Sciences, are also poised to leverage the model for in-depth literature reviews, data analysis, and molecular understanding.
Open Weights & Broad Platform Support Accelerate Agent Deployment
The increasing sophistication of autonomous agents demands models capable of handling increasingly complex tasks, but practical deployment has been hampered by computational costs and context limitations. NVIDIA’s recent release of Nemotron 3 Super with open weights directly addresses these hurdles. Unlike previous iterations, this 120-billion-parameter model is designed for scalability, enabling developers to customize and deploy it across a diverse range of hardware, from workstations to cloud infrastructure. This broad accessibility is not merely a convenience, but a critical step toward wider adoption of agentic AI systems. Industry leaders including Amdocs, Palantir, and Siemens are also customizing the model to automate processes in sectors ranging from telecommunications to semiconductor manufacturing. A key feature enabling this is the model’s 1-million-token context window, which prevents goal drift by allowing agents to retain full workflow state in memory. NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license, and is publishing the complete methodology, including over 10 trillion tokens of pre- and post-training data, 15 reinforcement learning training environments, and evaluation recipes. This commitment to open science allows researchers to fine-tune the model or build entirely new systems using the NVIDIA NeMo platform.
