Open AI agents get a stronger engine, but deployment is the real test
Manual Codex image generation 📷 AI-generated / Tech&Space
- The model uses a hybrid Mamba-Attention MoE architecture
- NVIDIA emphasizes throughput and long context
- Open agent models pressure closed systems
NVIDIA has dropped Nemotron 3 Super into an open-source landscape that's getting crowded fast. The 120 billion parameter model sits between the 30B Nano and a promised 500B Ultra arriving in 2026, giving developers a mid-weight option that trades sheer scale for architectural cleverness.
Its hybrid design combines Mamba state-space layers with traditional Transformer attention and a Mixture of Experts routing system, a configuration that theoretically cuts compute waste by activating only relevant parameter subsets per token.
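To make "activating only relevant parameter subsets per token" concrete, here is a minimal sketch of generic top-k expert routing. It illustrates the general Mixture of Experts technique, not NVIDIA's actual router; the function names, expert count, and parameter sizes are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with weights renormalized
    so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def active_fraction(num_experts, k, expert_params, shared_params):
    """Share of total parameters actually used for a single token."""
    total = shared_params + num_experts * expert_params
    active = shared_params + k * expert_params
    return active / total


# Hypothetical numbers for illustration only, not Nemotron 3 Super's configuration.
chosen = route_token([0.1, 2.0, -1.0, 1.5], k=2)
print("experts used for this token:", [index for index, _ in chosen])
print("fraction of parameters active:", active_fraction(8, 2, 10, 20))
```

The point of the sketch is the ratio: every expert must still sit in memory, but only the selected ones cost compute for a given token.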
The timing matters. Open-source challengers like DeepSeek, Mistral, and Meta's Llama family have forced proprietary labs to justify their closed gates. NVIDIA's play here is bluntly strategic: sell more GPUs by proving open models can run efficiently on them. The 1 million token context window, 7x larger than its predecessor, addresses a genuine pain point for multi-agent systems that must maintain coherent state across lengthy tool chains and conversation histories. Whether that translates to 7x real-world throughput, however, depends entirely on implementation details NVIDIA hasn't fully disclosed.
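A back-of-the-envelope calculation shows why layer mix matters at this context length. In attention layers the key-value cache grows linearly with sequence length, while state-space layers keep a fixed-size state. The sketch below assumes an entirely hypothetical configuration (48 layers, 8 KV heads, head dimension 128, 16-bit values); none of these figures are published Nemotron 3 Super specifications.

```python
def kv_cache_bytes(seq_len, attn_layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both keys and values
    in every attention layer.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, NOT published model specifications.
SEQ_LEN = 1_000_000
KV_HEADS = 8
HEAD_DIM = 128
TOTAL_LAYERS = 48
HYBRID_ATTN_LAYERS = 8  # assumption: only 8 of the 48 layers use attention

all_attention = kv_cache_bytes(SEQ_LEN, TOTAL_LAYERS, KV_HEADS, HEAD_DIM)
hybrid = kv_cache_bytes(SEQ_LEN, HYBRID_ATTN_LAYERS, KV_HEADS, HEAD_DIM)

print(f"all-attention: {all_attention / 1e9:.1f} GB per sequence")
print(f"hybrid:        {hybrid / 1e9:.1f} GB per sequence")
```

Under these made-up numbers, the all-attention stack needs roughly 197 GB of cache per million-token sequence against roughly 33 GB for the hybrid. The estimate ignores the state-space layers' own state, which is fixed in size regardless of sequence length.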
Benchmark theater versus deployment reality
The gap between "up to 7x throughput" in marketing slides and actual multi-agent deployment performance is where this story gets interesting. Synthetic benchmarks rarely capture the scheduling chaos of multiple agents calling tools, hitting memory bandwidth limits, and competing for the same GPU clusters. NVIDIA's claims about doubled accuracy are similarly unspecified: accuracy at what task, measured against which baseline?
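Teams that want their own number rather than a slide's can measure aggregate throughput under concurrent load. The harness below is a minimal sketch: `measure_throughput` and `fake_generate` are hypothetical names, and the stub stands in for a real call to whatever serving endpoint is under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(generate, prompts, concurrency):
    """Run prompts through `generate` with a fixed number of worker threads.

    `generate` must take a prompt and return the number of tokens produced.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total = sum(token_counts)
    return {
        "total_tokens": total,
        "seconds": elapsed,
        "tokens_per_second": total / elapsed,
    }


def fake_generate(prompt):
    """Stand-in for a model call: sleeps briefly, then reports a token count."""
    time.sleep(0.01)
    return len(prompt.split())


# Compare one agent at a time against four agents in parallel.
prompts = ["plan the next tool call", "summarize the search results"] * 4
for workers in (1, 4):
    stats = measure_throughput(fake_generate, prompts, workers)
    print(workers, "workers:", round(stats["tokens_per_second"]), "tokens/s")
```

With a real backend, the interesting result is how far tokens per second falls short of scaling linearly as concurrency rises, which is exactly what a single-stream benchmark hides.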
The MarkTechPost coverage cites a 5x figure without identifying comparison models, leaving readers to guess whether NVIDIA measured against its own prior generation, GPT-4-class systems, or something else entirely.
For developers, the practical signal is cleaner. A 120B open model with genuine long-context capabilities and MoE efficiency gives teams a plausible alternative to API-dependent workflows, provided they have the infrastructure to serve it. The real competitive pressure isn't on OpenAI or Anthropic directly; it's on other open-source efforts that must now match or exceed this throughput-accuracy-context combination without NVIDIA's vertical integration advantages.
Community adoption will hinge less on parameter counts than on whether the Mamba-Attention hybrid proves stable across diverse agent frameworks beyond NVIDIA's own NeMo tooling.
The broader context is an open-source ecosystem accelerating faster than many predicted two years ago. Models like this don't erase the proprietary lead, but they compress the distance in ways that matter for cost-sensitive, privacy-critical, or latency-constrained applications. If the 500B Ultra materializes as promised, the competitive framing shifts again, though "expected 2026" leaves ample room for schedule slip or architectural pivot.

