Even as Meta fends off questions and criticism of its new Llama 4 model family, GPU leader Nvidia has released a new, fully open source large language model (LLM) based on Meta's older Llama-3.1-405B-Instruct model, and it is claiming near-top performance on a variety of third-party benchmarks, outperforming the vaunted rival DeepSeek R1 open source reasoning model.
Llama-3.1-Nemotron-Ultra-253B-v1 is a dense 253-billion-parameter model designed to support advanced reasoning, instruction following, and AI assistant workflows. It was first mentioned back at Nvidia's annual GPU Technology Conference (GTC) in March.
The release reflects Nvidia's continued focus on performance optimization through architectural innovation and targeted post-training.
Announced last night, April 7, 2025, the model code is now publicly available on Hugging Face, with open weights and post-training data. It is designed to operate efficiently in both "reasoning on" and "reasoning off" modes, allowing developers to toggle between high-complexity reasoning tasks and more straightforward outputs based on system prompts.
Designed for efficient inference
The Llama-3.1-Nemotron-Ultra-253B builds on Nvidia's previous work in inference-optimized LLM development. Its architecture, customized through a Neural Architecture Search (NAS) process, introduces structural variations such as skipped attention layers, fused feedforward networks (FFNs), and variable FFN compression ratios.
This architectural overhaul reduces memory footprint and computational demands without severely impacting output quality, enabling deployment on a single 8x H100 GPU node.
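Nvidia has not released the NAS procedure itself, but the idea of per-layer structural variants can be sketched schematically. In the toy Python sketch below, the LayerConfig fields, the candidate values, and the memory_cost proxy are all invented for illustration and do not reflect Nemotron Ultra's actual configuration:

```python
from dataclasses import dataclass

# Illustrative sketch only: hypothetical per-layer variants of the kind a
# NAS process might search over (not Nvidia's actual architecture config).
@dataclass
class LayerConfig:
    skip_attention: bool    # omit the attention sub-block entirely
    ffn_fusion: bool        # fuse consecutive FFNs into one wider block
    ffn_compression: float  # fraction of the reference FFN hidden width

# One candidate architecture: a standard layer plus a compressed variant.
candidate = [
    LayerConfig(skip_attention=False, ffn_fusion=False, ffn_compression=1.0),
    LayerConfig(skip_attention=True,  ffn_fusion=True,  ffn_compression=0.5),
]

def memory_cost(layers: list[LayerConfig]) -> float:
    """Toy proxy: attention and FFN width both contribute to the footprint."""
    return sum(
        (0.0 if layer.skip_attention else 1.0) + layer.ffn_compression
        for layer in layers
    )

# A search would score many such candidates against a memory/latency budget
# and keep the best-performing one, e.g. to fit a single 8x H100 node.
print(memory_cost(candidate))
```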
The result, according to Nvidia, is a model that offers strong performance while being more cost-effective to deploy in data center environments. Additional hardware compatibility includes support for Nvidia's B100 and Hopper microarchitectures, with configurations validated in both BF16 and FP8 precision modes.
Post-training for reasoning and alignment
Nvidia enhanced the base model through a multi-phase post-training pipeline. This included supervised fine-tuning across domains such as math, code generation, chat, and tool use, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to further boost instruction-following and reasoning performance.
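GRPO dispenses with a separate value network: it samples a group of responses per prompt and normalizes each response's reward against the group's own statistics. The Python sketch below shows that group-relative advantage computation in its general form; the reward values are invented, and this is the published GRPO idea rather than Nvidia's training code:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and standard deviation of its own group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Rewards for four responses sampled for one prompt (illustrative values).
# Responses scoring above the group mean get positive advantages and are
# reinforced; below-average responses are pushed down.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.5]))
```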
The model underwent a knowledge distillation phase over 65 billion tokens, followed by continual pretraining on an additional 88 billion tokens.
Training datasets included sources like FineWeb, Buzz-V1.2, and Dolma. Post-training prompts and responses were drawn from a combination of public corpora and synthetic generation methods, including datasets that taught the model to differentiate between its reasoning modes.
Improved performance across numerous domains and benchmarks
Evaluation results show notable gains when the model operates in reasoning-enabled mode. For instance, on the MATH500 benchmark, performance increased from 80.40% in standard mode to 97.00% with reasoning enabled.
Similarly, results on the AIME25 benchmark rose from 16.67% to 72.50%, and LiveCodeBench scores more than doubled, jumping from 29.03% to 66.31%.
Performance gains were also observed in tool-based tasks like BFCL V2 and function composition, as well as in general question answering (GPQA), where the model scored 76.01% in reasoning mode versus 56.60% without.
These benchmarks were conducted with a maximum sequence length of 32,000 tokens, and each test was repeated up to 16 times to ensure accuracy.
Compared to DeepSeek R1, a state-of-the-art mixture-of-experts (MoE) model with 671 billion total parameters, Llama-3.1-Nemotron-Ultra-253B shows competitive results despite having fewer than half as many parameters (model settings), outperforming it on tasks like GPQA (76.01 vs. 71.5), IFEval instruction following (89.45 vs. 83.3), and LiveCodeBench coding tasks (66.31 vs. 65.9).
Meanwhile, DeepSeek R1 holds a clear advantage on certain math evaluations, notably AIME25 (79.8 vs. 72.50), and slightly edges out Nemotron Ultra on MATH500 (97.3 vs. 97.00).
These results suggest that despite being a dense model, Nvidia's offering matches or exceeds MoE alternatives on reasoning and general instruction-alignment tasks, while trailing slightly in math-heavy categories.
Usage and integration
The model is compatible with the Hugging Face Transformers library (version 4.48.3 recommended) and supports input and output sequences up to 128,000 tokens.
Developers can control reasoning behavior via system prompts and select decoding strategies based on task requirements.
For reasoning tasks, Nvidia recommends using temperature sampling (0.6) with a top-p value of 0.95. For deterministic outputs, greedy decoding is preferred.
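A minimal sketch of what this looks like with the Transformers library follows. The repository id and the "detailed thinking on/off" system-prompt strings are taken as assumptions from the Hugging Face model card's conventions and should be verified against the card before use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id per the Hugging Face release; verify on the model card.
model_id = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom architecture code hosted on the Hub
)

messages = [
    # System prompt toggles reasoning; "detailed thinking off" disables it.
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning mode: Nvidia recommends temperature 0.6 with top_p 0.95.
# For deterministic output, pass do_sample=False (greedy decoding) instead.
output = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```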
Llama-3.1-Nemotron-Ultra-253B supports multilingual applications, with capabilities in English and several additional languages, including German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
It is also suitable for common LLM use cases such as chatbot development, AI agent workflows, retrieval-augmented generation (RAG), and code generation.
Licensed for commercial use
Released under the Nvidia Open Model License and governed by the Llama 3.1 Community License Agreement, the model is ready for commercial use.
Nvidia has emphasized the importance of responsible AI development, encouraging teams to evaluate the model's alignment, safety, and bias profiles for their specific use cases.
Oleksii Kuchaiev, Director of AI Model Post-Training at Nvidia, shared the announcement on X, stating that the team was excited to share the open release, describing it as a dense 253B model designed with toggle ON/OFF reasoning capabilities and released with open weights and data.