
HuggingFace's new LLM: SmolLM3

Created on July 9|Last edited on July 9
SmolLM3, released by Hugging Face, is a fully open 3-billion-parameter language model built to be highly efficient and competitive with larger models. It pushes the performance frontier for small models, outperforming comparable models such as Llama-3.2-3B and Qwen2.5-3B and holding its own against larger 4B models such as Qwen3 and Gemma3. The release is a notable step in the development of compact, transparent, and capable language models designed for both research and practical deployment.

Efficient Architecture and Long Context Support

Built upon the Llama architecture, SmolLM3 includes several modifications that improve efficiency and enable extended context handling. It replaces traditional multi-head attention with Grouped Query Attention to reduce memory usage during inference. Long-context capability comes from removing rotary position embeddings in a subset of layers (NoPE) and from YaRN, which extrapolates the context window up to 128k tokens. Additionally, intra-document masking prevents attention from crossing document boundaries within packed training sequences, and training stability is improved by removing weight decay on embeddings, a technique borrowed from OLMo 2.
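
Because these choices are visible in the published model configuration, a quick way to see the Grouped Query Attention layout and the configured context window is to load the config with transformers. This is a minimal sketch, assuming the public checkpoint id HuggingFaceTB/SmolLM3-3B and Llama-style config field names; adjust the id if the repository differs.

```python
# Sketch: inspect SmolLM3's config to see the GQA head layout and context window.
# Assumes the checkpoint id "HuggingFaceTB/SmolLM3-3B" and Llama-style config fields.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Grouped Query Attention: several query heads share each key/value head,
# so num_key_value_heads is smaller than num_attention_heads.
print("query heads:    ", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)
print("context length: ", config.max_position_embeddings)
```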

Multilingual and Reasoning Flexibility

The model supports six major European languages: English, French, Spanish, German, Italian, and Portuguese, making it suitable for a range of multilingual applications. A standout feature is the dual-mode reasoning system: users can toggle between explicit, step-by-step reasoning and more concise responses with simple flags in the system prompt. This dual-mode design, embedded in the chat template, gives users control over how the model processes and presents information, providing flexibility for tasks that vary in complexity.
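
The sketch below shows how this toggle might look in practice, building prompts with the tokenizer's chat template. It assumes the checkpoint id HuggingFaceTB/SmolLM3-3B and that the template reads the /think and /no_think flags from the system message, as described in the deployment section further down; the helper function is purely illustrative.

```python
# Sketch: toggling SmolLM3's reasoning modes via the chat template.
# The checkpoint id and the build_prompt helper are assumptions for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

def build_prompt(question: str, thinking: bool) -> str:
    flag = "/think" if thinking else "/no_think"
    messages = [
        {"role": "system", "content": flag},
        {"role": "user", "content": question},
    ]
    # Render the conversation to a prompt string without tokenizing,
    # adding the generation prompt so the model continues as the assistant.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

print(build_prompt("Explain why the sky is blue.", thinking=True))
print(build_prompt("Explain why the sky is blue.", thinking=False))
```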

Training and Data Strategy

SmolLM3 was trained on a massive 11.2 trillion token corpus over a three-stage curriculum. The early phase emphasized general web data, with later phases gradually increasing the proportion of code and math content. Hugging Face incorporated instruction and reasoning datasets, such as OpenMathReasoning and MegaMath, to boost the model’s performance in complex domains. Additional mid-training steps extended its context length and bolstered reasoning capabilities with focused datasets, including NVIDIA’s Llama-Nemotron series and OpenThoughts3.

Instruct Fine-Tuning and Alignment

The final instruct version of SmolLM3 was developed through supervised fine-tuning and off-policy preference optimization. A 1.8B token instruction dataset was split between reasoning and non-reasoning content, with synthetic data from Qwen3-32B used to fill reasoning gaps. Hugging Face employed Anchored Preference Optimization (APO), a more stable alternative to traditional DPO methods, for aligning the model to human-like preferences. To maintain performance in long-context scenarios, the team merged APO-aligned weights with an earlier checkpoint using MergeKit, ensuring that instruction tuning did not degrade long-range reasoning ability.
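
For readers who want to experiment with this style of alignment themselves, recent versions of TRL expose anchored preference losses through the same trainer used for DPO. The following is a minimal sketch under stated assumptions, not Hugging Face's actual recipe: the hyperparameters are illustrative, the preference dataset is a placeholder, and the processing_class argument reflects newer TRL releases.

```python
# Sketch: preference alignment with an APO-style loss via TRL's DPOTrainer.
# Hyperparameters and the dataset are illustrative placeholders, not the SmolLM3 recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint id
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any dataset with "prompt", "chosen", and "rejected" columns works; this one is a placeholder.
prefs = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="smollm3-apo",
    loss_type="apo_zero",   # anchored preference loss instead of vanilla DPO
    beta=0.1,               # illustrative value
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=prefs,
    processing_class=tokenizer,
)
trainer.train()
```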

Performance Across Benchmarks

SmolLM3 achieves best-in-class results among 3B models and remains competitive with 4B models across a range of benchmarks, including those testing knowledge, code, reasoning, and multilingual capabilities. In non-reasoning mode, it surpasses models like Llama-3.2-3B Instruct and Qwen2.5-3B Instruct while requiring less compute. In reasoning mode, SmolLM3 demonstrates strong gains on high-difficulty evaluations such as AIME 2025 and GPQA Diamond. Though some 4B models achieve higher raw scores, SmolLM3's efficiency makes it a compelling option in resource-constrained environments.

Deployment and Usage

Users can run SmolLM3 with vLLM or with transformers version 4.53.0 or later. The system prompt interface allows straightforward switching between thinking modes using /think and /no_think. Tool-calling support is also built into the instruct template, making SmolLM3 compatible with more advanced interaction flows without external modifications. Because the full training recipe, data mixture, and engineering decisions are public, the model is especially suitable for researchers and developers looking to replicate or build on Hugging Face's work.
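
An end-to-end local run with transformers might look like the sketch below. The checkpoint id, generation settings, and the prompt are assumptions for illustration; the /no_think flag in the system message selects the concise mode described above.

```python
# Sketch: running the instruct checkpoint locally with transformers (v4.53.0 or later).
# The checkpoint id and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # concise mode; use /think for explicit reasoning
    {"role": "user", "content": "Summarize what Grouped Query Attention does."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```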

Conclusion

SmolLM3 is more than just a compact language model: it is a demonstration of what can be achieved when performance, transparency, and practical usability are treated as core design goals. Its release sets a new benchmark for 3B-scale models, offering an open-source alternative that rivals significantly larger models. Hugging Face's decision to openly share every part of the development pipeline is likely to accelerate innovation in the open LLM ecosystem, especially for developers and institutions focused on high-quality, efficient, and transparent AI.
Tags: ML News