
Qwen releases Qwen3

The Qwen Team has officially released Qwen3, the newest generation of the Qwen language model series. Led by the flagship Qwen3-235B-A22B and the smaller MoE model Qwen3-30B-A3B, Qwen3 positions itself as a strong competitor to other advanced models such as DeepSeek-R1 and Gemini-2.5-Pro. Remarkably, the compact Qwen3-4B even rivals the performance of the much larger Qwen2.5-72B-Instruct from the previous generation.

Open-Weight Models and Their Availability

Qwen3’s open-weight offerings include two Mixture of Experts (MoE) models and six dense models, all under the Apache 2.0 license. The MoE models, Qwen3-235B-A22B and Qwen3-30B-A3B, activate only a fraction of their total parameters, significantly reducing computational load. Alongside them, dense models ranging from 0.6 billion to 32 billion parameters are available across platforms like Hugging Face, ModelScope, and Kaggle. Deployment is supported by modern frameworks like SGLang and vLLM, while local usage is facilitated by Ollama, LMStudio, and llama.cpp.
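Both SGLang and vLLM expose OpenAI-compatible HTTP endpoints when serving a model, so a deployed Qwen3 instance can be queried with the standard openai Python client. A minimal sketch, assuming a server is already running locally (the URL, port, and model name are placeholders for whatever your deployment uses):

```python
# Minimal sketch: query a Qwen3 model served by vLLM or SGLang through
# their OpenAI-compatible API. Endpoint and model name are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[{"role": "user", "content": "Give me a short introduction to Qwen3."}],
)
print(response.choices[0].message.content)
```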

Key Features

Qwen3 introduces powerful new features that enhance flexibility, efficiency, and accessibility. It is designed for a broader range of tasks with higher performance and more sophisticated reasoning capabilities. The open availability of models supports both research and practical applications.

Hybrid Thinking Modes

One of the standout features of Qwen3 is its hybrid thinking capability. In Thinking Mode, the model reasons step-by-step for complex tasks, whereas in Non-Thinking Mode, it responds quickly to simpler prompts. This dual-mode design allows users to manage the computational budget more effectively while maintaining high-quality outputs, depending on the complexity of the task at hand.
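In Hugging Face Transformers, the mode is chosen when the prompt is built: Qwen3's chat template accepts an enable_thinking flag. A minimal sketch (the model name is one of the released checkpoints, used here for illustration):

```python
# Minimal sketch: toggling Qwen3's thinking mode via the chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# enable_thinking=True lets the model reason step by step before answering;
# set it to False for fast, direct responses to simple prompts.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
```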

Multilingual Support

Qwen3 supports 119 languages and dialects, spanning Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, Austroasiatic families, and others. This robust multilingual foundation enables users from diverse linguistic backgrounds to access high-quality language processing capabilities, significantly widening the model’s applicability across the globe.

Improved Agentic Capabilities

The new generation of Qwen models has been particularly strengthened in coding, reasoning, and agentic abilities. With enhanced support for MCP (Model Context Protocol), Qwen3 can now handle complex tool-calling interactions and behave more reliably in agent-based systems.

Pre-training Enhancements

The pre-training phase of Qwen3 saw a massive dataset expansion to 36 trillion tokens, roughly double that of Qwen2.5. The dataset not only covers a wider linguistic base but also includes large volumes of synthetic math and code data, improving the model's capabilities in STEM fields. Training proceeded in stages that gradually expanded the context length from 4K to 32K tokens, equipping the models to handle much longer inputs efficiently.

Post-training Innovations

Post-training focused on developing hybrid thinking abilities through a structured four-stage process. This included cold start chain-of-thought fine-tuning, reasoning-focused reinforcement learning, the fusion of thinking and non-thinking modes, and general reinforcement learning across multiple domains. This meticulous approach allows Qwen3 to deliver both deep reasoning and quick responses, depending on user needs.

Develop with Qwen3

Developers can easily integrate Qwen3 into their workflows using platforms like Hugging Face, ModelScope, or Kaggle. Clear examples for loading models, preparing prompts, and managing thinking modes make it straightforward for both research and production deployments. Moreover, frameworks like SGLang and vLLM facilitate server deployments, while local options like Ollama provide flexibility for personal use.
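A minimal loading-and-generation sketch with Hugging Face Transformers, reusing the enable_thinking flag shown earlier (the model name and generation settings are illustrative):

```python
# Minimal sketch: load a Qwen3 checkpoint and generate a response.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```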

Advanced Usages

Advanced features in Qwen3 include a soft-switch mechanism to dynamically control the thinking mode within conversations. By using simple tags like /think and /no_think, users can fine-tune how much reasoning the model applies at any point in a dialogue, providing greater interactivity and control over model behavior.
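Because the switch is plain text inside a user turn, no API change is needed in multi-turn chats; per the Qwen team, the most recent /think or /no_think instruction takes precedence. A small sketch of the message format:

```python
# Minimal sketch: the soft switch is ordinary text appended to user turns.
# The latest /think or /no_think tag in the conversation wins.
messages = [
    {"role": "user", "content": "How many r's are in 'strawberries'? /no_think"},
    {"role": "assistant", "content": "There are 3 r's in 'strawberries'."},
    {"role": "user", "content": "Walk through it letter by letter to check. /think"},
]
```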

Agentic Usages

Qwen3’s agentic capabilities are best leveraged through Qwen-Agent, a toolkit that simplifies the integration of tool-calling abilities. Users can define tools via MCP configurations or use built-in ones like code interpreters. This modular approach makes it easier to build complex, multi-functional agents using Qwen3’s robust backend.
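A condensed sketch in the style of the Qwen-Agent examples, combining an MCP-defined tool with the built-in code interpreter (the endpoint and MCP server choice are illustrative, and it assumes a Qwen3 model is already being served locally):

```python
from qwen_agent.agents import Assistant

# Point the agent at a locally served Qwen3 model (endpoint is a placeholder).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Tools: one MCP server definition plus the built-in code interpreter.
tools = [
    {"mcpServers": {
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }},
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "Fetch https://qwenlm.github.io/blog/ and summarize the latest Qwen news."}]
for responses in bot.run(messages=messages):
    pass  # bot.run streams incremental response lists; keep the final one
print(responses)
```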

Friends of Qwen

The Qwen project acknowledges the critical role of its community and partners in driving its success. The open invitation for more contributors highlights a strong emphasis on collaboration and shared progress in the AI ecosystem.

Future Work

Looking ahead, the Qwen team envisions further scaling of model size, training data, and context length, and plans to incorporate environmental feedback to strengthen long-horizon reasoning. They frame Qwen3 as a step not just toward more capable models but toward full-fledged intelligent agents, and ultimately AGI and ASI, marking a shift from training models to training agents.


Tags: ML News