
Alibaba unveils Qwen2.5: 18 trillion tokens and counting

A new set of Qwen models trained on a massive dataset!
The Qwen team is back with another major release: Qwen2.5. Just three months after the release of Qwen2, developers have built new models based on its architecture, offering insights and feedback that have shaped this new iteration. Qwen2.5 is being launched alongside specialized models for coding and mathematics, expanding the horizons of what's possible with open-source large language models. These models represent what the team describes as one of the largest open-source releases in history, available in various sizes ranging from 0.5 billion to 72 billion parameters. Alongside the language models, Qwen2.5-Coder and Qwen2.5-Math offer more tailored applications for programming and mathematics, respectively.

Takeaways from Qwen2.5

Qwen2.5 is pre-trained on an extensive dataset of 18 trillion tokens, significantly improving the model's general knowledge, with gains across several areas such as instruction following, long-text generation, coding, and structured data handling. It supports context lengths of up to 128K tokens and can generate up to 8K tokens of output, making it a highly flexible tool for diverse tasks. Additionally, Qwen2.5 is multilingual, offering support for more than 29 languages, which broadens its appeal for global applications. Qwen2.5-Coder and Qwen2.5-Math further refine these capabilities for specific tasks, with the coding-focused models trained on 5.5 trillion tokens of code-related data and the math models bolstered by advanced reasoning techniques like Chain-of-Thought (CoT) and Program-of-Thought (PoT).
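To make that concrete, here is a minimal sketch (not taken from the release materials) of prompting a Qwen2.5 instruct checkpoint through Hugging Face's transformers library and asking for structured JSON output; the checkpoint name, prompt, and generation settings are illustrative assumptions.

```python
# Minimal sketch: querying a Qwen2.5 instruct checkpoint with transformers
# and asking for JSON-structured output. Checkpoint name and prompt are
# illustrative assumptions, not part of the official announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply with valid JSON only."},
    {"role": "user", "content": "Summarize the Qwen2.5 release as JSON with keys 'sizes' and 'context_length'."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Cap generation well below the advertised 8K-token output limit.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```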

Qwen2.5 Performance

Qwen2.5-72B, the flagship model of this release, has been benchmarked against top-tier open-source models like Llama-3.1-70B and Mistral-Large-V2. The results show that Qwen2.5 stands out, achieving high performance across multiple benchmarks. Even the base (non-instruct) Qwen2.5-72B model competes well against much larger models such as Llama-3.1-405B. The API-based model, Qwen-Plus, continues this trend, showing competitive results against proprietary models like GPT-4o and Claude-3.5-Sonnet.
Qwen-Plus still has room for improvement against some of the top proprietary models, but it performs admirably, especially in terms of cost-effectiveness. The reintroduction of the 14B and 32B sizes, as Qwen2.5-14B and Qwen2.5-32B, offers a middle ground between size and performance, with both models outperforming some larger models across diverse tasks.

A Shift Toward Smaller Models

Qwen2.5-3B exemplifies a broader trend toward smaller language models that deliver highly competitive results despite their reduced parameter counts. These small language models (SLMs) are closing the gap with their larger counterparts, demonstrating that bigger isn't always better. On the reported benchmarks, Qwen2.5-3B achieves impressive results for its size, showing that even the smallest members of the Qwen family remain highly capable.

Qwen2.5-Coder

For coding, Qwen2.5-Coder emerges as a high-performing assistant, capable of debugging, answering coding-related questions, and offering code suggestions. Despite its smaller parameter sizes, Qwen2.5-Coder surpasses larger models on various programming tasks. Trained on the 5.5 trillion tokens of code-related data mentioned above, the model has shown strong capability across a range of programming languages.
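For a sense of how a code model like this might be used for completion, here is a hedged sketch of fill-in-the-middle (FIM) prompting; the checkpoint name and the FIM special tokens are assumptions carried over from earlier CodeQwen releases rather than confirmed details of Qwen2.5-Coder, so check the model card before relying on them.

```python
# Hedged sketch of fill-in-the-middle (FIM) completion with a Qwen2.5-Coder
# base checkpoint. The checkpoint name and the FIM special tokens
# (<|fim_prefix|> etc.) are assumptions based on earlier CodeQwen releases.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Ask the model to fill in the function body between a prefix and a suffix.
prompt = (
    "<|fim_prefix|>def binary_search(arr, target):\n    "
    "<|fim_suffix|>\n    return -1\n<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```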

Qwen2.5-Math

In mathematics, Qwen2.5-Math has made substantial improvements over its predecessor, Qwen2-Math. The model now supports both Chinese and English and incorporates the advanced reasoning techniques mentioned earlier, including Chain-of-Thought and Program-of-Thought. It can tackle complex mathematical problems and reasoning tasks, and even the smaller models like Qwen2.5-Math-1.5B are competitive against larger models.
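As an illustration of the Program-of-Thought idea, the sketch below asks a Qwen2.5-Math instruct checkpoint to answer by writing a short Python program and then executes whatever code block comes back; the checkpoint name and prompt wording are assumptions, not the team's official recipe.

```python
# Hedged sketch of a Program-of-Thought (PoT) loop: the model is prompted to
# answer by emitting Python, and the emitted program is run locally.
# Checkpoint name and prompts are illustrative assumptions.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-7B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "Solve the problem by writing a short Python program that prints the answer."},
    {"role": "user", "content": "A tank fills at 12 liters per minute and drains at 5 liters per minute. How many minutes until it holds 210 liters?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

# Pull out the first fenced Python block, if any, and execute it.
match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
if match:
    exec(match.group(1))  # prints the computed answer
else:
    print(reply)
```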

Developing with Qwen2.5

For developers, working with Qwen2.5 is straightforward, thanks to its integration with Hugging Face’s transformers library. Users can easily deploy models via API or run them locally with frameworks like vLLM. Additionally, the model supports tool calling, which enhances its interactivity and application in real-world tasks.
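For example, once a vLLM OpenAI-compatible server is running locally (for instance via "vllm serve Qwen/Qwen2.5-7B-Instruct"), the model can be queried with the standard openai client; the endpoint URL, port, and checkpoint name below are assumptions for illustration.

```python
# Sketch of querying a locally hosted Qwen2.5 model through vLLM's
# OpenAI-compatible API. Assumes a server is already running on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed checkpoint name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a one-sentence summary of the Qwen2.5 release."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```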

What’s Next?

Qwen2.5 is just the beginning. The team is committed to integrating different modalities (language, vision, and audio) into a unified model, enhancing reasoning capabilities, and scaling inference-time compute. Models such as OpenAI's o1, which lean heavily on reinforcement learning and inference-time reasoning, point to future directions for improving model performance, and the team plans to keep pushing the boundaries of what's possible in AI.
Tags: ML News