
Xiaomi unveils MiMo-7B

MiMo-7B is Xiaomi's open-source 7B-parameter language model series designed for advanced reasoning in math and code, outperforming larger models through optimized pretraining and reinforcement learning techniques.
Created on April 30 | Last edited on April 30
Xiaomi has introduced MiMo-7B, a new open-source language model series built specifically for reasoning-intensive tasks such as mathematics and code. MiMo-7B directly challenges the prevailing view that only large-scale models can handle complex reasoning. While many open-source RL approaches rely on 32B models to achieve high performance, Xiaomi's team demonstrates that with the right pretraining and post-training strategy, a 7B-parameter model can match and even surpass much larger counterparts on targeted reasoning tasks.

Pre-training Strategies and Data Innovation

MiMo-7B was trained from scratch with an emphasis on maximizing the reasoning signal during pretraining. The team processed 25 trillion tokens, using an optimized pipeline that included improved text extraction tools and multi-dimensional data filtering. A significant part of the pretraining involved generating high-diversity synthetic reasoning data and adopting a three-stage data mixture strategy. Xiaomi also used multi-token prediction (MTP) as a secondary objective to speed up inference and improve reasoning accuracy. These foundational steps positioned MiMo-7B-Base as a strong base model, particularly adept at abstract and symbolic tasks.
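To make the MTP idea concrete, here is a minimal sketch of how an auxiliary multi-token prediction loss can sit alongside the usual next-token objective. The extra prediction head, the two-tokens-ahead target, and the weighting factor are illustrative assumptions for this sketch, not Xiaomi's actual architecture or hyperparameters.

```python
# Hedged sketch: next-token loss plus a multi-token prediction (MTP) auxiliary loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHead(nn.Module):
    """Illustrative extra head: hidden state at position t predicts the token at t+2."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

def combined_loss(hidden_states, logits, labels, mtp_head, mtp_weight=0.3):
    """Standard next-token loss plus a down-weighted two-tokens-ahead loss."""
    # Next-token prediction: logits at position t predict the label at t+1.
    ntp = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )
    # MTP auxiliary objective: hidden state at t also predicts the label at t+2.
    mtp_logits = mtp_head(hidden_states[:, :-2])
    mtp = F.cross_entropy(
        mtp_logits.reshape(-1, mtp_logits.size(-1)),
        labels[:, 2:].reshape(-1),
    )
    return ntp + mtp_weight * mtp
```

At inference time, the same extra head can be reused for speculative-style decoding, which is where the reported speedups come from.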

Reinforcement Learning for Reasoning

Xiaomi further refined MiMo-7B with reinforcement learning, producing two RL-enhanced variants: MiMo-7B-RL-Zero (trained from the base model) and MiMo-7B-RL (trained from a supervised fine-tuned version). The RL data included 130,000 high-quality code and math problems, with accuracy rewards guided by rule-based verification. To address sparse reward issues, especially in code tasks, Xiaomi introduced a difficulty-driven reward system for test cases. This resulted in more stable training and improved performance, especially on benchmarks such as MATH and LiveCodeBench.
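The sketch below illustrates the general idea of a difficulty-driven code reward: rather than an all-or-nothing signal, each test case contributes partial credit weighted by how hard it is. The difficulty estimate (fraction of sampled solutions that fail a test) and the partial-credit cap are assumptions for this illustration, not Xiaomi's exact formulation.

```python
# Hedged sketch of a difficulty-driven reward for code RL tasks.
from dataclasses import dataclass

@dataclass
class TestCase:
    passed: bool
    difficulty: float  # e.g., fraction of sampled solutions that fail this test (assumed proxy)

def code_reward(test_results: list[TestCase]) -> float:
    """Full reward only if every test passes; otherwise partial credit
    proportional to the difficulty of the tests that did pass."""
    if not test_results:
        return 0.0
    if all(t.passed for t in test_results):
        return 1.0
    total = sum(t.difficulty for t in test_results)
    if total == 0:
        return 0.0
    earned = sum(t.difficulty for t in test_results if t.passed)
    # Keep partial credit strictly below the full-pass reward so the policy
    # still prefers solutions that pass every test.
    return 0.9 * earned / total

# Example: two easy tests pass, one hard test fails -> non-zero but sub-maximal reward.
print(code_reward([TestCase(True, 0.2), TestCase(True, 0.3), TestCase(False, 0.9)]))
```

The point of this shaping is to densify the reward signal on hard coding problems, where a strict pass/fail check would return zero for almost every rollout early in training.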

RL Infrastructure and Inference Optimization

A major contributor to MiMo-7B's RL performance is its dedicated rollout engine, designed to minimize GPU idle time. The system uses asynchronous reward computation and early termination for inefficient samples, achieving over a 2x speedup in training and nearly doubling validation throughput. Xiaomi's custom vLLM fork supports MTP and improves inference robustness, further aligning the model architecture with real-world deployment needs.
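A rough sketch of the asynchronous-reward idea: verification (CPU-bound, rule-based) is pushed to a background thread pool so generation on the GPU does not stall waiting for scores. The `generate_rollout` and `verify` functions are placeholders for this sketch, not the actual rollout engine API.

```python
# Hedged sketch: overlap CPU-side reward verification with GPU-side generation.
from concurrent.futures import ThreadPoolExecutor

def generate_rollout(prompt: str) -> str:
    return f"<model completion for: {prompt}>"  # stand-in for GPU generation

def verify(completion: str) -> float:
    return 1.0 if "completion" in completion else 0.0  # stand-in for rule-based checks

def rollout_batch(prompts: list[str]) -> list[float]:
    futures = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for prompt in prompts:
            completion = generate_rollout(prompt)             # GPU-side work
            futures.append(pool.submit(verify, completion))   # scoring overlaps with next generation
    return [f.result() for f in futures]

print(rollout_batch(["prove 1+1=2", "sort a list in O(n log n)"]))
```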

Evaluation and Benchmarking

MiMo-7B-RL achieves remarkable results, particularly in mathematics and coding tasks. On MATH-500, it scores 95.8 percent pass@1, outperforming models like Qwen-14B and even OpenAI o1-mini. For competitive benchmarks like AIME 2024 and LiveCodeBench v6, MiMo-7B-RL either matches or surpasses much larger models, underscoring the efficacy of its training recipe. In general benchmarks like GPQA and DROP, the model holds up well but still lags behind the very top-tier proprietary systems such as GPT-4 and Claude 3.5 in language-heavy tasks.


Deployment and Usage

MiMo-7B is optimized for deployment with Xiaomi’s fork of vLLM, which includes full support for MTP and fast inference features. The repo provides scripts for integrating MiMo with both vLLM and HuggingFace Transformers. Xiaomi recommends using an empty system prompt for best results and welcomes community contributions to extend inference support to other frameworks.
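For a quick start without the custom vLLM fork, a plain HuggingFace Transformers call following the empty-system-prompt recommendation might look like the sketch below. The model identifier is an assumption based on Xiaomi's naming; check the official repo for the exact ID and recommended generation settings.

```python
# Hedged usage sketch with HuggingFace Transformers (model ID assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

messages = [
    {"role": "system", "content": ""},  # empty system prompt, per Xiaomi's recommendation
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```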

Conclusion and Community Impact

MiMo-7B represents a significant advancement in efficient model design focused on reasoning. By targeting both pretraining data composition and reinforcement learning structure, Xiaomi shows that smaller models can rival large models in specialized tasks. The open-source release includes all variants, from base to RL-trained, providing a valuable foundation for future research in reasoning-optimized language models.


Tags: ML News