
Mistral AI Launches Ministral 3B and 8B Models for Edge Computing

New models from Mistral!
Created on October 17|Last edited on October 17
A year after the groundbreaking release of Mistral 7B, Mistral AI has introduced two new models optimized for edge computing and on-device scenarios: Ministral 3B and Ministral 8B. These models, collectively known as "les Ministraux," represent a major step in making AI more efficient and accessible for real-time, low-latency applications that prioritize privacy and computational efficiency.

Next-Generation Performance and Context Length

Les Ministraux are designed to push the limits of knowledge and reasoning within the compact sub-10-billion-parameter range. Ministral 8B introduces an interleaved sliding-window attention pattern to speed up inference while conserving memory. Both models support up to 128k tokens of context, though current availability is capped at 32k when serving with the vLLM framework. This expansive context capacity lets the models handle complex, multi-step tasks more effectively in extended workflows.
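To make the attention pattern concrete, here is a minimal sketch of how a sliding-window attention mask differs from a full causal mask, and how an "interleaved" scheme might alternate between them across layers. This is an illustration of the general idea only, not Mistral's actual implementation; the window size and the even/odd layer alternation are assumptions.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query position i may attend to key position j:
    causal (j <= i) and within the local window (i - j < window)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

def causal_mask(seq_len: int) -> np.ndarray:
    """Standard full causal mask: attend to all earlier positions."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

def layer_masks(seq_len: int, window: int, n_layers: int):
    """Hypothetical interleaving: even layers use the sliding window,
    odd layers use full causal attention. Windowed layers bound their
    KV-cache memory by the window size rather than the full sequence."""
    return [
        sliding_window_mask(seq_len, window) if layer % 2 == 0 else causal_mask(seq_len)
        for layer in range(n_layers)
    ]
```

The practical effect is that windowed layers only attend to a fixed-size neighborhood, which is what keeps memory and latency manageable at long context lengths.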

Use Cases and Flexibility

Mistral AI developed les Ministraux with a focus on localized, private AI operations. They are ideal for scenarios such as internet-free smart assistants, on-device language translation, autonomous robotics, and local data analytics. The models are tailored for users ranging from independent developers to large-scale enterprises looking to integrate AI tools that can operate without requiring constant cloud connectivity. Additionally, these models can serve as intermediate agents in larger workflows by parsing inputs, routing tasks, and executing function calls with minimal latency and cost.
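The "intermediate agent" role described above can be sketched as a local dispatcher: the small model emits a structured function call, and application code routes it to a handler. The tool names, payload shape, and handlers below are hypothetical, chosen only to show the routing step; no Mistral API calls are involved.

```python
import json

# Hypothetical tool registry; names and argument schemas are illustrative only.
TOOLS = {
    "translate": lambda args: f"[translated to {args['target']}] {args['text']}",
    "summarize": lambda args: args["text"][:40] + "...",
}

def route(model_output: str) -> str:
    """Dispatch a model's function-call JSON to a local handler.

    In a real workflow, a model like Ministral 3B would produce the
    JSON string; this sketch covers only the local routing step.
    """
    call = json.loads(model_output)
    handler = TOOLS[call["name"]]
    return handler(call["arguments"])
```

Because the parsing and routing happen locally, this pattern adds minimal latency and cost on top of the model call itself.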

Performance and Benchmarks

The company shared benchmark comparisons showing Ministral 3B and 8B outperforming rival models such as Llama 3.1, Llama 3.2, and Gemma 2. Both the base models and the instruction-tuned "Instruct" variants exhibit superior results across multiple evaluation categories. Notably, Ministral 3B Instruct even surpasses the larger Mistral 7B in key areas, highlighting the progress of smaller, more focused architectures.

Pricing and Availability

Ministral 8B is available through API access at $0.10 per million tokens (input and output), while Ministral 3B costs $0.04 per million tokens. Both models are offered under the Mistral Commercial License and the Mistral Research License, allowing flexibility for users across industries. For those interested in deploying the models independently, Mistral AI offers support for lossless quantization to ensure optimized performance. Model weights for Ministral 8B Instruct are available for research purposes, with availability on cloud platforms to follow soon.
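Since input and output tokens are billed at the same per-million rate, estimating request cost is simple arithmetic. The sketch below uses the published prices; the model identifiers are shorthand labels, not official API model names.

```python
# Published per-million-token prices in USD (input and output billed at the same rate).
PRICE_PER_M = {"ministral-8b": 0.10, "ministral-3b": 0.04}

def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the published rates."""
    return (input_tokens + output_tokens) / 1_000_000 * PRICE_PER_M[model]
```

For example, a request to Ministral 8B with 500k input tokens and 500k output tokens would cost about $0.10, while the same volume on Ministral 3B would cost about $0.04.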