Mistral AI Unveils Codestral Mamba and Mathstral
Mistral unveils new models!
Created on July 16 | Last edited on July 16
Mistral AI has unveiled two new language models, Codestral Mamba and Mathstral, each targeting a distinct area: code generation and mathematical reasoning, respectively. Both models are released under the Apache 2.0 license, reflecting Mistral AI's commitment to advancing AI research and making cutting-edge tools accessible to the broader community.
Codestral Mamba
Codestral Mamba, named in homage to Cleopatra, is designed specifically for code generation. This model was developed with significant contributions from Albert Gu and Tri Dao. It represents a notable departure from traditional Transformer models, which use attention mechanisms that scale quadratically with the length of the input sequence. This quadratic scaling often leads to inefficiencies and high resource demands for long sequences. In contrast, Mamba models employ linear time inference, which allows for more efficient processing and the theoretical ability to model sequences of infinite length. This means that Codestral Mamba can provide rapid responses regardless of the input size, making it particularly effective for code productivity tasks.
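To make that scaling difference concrete, here is a minimal, toy Python sketch (not Mistral's actual implementation, and not the real selective state-space update): an attention-style step has to revisit every previous token, so processing n tokens costs O(n²) overall, while a recurrent, state-space-style step only updates a fixed-size state, keeping per-token cost constant no matter how long the context grows.

```python
import numpy as np

d = 16  # toy hidden size

def attention_step(query, past_keys, past_values):
    # Transformer-style step: work grows with the number of past tokens,
    # so processing n tokens costs O(n^2) overall.
    scores = past_keys @ query                    # one score per past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ past_values                  # weighted mix of every past value

def recurrent_step(state, x, A, B):
    # Mamba-style (state-space) step: only a fixed-size state is updated,
    # so per-token work is constant and processing n tokens costs O(n).
    return A @ state + B @ x

rng = np.random.default_rng(0)
A, B = 0.9 * np.eye(d), rng.normal(size=(d, d))
state = np.zeros(d)
keys, values = [], []

for _ in range(1000):                             # stream of 1,000 toy tokens
    x = rng.normal(size=d)
    keys.append(x)
    values.append(x)
    attn_out = attention_step(x, np.stack(keys), np.stack(values))
    state = recurrent_step(state, x, A, B)        # constant work per token
```

The fixed-size state is what lets a Mamba-style model keep responding quickly as the prompt grows, which is exactly the property highlighted for long code-productivity sessions.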

Codestral Mamba is not the first large Mamba model, but matching the practical performance of strong Transformer-based models has historically been difficult for this architecture. Codestral Mamba has been tested extensively, demonstrating strong in-context retrieval on benchmarks of up to 256,000 tokens, which makes it a highly effective local code assistant. Users can deploy Codestral Mamba through the mistral-inference SDK or TensorRT-LLM, with support for local inference via llama.cpp anticipated in the future. The model's raw weights are available for download from HuggingFace, and it can be tested on "la Plateforme" under the name codestral-mamba-2407. While Codestral Mamba is freely available under the Apache 2.0 license, its more powerful counterpart, Codestral 22B, is offered under commercial and community licenses. Codestral Mamba has 7,285,403,648 parameters (roughly 7.3 billion), reflecting its capacity for sophisticated code generation tasks.
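A quick way to try the hosted model is to query codestral-mamba-2407 on la Plateforme through Mistral's Python client. The sketch below assumes the 0.x version of the mistralai package and a MISTRAL_API_KEY environment variable; newer client releases expose a different interface, so treat the exact imports and method names as an assumption to verify against the current docs.

```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# Assumes the 0.x `mistralai` client; newer versions use a different interface.
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat(
    model="codestral-mamba-2407",  # name on la Plateforme, per the announcement
    messages=[
        ChatMessage(
            role="user",
            content="Write a Python function that checks whether a string is a palindrome.",
        )
    ],
)

print(response.choices[0].message.content)
```

For local or self-hosted use, the same prompt can instead be served from the downloaded raw weights via the mistral-inference SDK or TensorRT-LLM, as noted above.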
Mathstral
Mathstral, named in tribute to Archimedes, marks another significant release from Mistral AI. This model, designed for math reasoning and scientific discovery, is based on the original Mistral 7B model and features a 32k context window. The release of Mathstral aligns with Mistral AI's broader effort to support academic and scientific research, particularly through their collaboration with Project Numina.
Mathstral excels in handling advanced mathematical problems that require complex, multi-step logical reasoning. It has achieved state-of-the-art performance in its size category, with benchmark scores of 56.6% on MATH and 63.47% on MMLU. When evaluated across various STEM subjects, Mathstral demonstrated superior performance compared to Mistral 7B, showcasing its specialized capabilities. The model's performance can be further enhanced with more inference-time computation, achieving scores of 68.37% on MATH with majority voting and 74.59% with a strong reward model among 64 candidates.
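The majority-voting figure reflects a simple inference-time technique: sample several candidate solutions, extract each final answer, and keep the most common one. Here is a minimal, illustrative sketch of the idea; the sampling function and the answer-extraction regex are placeholders, not Mistral's evaluation harness.

```python
import re
from collections import Counter

def extract_final_answer(solution: str) -> str | None:
    # Placeholder heuristic: take the last number in the model's worked solution.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)
    return numbers[-1] if numbers else None

def majority_vote(sample_solution, problem: str, n_candidates: int = 64) -> str | None:
    """Sample n candidate solutions and return the most frequent final answer.

    `sample_solution(problem) -> str` stands in for any call that queries the
    model with a nonzero temperature; it is a hypothetical helper, not a Mistral API.
    """
    answers = []
    for _ in range(n_candidates):
        answer = extract_final_answer(sample_solution(problem))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

The stronger 74.59% result replaces the vote with a reward model that scores the 64 candidates and keeps the highest-ranked one.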

Usage
Users can deploy Mathstral using the mistral-inference SDK and fine-tune it with mistral-finetune. The model's weights are available on HuggingFace, facilitating easy access for researchers and developers. Mathstral was evaluated using GRE Math Subject Test problems curated by Professor Paul Bourdon, further highlighting its robustness and reliability for academic applications.
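For local experimentation, the weights can be pulled from HuggingFace with the huggingface_hub library and then pointed at by the mistral-inference or mistral-finetune tooling. The repository ID below is an assumption and should be checked against the official model card.

```python
from pathlib import Path

from huggingface_hub import snapshot_download

# Assumed repository ID for the Mathstral weights; verify on the HuggingFace model card.
repo_id = "mistralai/mathstral-7B-v0.1"
local_dir = Path.home() / "models" / "mathstral-7b"

snapshot_download(
    repo_id=repo_id,
    local_dir=local_dir,
    allow_patterns=["*.json", "*.safetensors", "*.model"],  # config, weights, tokenizer
)

print(f"Weights downloaded to {local_dir}")
```

From a local copy like this, the mistral-inference SDK can serve the model and mistral-finetune can adapt it, as described in the announcement.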
Both Codestral Mamba and Mathstral exemplify Mistral AI's dedication to creating high-performance, specialized AI models. These releases provide powerful tools for developers and researchers, enabling advancements in both code generation and mathematical reasoning.