Skip to main content

Pixtral Grows Up

Mistral's Multimodal LLM gets an upgrade!
Created on November 18|Last edited on November 18
Mistral AI has announced the release of Pixtral Large, a groundbreaking multimodal model with 124 billion parameters. This model builds on the foundation of Mistral Large 2 and integrates advanced image and document understanding while preserving state-of-the-art text capabilities. Alongside this announcement, Mistral Large also receives a notable update, expanding its utility for enterprise and research applications.

Pixtral Large

Pixtral Large represents a leap in multimodal AI capabilities, featuring a 123-billion parameter decoder and a dedicated 1-billion parameter vision encoder. It is designed for seamless understanding of documents, charts, and natural images. The 128K context window allows it to process up to 30 high-resolution images simultaneously, making it an essential tool for tasks requiring large-scale visual analysis.

Benchmarks and Performance

Pixtral Large demonstrates industry-leading performance across a range of benchmarks, significantly outperforming notable models such as GPT-4o, Claude-3.5 Sonnet, and Gemini-1.5 Pro. On MathVista, it achieves 69.4%, surpassing GPT-4o's 65.4% and Gemini-1.5 Pro's 67.8%. It leads in ChartQA with 88.1%, outperforming Claude-3.5 Sonnet at 89.1% and Gemini-1.5 Pro at 83.8%. For document understanding on DocVQA, Pixtral Large reaches 93.3%, exceeding GPT-4o's 88.5% and Claude-3.5 Sonnet's 88.6%. Additionally, in the VQAv2 benchmark, it scores 80.9%, ahead of Gemini-1.5 Pro's 70.6% and GPT-4o's 76.4%. In AI2D tasks, it achieves 93.8%, closely matching or surpassing Claude-3.5 Sonnet's 94.6%. Finally, on MM MT-Bench, a real-world multimodal evaluation, Pixtral Large secures a score of 7.4, leading over Gemini-1.5 Pro at 6.8 and GPT-4o at 6.7, solidifying its position as the top-performing multimodal model in open weights.

Applications and Licenses

The model is released under two licenses: the Mistral Research License for educational and research use, and the Mistral Commercial License for commercial testing and production. Users can try Pixtral Large via the Mistral API or download it for deeper integration.

Qualitative Insights

Demonstrations of Pixtral Large include its ability to perform multilingual OCR and detailed reasoning over complex data. For instance, it can calculate itemized bills, analyze training loss curves for AI models, and extract meaningful insights from website screenshots. These examples showcase its versatility in understanding both structured and unstructured data.

Updates to Mistral Large

Mistral Large, the cornerstone text model of Mistral AI, now comes in an updated version: Mistral Large 24.11. This iteration introduces improvements in long-context understanding, enhanced function calling, and a refined system prompt. These features make it particularly effective for enterprise use cases, including knowledge exploration, document understanding, and customer experience automation. It will soon be accessible through major cloud providers like Google Cloud and Microsoft Azure.

Tags: ML News
Iterate on AI agents and models faster. Try Weights & Biases today.