Mistral Unveils Multimodal Pixtral 12B
Created on September 12 | Last edited on September 12
French AI startup Mistral has officially launched Pixtral 12B, its first multimodal large language model capable of processing both images and text. Announced at Mistral's AI Summit in San Francisco, Pixtral 12B marks a significant step forward for the company as it enters the competitive field of vision-language models.
Pixtral 12B: Key Features and Capabilities
Pixtral 12B uses a 12-billion-parameter architecture designed for tasks that combine visual and textual information. The model accepts images of arbitrary resolution and size, which suits it to applications such as optical character recognition (OCR) and detailed information extraction from images. One of its standout features is a 128,000-token context window, which lets it handle longer and more complex interactions than many competing models.
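Because Pixtral 12B accepts images of arbitrary size, the share of the 128,000-token context window that an image occupies grows with its resolution. The sketch below is a rough budget calculator under loudly stated assumptions: it assumes each image is split into 16x16-pixel patches, with one token per patch plus one separator token per row of patches. The article does not describe Pixtral's actual image tokenization, so treat the patch size, the separator rule, and the hypothetical helper names `image_tokens` and `images_that_fit` as illustrative only.

```python
import math

# ASSUMPTIONS (not stated in the article): 16x16-pixel patches, one token per
# patch, plus one extra separator token at the end of each row of patches.
PATCH_SIZE = 16
CONTEXT_WINDOW = 128_000  # context length quoted in the article


def image_tokens(width: int, height: int, patch: int = PATCH_SIZE) -> int:
    """Estimate the tokens one image consumes under the assumed patching scheme."""
    cols = math.ceil(width / patch)   # patches per row (partial patches round up)
    rows = math.ceil(height / patch)  # number of patch rows
    return cols * rows + rows         # patch tokens + one separator per row


def images_that_fit(width: int, height: int, text_tokens: int = 0) -> int:
    """Count how many same-sized images fit alongside `text_tokens` of text."""
    per_image = image_tokens(width, height)
    remaining = max(CONTEXT_WINDOW - text_tokens, 0)  # never go negative
    return remaining // per_image


if __name__ == "__main__":
    print(image_tokens(1024, 1024))     # 4160
    print(images_that_fit(1024, 1024))  # 30
```

Under these assumptions a 1024x1024 image costs 64 x 64 patch tokens plus 64 separators, so roughly 30 such images would fit in the window with no accompanying text. The real figure depends on how Pixtral actually tokenizes images.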
The model is released with open weights under the Apache 2.0 license, which permits community use and adaptation. Pixtral 12B is currently available for download via torrent links and on platforms like Hugging Face, and Mistral plans to integrate it into its hosted services, Le Chat and La Plateforme, in the near future.
Performance and Use Cases
According to Mistral, Pixtral 12B excels at understanding complex images, such as infographics and scientific diagrams, and can reason over visual content to produce textual responses. For example, it can generate code from sketches and accurately interpret data visualizations, making it valuable for professionals in fields that require multimodal reasoning.
Performance metrics presented at the event showed Pixtral 12B holding its own against leading models such as Anthropic's Claude-3 and OpenAI's GPT-4o, particularly on multimodal question answering (QA) tasks. However, some in the AI community have debated the accuracy of certain benchmark results, specifically those comparing Pixtral to Qwen2-VL, raising questions about the validity of the Qwen scores.

Image from the announcement [1]

[2]
Overall
Mistral's strategy of releasing open models aligns with its broader goal of establishing itself as Europe's answer to OpenAI. The company, which recently closed a $645 million funding round led by General Catalyst, has been valued at $6 billion and is positioning Pixtral 12B as a foundational technology in its suite of AI tools.
While Mistral is focused on providing free access to its models, it also plans to generate revenue by offering managed versions of these models alongside consulting services for corporate clients. This approach has positioned Mistral as a key player in the European AI market, attracting interest from various industries looking to integrate advanced multimodal AI solutions.
Tags: ML News