Voyage-Multimodal-3: A New Multimodal Embedding Model
Created on November 18|Last edited on November 18
Voyage AI has released voyage-multimodal-3, a multimodal embedding model built for inputs that interleave text and images. It vectorizes content such as screenshots, PDFs, tables, and slides, so that visual and textual data can be retrieved and searched semantically through the same embeddings.
A Unified Approach to Multimodal Data
Unlike models such as OpenAI’s CLIP or Cohere’s multimodal embeddings, which rely on separate networks for text and image processing, voyage-multimodal-3 uses a single transformer encoder. Because text and visual elements pass through the same network together, the model can preserve the contextual relationships between them, which is intended to yield more accurate embeddings for mixed-modality data.
Superior Retrieval Performance
According to Voyage AI's evaluations, voyage-multimodal-3 outperforms competing models by substantial margins across several tasks. In table and figure retrieval, it shows a 41.44% improvement over OpenAI CLIP large. In document screenshot retrieval, it leads by 26.54%, and in text-to-photo matching, it surpasses other models by up to 6.55%. These results make it a strong choice for retrieval over datasets that mix formats.

Eliminating the Modality Gap
The model targets a long-standing issue in multimodal embeddings: the modality gap. Previous models struggled with mixed data, and their retrieval results were often skewed toward one modality regardless of relevance. Voyage-multimodal-3 addresses this by processing all inputs through a shared encoder, which is designed to produce balanced vector representations for text and images. Because documents can be embedded directly, for example as screenshots, there is no need for separate document parsing or complex pipelines.
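To make the modality gap concrete, here is a minimal sketch of one common way to quantify it: the distance between the centroid of the text embeddings and the centroid of the image embeddings. The vectors below are toy 2-D values invented for illustration, not output from voyage-multimodal-3 or any other model, and the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def modality_gap(text_vecs, image_vecs):
    """Euclidean distance between the centroids of the normalized
    text embeddings and the normalized image embeddings."""
    text_center = centroid([normalize(v) for v in text_vecs])
    image_center = centroid([normalize(v) for v in image_vecs])
    return math.dist(text_center, image_center)


# Toy embedding space with a gap: text and images sit in separate regions.
separated_text = [[1.0, 0.1], [0.9, 0.2]]
separated_images = [[0.1, 1.0], [0.2, 0.9]]

# Toy embedding space without a gap: both modalities share the same regions.
mixed_text = [[1.0, 0.1], [0.1, 1.0]]
mixed_images = [[0.9, 0.2], [0.2, 0.9]]

gap_separated = modality_gap(separated_text, separated_images)
gap_mixed = modality_gap(mixed_text, mixed_images)
print(f"gap with separated modalities: {gap_separated:.2f}")
print(f"gap with mixed modalities:     {gap_mixed:.2f}")
```

When the gap is large, a text query tends to land closer to other text than to a relevant image, which is the skew toward one modality described above.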
Applications and Accessibility
Voyage-multimodal-3 is aimed at workflows in industries that rely on content-rich documents. Researchers, analysts, and developers can vectorize entire knowledge bases containing diverse formats without additional preprocessing. The model’s robustness across datasets makes it well suited to semantic search, document analysis, and similar tasks. The first 200 million tokens are free, and Voyage AI provides sample notebooks and documentation to help users get started.
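As a rough illustration of semantic search over such a knowledge base, the sketch below ranks items by cosine similarity to a query vector. It assumes every item (slide, PDF page, text note) has already been embedded into one shared vector space. The 3-D vectors, the item names, and the `search` helper are hypothetical stand-ins; in practice the vectors would come from the embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the ids of the top_k items most similar to the query."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical index: one vector per item, whatever its original format.
index = {
    "slides/q3-revenue.png": [0.9, 0.1, 0.0],
    "report.pdf#page=4": [0.7, 0.6, 0.1],
    "notes/onboarding.txt": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query about quarterly revenue.
query_vec = [1.0, 0.2, 0.0]

results = search(query_vec, index)
print(results)
```

Because all items live in the same space, the ranking step does not need to know whether a result started out as an image, a PDF page, or plain text.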
Tags: ML News