OpenKaito Text Embedding Models
Created on October 9 | Last edited on October 9
Subnet 5 aims to develop the best-performing, most general-purpose text embedding model in the world. The model's performance is evaluated against a continuously growing, dynamic dataset, which serves as a proxy for an unboundedly general benchmark and ensures the highest possible level of domain generalization.
As a network, our model will continuously improve and adapt to the latest real-world knowledge. This dynamic evaluation will ensure that SN5’s embedding model not only surpasses existing state-of-the-art (SOTA) models, pushing the boundaries of industry performance, but also remains consistently competitive.
The dashboards below display the real-time InfoNCE loss and Top-1 Recall of the baseline model (OpenAI text-embedding-3-large) and SN5 miners' models.
[W&B panel: group metrics computed from the first 32 groups; run set: 2658]
The InfoNCE loss is a contrastive evaluation of text embeddings based on the pairwise relevance among texts:

$$\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp\left(\mathrm{sim}(q, d^{+})/\tau\right)}{\sum_{i=1}^{N} \exp\left(\mathrm{sim}(q, d_i)/\tau\right)}$$

where $q$ is a query embedding, $d^{+}$ is the embedding of its relevant (positive) document, $\{d_i\}_{i=1}^{N}$ is the candidate pool, $\mathrm{sim}(\cdot,\cdot)$ is a similarity function (typically cosine similarity), and $\tau$ is a temperature. Minimizing this loss maximizes the mutual information between positive pairs $(q, d^{+})$ and minimizes the mutual information between negative pairs $(q, d_i)$, $d_i \neq d^{+}$.
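The loss above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the subnet's validator code; the function name, the default temperature of 0.07, and the choice of cosine similarity are assumptions for the sketch.

```python
import numpy as np

def info_nce_loss(query_emb, doc_embs, positive_idx, temperature=0.07):
    """Contrastive InfoNCE loss for one query against a candidate pool.

    query_emb: (d,) embedding of the query text.
    doc_embs: (N, d) embeddings of candidate documents; one is the positive.
    positive_idx: index of the relevant (positive) document in doc_embs.
    """
    # Cosine similarity: normalize, then take dot products.
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = (d @ q) / temperature
    # Negative log-softmax of the positive pair's similarity:
    # low when the positive candidate dominates the pool.
    log_probs = sims - np.log(np.sum(np.exp(sims)))
    return -log_probs[positive_idx]
```

The loss is small when the query embedding sits closest to its positive document and far from the negatives, which is exactly the behavior the dashboard tracks for each miner's model.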
The Top-1 Recall measures document-retrieval performance using the model embeddings: the fraction of queries for which the highest-ranked candidate is the relevant document.
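A minimal sketch of this metric, under the same assumptions as above (cosine similarity, one relevant document per query; the function name and argument layout are illustrative):

```python
import numpy as np

def top1_recall(query_embs, doc_embs, relevant_idx):
    """Fraction of queries whose nearest document is the relevant one.

    query_embs: (Q, d) query embeddings.
    doc_embs: (N, d) candidate document embeddings.
    relevant_idx: (Q,) index of the relevant document for each query.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    top1 = np.argmax(q @ d.T, axis=1)  # nearest candidate per query
    return float(np.mean(top1 == relevant_idx))
```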