Embeddings - Retrieval
A collection of learnings about embedding models, with an emphasis on retrieval tasks.
Created on September 24 | Last edited on October 4
Metrics
Plots of the primary metrics for each model, averaged across retrieval tasks. The black bars are intended to show standard deviation, but they appear to show the min and max across all tasks instead.
[Plot: run set of 80 runs]
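To make the distinction above concrete, here is a minimal sketch of the two kinds of error bar. The per-task nDCG@10 scores below are purely illustrative, not real benchmark results:

```python
import statistics

# Hypothetical per-task nDCG@10 scores for one model
# (illustrative numbers only, not real benchmark results).
ndcg_at_10 = {
    "SciFact": 0.65,
    "NFCorpus": 0.33,
    "TRECCOVID": 0.72,
    "Touche2020": 0.25,
}

scores = list(ndcg_at_10.values())
mean = statistics.mean(scores)     # bar height: average across tasks
std = statistics.stdev(scores)     # intended error bar: +/- one std dev
lo, hi = min(scores), max(scores)  # what the black bars appear to show

print(f"mean={mean:.4f} std={std:.4f} min={lo:.2f} max={hi:.2f}")
```

With scores this spread out, the min/max whiskers are noticeably wider than a one-standard-deviation bar, which is why the two readings of the plot look different.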
In these benchmarks, we run the following MTEB tasks:
- ClimateFEVER
- CQADupstackAndroidRetrieval
- CQADupstackEnglishRetrieval
- CQADupstackGamingRetrieval
- CQADupstackGisRetrieval
- CQADupstackMathematicaRetrieval
- CQADupstackPhysicsRetrieval
- CQADupstackProgrammersRetrieval
- CQADupstackStatsRetrieval
- CQADupstackTexRetrieval
- CQADupstackUnixRetrieval
- CQADupstackWebmastersRetrieval
- CQADupstackWordpressRetrieval
- DBPedia
- FEVER
- FiQA2018
- HotpotQA
- MSMARCO
- NFCorpus
- NQ
- QuoraRetrieval
- SCIDOCS
- SciFact
- Touche2020
- TRECCOVID
Note, however, that the MTEB leaderboard uses only a subset of these tasks for its retrieval benchmark.
The instructor-xl and e5-large-v2 models took at least three days to run MTEB retrieval on a single GPU, so those runs were cancelled.